Extracting and plotting accuracy of single base and polyT(A)

Preface

When I train a new model with bonito, remora, or other tools, the validation loss and accuracy are indeed great indicators. However, I am wondering what the single-base and polyT(A) accuracy look like.

Gimpel, A.L., Stark, W.J., Heckel, R. et al. report that errors are biased throughout the entire DNA storage process, and I am curious about the extent of this bias.

Basecalling

Here I use bonito for basecalling, with the ZymoBIOMICS HMW DNA Standard (from PRJEB64570) as the dataset:

bonito basecaller dna_r10.4.1_e8.2_400bps_hac@v5.0.0 ./pod5 > ./basecalls/sample_250410_e8.2_400bps_hac.fastq
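As a quick first look before any alignment, the FASTQ written by the command above can be scanned for A and T homopolymer runs. The following is only a minimal sketch, not the final method: it tallies the run lengths observed in the reads and does not measure accuracy, which would need a comparison against a reference. The helper names `read_fastq_sequences` and `homopolymer_run_lengths` and the `min_len=4` cutoff are my own assumptions, and the reader assumes plain four-line FASTQ records.

```python
import re
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record (assumes 4-line FASTQ records)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every 4-line record is the sequence
                yield line.strip().upper()


def homopolymer_run_lengths(sequences, bases="AT", min_len=4):
    """Count homopolymer runs of each base in `bases` with length >= min_len.

    Returns a dict mapping base -> Counter({run_length: occurrences}).
    `min_len=4` is an arbitrary illustrative cutoff, not taken from any paper.
    """
    counts = {base: Counter() for base in bases}
    # Build a pattern such as "A{4,}|T{4,}" that matches maximal runs.
    pattern = re.compile("|".join("%s{%d,}" % (b, min_len) for b in bases))
    for seq in sequences:
        for match in pattern.finditer(seq):
            run = match.group()
            counts[run[0]][len(run)] += 1
    return counts


# Usage with the file produced by the basecalling step:
# counts = homopolymer_run_lengths(
#     read_fastq_sequences("./basecalls/sample_250410_e8.2_400bps_hac.fastq"))
# print(sorted(counts["T"].items()))
```

The resulting per-length counts could later be plotted as a histogram, or compared with the same tally on a reference to see whether long runs are under- or over-called.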

to be done …