xyzc's blog

Extracting and plotting accuracy of single base and polyT(A)

Posted on 2025-04-18 In Tech
Symbols count in article: 538 Reading time ≈ 1 mins.

Preface

When I trained a new model using bonito, remora or the others, the validation loss and accuracy are great indicators indeed. However, I am wandering what the single base and polyT(A) accuracy are like.

From Gimpel, A.L., Stark, W.J., Heckel, R. et al., it is known that errors are always biased in the entire DNA storage process. In addition, I am curious about the extent of this bias.

A debugging record of network training using REMORA

Posted on 2025-02-20 In Tech
Symbols count in article: 6k Reading time ≈ 5 mins.

I use dorado basecaller first.

1
2

dorado basecaller hac barcode04_0.pod5 --device cuda:0  --emit-moves --emit-sam > barcode04_0_calls.bam
dorado basecaller hac barcode06_0.pod5 --device cuda:0  --emit-moves --emit-sam > barcode06_0_calls.bam

And do remora dataset prepare

remora dataset prepare /data/nas-shared/zhaoxy/sequencing/293T/regrouped_pod5/barcode04/barcode04_0.pod5 /data/nas-shared/zhaoxy/sequencing/293T/regrouped_pod5/barcode04/barcode04_0_calls.bam --output-path can_chunks --refine-kmer-level-table /data/biolab-nvme-pool1/zhaoxy/kmer_models/dna_r10.4.1_e8.2_400bps/9mer_levels_v1.txt --refine-rough-rescale  --motif CG 0  --mod-base-control

And error occurs：

1	[13:53:48.823] Extracted 0 chunks from 960,684 reads.

A current signal grouping method for SQK-NBD114

Posted on 2025-02-19 In Tech
Symbols count in article: 2.3k Reading time ≈ 2 mins.

First get all reads id from fastq files

Here we use barcode 04, 05 and 06 as examples.

fastq_dirs = {
    "barcode04": r'./sequencing/293T/barcode04',
    "barcode05": r'./sequencing/293T/barcode05',
    "barcode06": r'./sequencing/293T/barcode06'
}

pod5_id_data = { "barcode04": set(), "barcode05": set(), "barcode06": set() }

def find_fastq_files(folder_path):
    folder = Path(folder_path)
    return [str(file) for file in folder.rglob('*.fastq')]

for barcode, fastq_dir in fastq_dirs.items():
    read_ids = set()
    for fastq_file in tqdm(find_fastq_files(fastq_dir), desc=f"Processing {barcode}"):
        try:
            for record in SeqIO.parse(fastq_file, 'fastq'):
                read_ids.add(record.id)
        except Exception as e:
            print(f"Error reading {fastq_file}: {e}")
    
    pod5_id_data[barcode] = read_ids
    print(f"Total read ids in {fastq_dir}: {len(read_ids)}")

Then read POD5 files and store them by barcodes

Extract mutations and INDEL error rates from next-generation sequencing results

Posted on 2023-12-28 In Tech
Symbols count in article: 889 Reading time ≈ 1 mins.

Alignment

Ensure there is bwa in the environment. If not, install using this:

1
2
3

git clone https://github.com/lh3/bwa.git
cd bwa
make

or

1	sudo apt install bwa

Microsoft store pages cannot be loaded

Posted on 2023-12-11 In Tech
Symbols count in article: 395 Reading time ≈ 1 mins.

Problem Description

Regardless of whether the proxy is turned on or not, the MS store pages cannot be loaded correctly in Windows 11.