0%

I use dorado basecaller first.

1
2
dorado basecaller hac barcode04_0.pod5 --device cuda:0  --emit-moves --emit-sam > barcode04_0_calls.bam
dorado basecaller hac barcode06_0.pod5 --device cuda:0 --emit-moves --emit-sam > barcode06_0_calls.bam

And do remora dataset prepare

1
remora dataset prepare /data/nas-shared/zhaoxy/sequencing/293T/regrouped_pod5/barcode04/barcode04_0.pod5 /data/nas-shared/zhaoxy/sequencing/293T/regrouped_pod5/barcode04/barcode04_0_calls.bam --output-path can_chunks --refine-kmer-level-table /data/biolab-nvme-pool1/zhaoxy/kmer_models/dna_r10.4.1_e8.2_400bps/9mer_levels_v1.txt --refine-rough-rescale  --motif CG 0  --mod-base-control

And error occurs:

1
[13:53:48.823] Extracted 0 chunks from 960,684 reads.
Read more »

First get all reads id from fastq files

Here we use barcode 04, 05 and 06 as examples.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
fastq_dirs = {
"barcode04": r'./sequencing/293T/barcode04',
"barcode05": r'./sequencing/293T/barcode05',
"barcode06": r'./sequencing/293T/barcode06'
}

pod5_id_data = { "barcode04": set(), "barcode05": set(), "barcode06": set() }

def find_fastq_files(folder_path):
folder = Path(folder_path)
return [str(file) for file in folder.rglob('*.fastq')]

for barcode, fastq_dir in fastq_dirs.items():
read_ids = set()
for fastq_file in tqdm(find_fastq_files(fastq_dir), desc=f"Processing {barcode}"):
try:
for record in SeqIO.parse(fastq_file, 'fastq'):
read_ids.add(record.id)
except Exception as e:
print(f"Error reading {fastq_file}: {e}")

pod5_id_data[barcode] = read_ids
print(f"Total read ids in {fastq_dir}: {len(read_ids)}")

Then read POD5 files and store them by barcodes

Read more »