
How many threads and how much memory are required at the training stage? #236

Open
yaoxkkkkk opened this issue Oct 20, 2024 · 1 comment

Comments

@yaoxkkkkk

Thank you for developing this tool. I am using NanoSim to simulate ONT data. I ran the training stage with 32 threads and 256 GB of memory, but it reported an out-of-memory error. The command is:

	read_analysis.py genome \
		-i ZJYY_ont_filter.fq.gz \
		-rg nd.asm.fasta \
		-o ${home_dir}/01-data/ONT/${species}_training \
		--fastq \
		-t 32

The statistics of the ZJYY_ont_filter.fq.gz dataset are:

file                   format  type   num_seqs         sum_len  min_len   avg_len  max_len
ZJYY_ont_filter.fq.gz  FASTQ   DNA   1,544,988  43,308,647,713    2,000  28,031.7  246,468

When I run the command without the --fastq parameter, the training step finishes successfully.

@lcoombe
Member

lcoombe commented Oct 21, 2024

Hi @yaoxkkkkk,

The amount of memory required will really depend on the dataset that you are training on.
On my end, training using --fastq with the HG002 ONT dataset used for the latest pre-trained models required around 263 GB of RAM - so that could be why you are seeing those errors.
If you want to use --fastq, other options are to use one of our pre-trained models, or to train on a subset of your reads (see the sketch below).
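
If you do go the subsetting route, one option (just a sketch; it assumes seqkit is installed, and any read sampler would work equally well) is to randomly sample a fraction of the reads and point the same training command at the subset:

	# Randomly keep ~25% of reads (fixed seed for reproducibility); adjust -p to fit your memory budget
	seqkit sample -p 0.25 -s 11 ZJYY_ont_filter.fq.gz -o ZJYY_ont_subset.fq.gz

	# Re-run training on the subset with the same options as before
	read_analysis.py genome \
		-i ZJYY_ont_subset.fq.gz \
		-rg nd.asm.fasta \
		-o ${home_dir}/01-data/ONT/${species}_training \
		--fastq \
		-t 32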

Thank you for your interest in NanoSim!
Lauren
