Pre-training code for DNABERT-2 on 3'UTR #1

Open
leannmlindsey opened this issue Aug 28, 2024 · 1 comment

Comments

@leannmlindsey

Hello! I was wondering whether you would be willing to release your pre-training code for DNABERT-2 and NT. The DNABERT-2 website does not provide the actual code used for pre-training, only a suggestion of two similar implementations to use.

(from the DNABERT-2 website)
We used and slightly modified the MosaicBERT implementation for DNABERT-2 https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert . You should be able to replicate the model training following the instructions.

Or you can use the run_mlm.py at https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling by importing the BertModelForMaskedLM from https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py. It should produce a very similar model.
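
For reference, the second route suggested above would look roughly like the sketch below, following the standard Hugging Face masked-LM recipe. This is only an illustration, not the authors' actual pre-training script: it assumes the DNABERT-2 Hub repo exposes its tokenizer and a masked-LM head via trust_remote_code, and the training file name, sequence length, and batch size are placeholders.

```python
# Minimal sketch of the suggested run_mlm.py-style route (not the authors' code).
# Assumes the DNABERT-2 Hub repo ("zhihan1996/DNABERT-2-117M") exposes its
# tokenizer and a masked-LM class through trust_remote_code; the training file
# "utr_sequences.txt" (one sequence per line) and all hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoConfig,
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "zhihan1996/DNABERT-2-117M"

# BPE tokenizer shipped with DNABERT-2
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Fresh (randomly initialised) model with the DNABERT-2 architecture
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForMaskedLM.from_config(config, trust_remote_code=True)

# One DNA sequence per line, e.g. 3'UTR sequences
dataset = load_dataset("text", data_files={"train": "utr_sequences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% token masking, as in run_mlm.py
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dnabert2-3utr-mlm", per_device_train_batch_size=32),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```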

I am interested in using your implementation for pre-training DNABERT-2 because you were able to train it in such a short time.

Thank you for any help you can provide.
LeAnn

@sergeyvilov
Owner

Hello, we are not publishing the training code at this stage. You may nevertheless find the following information useful:

All training was done with PyTorch. We indeed trained the DNABERT-2 model using the architecture from https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py and the same learning rate scheduler as in the original DNABERT-2 paper. We recently retrained all models on 10 GPUs (NVIDIA A100, 80 GB) with an effective batch size of 4480 for DNABERT-2 and 480 for NTv2-250M-3UTR. For DNABERT-2, training on Zoonomia 3'UTR sequences for 2 epochs took 1.3 hours (about 10x less than NT), which is equivalent to roughly 13 hours on a single A100 GPU. DNABERT-2 is also faster than NT when compared at the same batch size.
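
For anyone trying to reproduce these numbers, the effective batch size is simply the product of the number of GPUs, the per-device batch size, and the gradient-accumulation steps. The decomposition below is only an illustrative guess consistent with the figures above, not the actual configuration used:

```python
# Illustrative decomposition of the reported effective batch size of 4480 for
# DNABERT-2 on 10 GPUs. The per-device batch size and accumulation steps are
# assumed values; only their product is stated in the reply above.
n_gpus = 10
per_device_batch_size = 64     # sequences per GPU per forward pass (assumed)
grad_accumulation_steps = 7    # forward passes per optimizer step (assumed)

effective_batch_size = n_gpus * per_device_batch_size * grad_accumulation_steps
assert effective_batch_size == 4480  # matches the DNABERT-2 figure reported above

# For NTv2-250M-3UTR (effective batch size 480), one possible split would be
# 10 GPUs * 48 sequences per device * 1 accumulation step.
```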
