General Post-Training with 4 RTX 4090 GPUs #33
Hi @ztianlin, the autoregressive fine-tuning only requires 2 A100/H100 GPUs: https://github.com/NVIDIA/Cosmos/tree/main/cosmos1/models/autoregressive/nemo/post_training
Two A100s/H100s is still 160 GB. Will DIGITS's 128 GB be enough?
Thanks! And what about the diffusion models? I really wish one could post-train diffusion models on a 4090.
@ztianlin I'm a PM at NVIDIA for Cosmos. Can you share why post-training on a 4090 is important?
@jpenningCA @ethanhe42 Currently, the documentation states:
However, this doesn't provide much clarity. It seems reasonable to infer that 8 A100s or H100s were used for the 7B and 14B models, but is that much hardware strictly necessary from a VRAM perspective? What is the minimum recommended amount of VRAM? Additionally, the README states that training uses the NeMo Framework's data and model parallelism capabilities, specifically Fully Sharded Data Parallel (FSDP) and Tensor Parallelism. This suggests that parameters, optimizer states, and activations are sharded across all GPUs, and that individual layer weight tensors are also split across GPUs. Given this information:
Understanding these details would help determine whether alternative hardware configurations could work for post-training these models.
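To make the question concrete, the sharding description above can be turned into a back-of-the-envelope VRAM estimate. The sketch below is my own rough calculation, not anything from the Cosmos docs: it assumes bf16 parameters and gradients (2 bytes each) plus fp32 Adam moments and an fp32 master copy of the weights (12 bytes), i.e. roughly 16 bytes per parameter, all fully sharded across the data-parallel group. The activation term is a placeholder, since activation memory depends heavily on sequence length, batch size, and whether activation checkpointing is used.

```python
def fsdp_memory_per_gpu_gb(n_params: float,
                           n_gpus: int,
                           bytes_per_param: int = 16,
                           activation_gb: float = 10.0) -> float:
    """Rough per-GPU memory (GB) assuming params, grads, and optimizer
    states are fully sharded across n_gpus, plus a flat activation budget.
    Ignores unsharded buffers, all-gather temporaries, and framework overhead."""
    sharded_state_gb = n_params * bytes_per_param / n_gpus / 1e9
    return sharded_state_gb + activation_gb

# A hypothetical 7B-parameter model at different GPU counts:
for n_gpus in (2, 4, 8):
    print(f"{n_gpus} GPUs: ~{fsdp_memory_per_gpu_gb(7e9, n_gpus):.0f} GB per GPU")
```

Under these assumptions a 7B model needs on the order of 38 GB per GPU on 4 cards, which already exceeds a 4090's 24 GB even before real activation costs, which is why the minimum-VRAM question matters.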
It's like asking why we need LLMs on an iPhone. I think most customers have a 4090 rather than an expensive A100/H100.
As @StarsTesla put it, my research resources are limited. I believe that if one could easily train on a 4090, the Cosmos community would become larger and more active.
Hello, I wonder if it is possible to do general post-training for the diffusion WFM with 4 GeForce RTX 4090 GPUs.
My dad can't afford 8 A100 GPUs. Please show mercy to poor people!