
General Post-Training with 4 RTX 4090 GPUs #33

Open
ztianlin opened this issue Jan 10, 2025 · 7 comments
Labels
enhancement New feature or request

Comments

@ztianlin

ztianlin commented Jan 10, 2025

Hello, I wonder if it is possible to do the general post-training for the diffusion WFM with 4 GeForce RTX 4090 GPUs.
My dad can't afford 8 A100 GPUs. Please show mercy to poor people!

@ztianlin ztianlin changed the title General post-training on server with 4 RTX 4090 gpus General Post-Training on Server with 4 RTX 4090 GPUs Jan 10, 2025
@ztianlin ztianlin changed the title General Post-Training on Server with 4 RTX 4090 GPUs General Post-Training with 4 RTX 4090 GPUs Jan 10, 2025
@ethanhe42
Member

Hi @ztianlin, autoregressive fine-tuning only requires 2 A100/H100 GPUs: https://github.com/NVIDIA/Cosmos/tree/main/cosmos1/models/autoregressive/nemo/post_training

@ymcki

ymcki commented Jan 11, 2025

> Hi @ztianlin, autoregressive fine-tuning only requires 2 A100/H100 GPUs: https://github.com/NVIDIA/Cosmos/tree/main/cosmos1/models/autoregressive/nemo/post_training

Two A100s is still 160 GB. Will DIGITS's 128 GB be enough?

@ztianlin
Author

> Hi @ztianlin, autoregressive fine-tuning only requires 2 A100/H100 GPUs: https://github.com/NVIDIA/Cosmos/tree/main/cosmos1/models/autoregressive/nemo/post_training

Thanks! And what about the diffusion models? I really wish one could post-train diffusion models on a 4090.

@jpenningCA

@ztianlin I'm a PM at NVIDIA for COSMOS. Can you share why post-training on a 4090 is important?

@monko9j1

@jpenningCA @ethanhe42
Do accurate benchmarks exist for VRAM usage across different models? For instance, could a setup with seven RTX 4090s work effectively? Specifically, I’d like to know how much VRAM is required to train the 7B and 14B models, both for Text2World and the upcoming Video2World model post training.

Currently, the documentation states:
https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/post_training/README.md

“8 NVIDIA GPUs*”

However, this doesn't provide much clarity. It seems reasonable to infer that 8 A100s or H100s were used for the 7B and 14B models, but is that level of hardware strictly necessary from a VRAM perspective? What are the minimum recommended VRAM requirements?

Additionally, the README describes that training uses NeMo Framework's data and model parallelism capabilities, specifically mentioning Fully Sharded Data Parallel (FSDP) and Tensor Parallelism. This suggests that parameters, optimizer states, and activations are distributed across all GPUs, and individual layer parameter tensors are also spread across GPUs.

Given this information:

  1. Is the requirement for 8 GPUs primarily driven by the need to distribute computational tasks (via FSDP and Tensor Parallelism), or is it mainly due to VRAM limitations per GPU?
  2. Could configurations with fewer GPUs, such as seven RTX 4090s, be viable if VRAM is sufficient, or is the parallelism tightly integrated with the current 8-GPU setup recommendations?

Understanding these details would help determine if alternative hardware configurations could work for post-training these models.
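For a first-order sense of the VRAM question, here is a back-of-the-envelope sketch of sharded training state per GPU. It assumes bf16 weights and gradients plus fp32 Adam state (master weights and two moments, i.e. 16 bytes per parameter total) sharded evenly across GPUs, FSDP/ZeRO-3 style; activations, temporary buffers, and framework overhead are ignored, and the actual NeMo configuration may differ, so treat the numbers as a lower bound rather than a benchmark:

```python
def fsdp_state_gib_per_gpu(n_params: float, n_gpus: int) -> float:
    """Rough GiB of model + optimizer state per GPU when fully sharded.

    Assumes bf16 params (2 B) + bf16 grads (2 B) + fp32 Adam state
    (master weights + two moments, 12 B) = 16 B per parameter,
    divided evenly across all GPUs. Activations are NOT included.
    """
    bytes_per_param = 2 + 2 + 12
    return n_params * bytes_per_param / n_gpus / 1024**3

# 7B model on 8 GPUs: ~13 GiB of sharded state per GPU
print(round(fsdp_state_gib_per_gpu(7e9, 8), 1))

# 14B model on 4 GPUs: ~52 GiB per GPU, already over a 4090's 24 GB
# before any activation memory is counted
print(round(fsdp_state_gib_per_gpu(14e9, 4), 1))
```

By this estimate the sharded state alone fits comfortably on 8×80 GB cards, while on 24 GB cards the activation memory of long video sequences, not just parameter state, is likely the binding constraint.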

@StarsTesla

> @ztianlin I'm a PM at NVIDIA for COSMOS. Can you share why post-training on a 4090 is important?

It's like asking why we need LLMs on an iPhone. I think most customers have a 4090, not an expensive A100/H100.

@ztianlin
Author

ztianlin commented Jan 24, 2025

> @ztianlin I'm a PM at NVIDIA for COSMOS. Can you share why post-training on a 4090 is important?

As @StarsTesla put it metaphorically, my research resources are limited. I believe that if one could easily train on a 4090, the Cosmos community would become larger and more active.

@mharrim mharrim added the enhancement New feature or request label Jan 24, 2025

7 participants