OOM for Mistral-Nemo-Base-2407 with NeMo + ThunderFX at input sequence lengths that work with NeMo Eager #1475
Comments
As Tom suggested, the issue might be caused by this PR: #1400
@tfogal, why do you think that pull request could cause problems?
I git bisected and found that 052bac3 (the commit right before it) works well. I am confused as to what in there is an issue, though; I'm experimenting now, but my initial theory about pruning too aggressively doesn't hold water, so I'm not sure at the moment.
@tfogal, what command did you run for the bisection? I get an OOM error on H100 with the commit 052bac3 and a 2k sequence length, as in the issue description.
The same OOM error occurs with the linked commit and with commits before it, but a different error appears with the commit right after (a617503), which makes memory consumption worse because a deepcopy of an fx.GraphModule also creates copies of all parameters. With the 1k sequence length, here are the memory consumptions:
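As an aside, the parameter-copying behavior of deepcopy can be reproduced in isolation, without NeMo or ThunderFX. A minimal sketch (the Linear module and tensor sizes here are arbitrary, chosen only to have some CUDA parameters to measure):

```python
import copy

import torch

with torch.device("cuda"):
    # ~4 MiB of parameters (1024*1024 float32 weights plus a small bias).
    m = torch.nn.Linear(1024, 1024)

before = torch.cuda.memory_allocated()
m_copy = copy.deepcopy(m)  # deepcopy duplicates the parameter storage
after = torch.cuda.memory_allocated()

# Expect roughly another ~4 MiB allocated for the copied parameters.
print(after - before)
```

The printed difference should be roughly the size of m's parameters (about 4 MiB here), since the copy gets its own CUDA storage.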
@kshitij12345, looks like the problem was introduced in #1400.
Interestingly,
import torch
import copy
# copy.deepcopy leads to more memory usage (as modules with parameters are saved in GraphModule).
# Eg.
# class GraphModule(torch.nn.Module):
# def forward(self, L_args_0_: "f32[1024, 1024]"):
# l_args_0_ = L_args_0_
# # File: /home/kkalambarkar/git/pytorch/torch/_dynamo/external_utils.py:31 in inner, code: return fn(*args, **kwargs)
# fn_0: "f32[1024, 1024]" = self.fn_0(l_args_0_); l_args_0_ = None
# fn_1: "f32[1024, 1024]" = self.fn_1(fn_0); fn_0 = None
# return (fn_1,)
torch._dynamo.config.inline_inbuilt_nn_modules=False
# class GraphModule(torch.nn.Module):
# def forward(self, L_fn_modules_0_parameters_weight_: "f32[1024, 1024]", L_fn_modules_0_parameters_bias_: "f32[1024]", L_args_0_: "f32[1024, 1024]", L_fn_modules_1_parameters_weight_: "f32[1024, 1024]", L_fn_modules_1_parameters_bias_: "f32[1024]"):
# l_fn_modules_0_parameters_weight_ = L_fn_modules_0_parameters_weight_
# l_fn_modules_0_parameters_bias_ = L_fn_modules_0_parameters_bias_
# l_args_0_ = L_args_0_
# l_fn_modules_1_parameters_weight_ = L_fn_modules_1_parameters_weight_
# l_fn_modules_1_parameters_bias_ = L_fn_modules_1_parameters_bias_
# # File: /home/kkalambarkar/git/pytorch/torch/_dynamo/external_utils.py:31 in inner, code: return fn(*args, **kwargs)
# input_1: "f32[1024, 1024]" = torch._C._nn.linear(l_args_0_, l_fn_modules_0_parameters_weight_, l_fn_modules_0_parameters_bias_); l_args_0_ = l_fn_modules_0_parameters_weight_ = l_fn_modules_0_parameters_bias_ = None
# input_2: "f32[1024, 1024]" = torch._C._nn.linear(input_1, l_fn_modules_1_parameters_weight_, l_fn_modules_1_parameters_bias_); input_1 = l_fn_modules_1_parameters_weight_ = l_fn_modules_1_parameters_bias_ = None
# return (input_2,)
torch._dynamo.config.inline_inbuilt_nn_modules=True
gm_copy = None
def backend(gm, sample_args):
    global gm_copy
    gm_copy = copy.deepcopy(gm)
    gm.print_readable()
    return gm
with torch.device("cuda"):
    models = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.Linear(1024, 1024))
opt_model = torch.compile(models, backend=backend)
x = torch.randn(1024, 1024, device="cuda")
opt_model(x)
print(torch.cuda.memory_allocated()) # no_inline = 29507584, inline = 21110784
del opt_model, models
print(torch.cuda.memory_allocated()) # no_inline = 29507584, inline = 12713984
del gm_copy
print(torch.cuda.memory_allocated()) # no_inline = 29507584, inline = 12713984
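If a backend really does need to keep a deep copy of the incoming GraphModule (as in the example above), one possible way to avoid duplicating parameter storage is to pre-populate deepcopy's memo so that parameter and buffer tensors are reused rather than cloned. This is only a sketch of a workaround, not necessarily what #1400 or ThunderFX actually does, and it only matters when the GraphModule owns its parameters (e.g. with inline_inbuilt_nn_modules=False):

```python
import copy

import torch

gm_copy = None

def backend(gm, sample_args):
    global gm_copy
    # Map each parameter/buffer's id to the tensor itself so deepcopy
    # treats them as "already copied" and reuses them instead of cloning.
    memo = {id(t): t for t in list(gm.parameters()) + list(gm.buffers())}
    gm_copy = copy.deepcopy(gm, memo)
    return gm
```

With this variant, keeping gm_copy alive should not add parameter-sized allocations to torch.cuda.memory_allocated(), since the copy shares parameter storage with the original module.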
Cool, so as long as …
🐛 Bug
When running Mistral-Nemo-Base-2407 with NeMo + ThunderFX, we get an OOM error even for small sequence lengths.
To Reproduce
The error is present on 1xH100.
Dockerfile used (I built it yesterday and I'm not sure yet how nemo:dev images are versioned, so I can't provide its exact version):
Inside the Docker container, please run:
The script bench_targets/llm_peft/_nemo.py can be obtained from the internal GitLab repo akoumparouli/nemo_bench. You can contact me or @tfogal if you have any questions. You can check that the command below works:
Expected behavior
We should be able to run at least the same sequence lengths that work with NeMo Eager.
Environment
cc @tfogal