empty tensors when using DeMo with FSDP #2

Open
peter-sk opened this issue Jan 6, 2025 · 0 comments

I am successfully training 1B OLMo models using DeMo with DDP :-)

However, when training the same model using DeMo with FSDP, I run into an issue where both _dct and _idct are called with empty tensors (x.shape[-1] == 0).

My config for DeMo as the optimizer is:

optimizer:
  name: demo
  learning_rate: 3.0e-4
  weight_decay: 0.0
  eps: 1e-8
  decay_norm_and_bias: false
  decay_embeddings: false
  compression_decay: 0.99
  compression_topk: 32
  compression_chunk: 64
  metrics_log_interval: 1

My config for FSDP is:

fsdp:
  wrapping_strategy: null
  precision: mixed
  disable_grad_sync: true

My code base is in the following PR to OLMo:

allenai/OLMo#771

Traceback (most recent call last):
  File "/leonardo_scratch/fast/EUHPC_D07_063/OLMo/scripts/train.py", line 429, in <module>
    main(cfg)
  File "/leonardo_scratch/fast/EUHPC_D07_063/OLMo/scripts/train.py", line 248, in main
    optim = build_optimizer(cfg, dist_model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/leonardo_scratch/fast/EUHPC_D07_063/miniconda3/envs/olmo/lib/python3.12/site-packages/olmo/optim.py", line 1138, in build_optimizer
    return DeMo(
           ^^^^^
  File "/leonardo_scratch/fast/EUHPC_D07_063/miniconda3/envs/olmo/lib/python3.12/site-packages/olmo/optim.py", line 702, in __init__
    self.transform = TransformDCT(self.param_groups, self.compression_chunk)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/leonardo_scratch/fast/EUHPC_D07_063/miniconda3/envs/olmo/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/leonardo_scratch/fast/EUHPC_D07_063/miniconda3/envs/olmo/lib/python3.12/site-packages/olmo/demo_util.py", line 32, in __init__
    self.f_dict[sc] = _dct(I, norm=norm).to(p.dtype).to(p.device)
                      ^^^^^^^^^^^^^^^^^^
  File "/leonardo_scratch/fast/EUHPC_D07_063/miniconda3/envs/olmo/lib/python3.12/site-packages/olmo/demo_util.py", line 169, in _dct
    x = x.contiguous().view(-1, N)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
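
One possible explanation, which I have not confirmed, is that FSDP leaves some ranks with zero-sized parameter shards, so the loop in TransformDCT ends up requesting a DCT basis of size 0. Below is a minimal pure-Python sketch of a guard for that case. It does not use the real DeMo code: `dct_basis`, `largest_divisor_at_most`, and `build_bases` are hypothetical stand-ins, and deriving the chunk size as the largest divisor of each dimension not exceeding `compression_chunk` is an assumption about how the transform picks its sizes.

```python
import math


def dct_basis(n):
    """Return the orthonormal DCT-II matrix of size n x n as nested lists."""
    if n <= 0:
        raise ValueError("a DCT basis is undefined for size 0")
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def largest_divisor_at_most(n, limit):
    """Hypothetical helper: largest divisor of n (n >= 1) that is <= limit."""
    for d in range(min(n, limit), 0, -1):
        if n % d == 0:
            return d
    raise ValueError("n must be a positive integer")


def build_bases(shapes, target_chunk):
    """Precompute one DCT basis per chunk size, skipping empty parameters.

    `shapes` maps a parameter name to its local (per-rank) shape tuple.
    Returns the basis dictionary and the names of the skipped parameters.
    """
    bases = {}
    skipped = []
    for name, shape in shapes.items():
        # Assumed failure mode: a sharded parameter with no local elements.
        if any(s == 0 for s in shape):
            skipped.append(name)
            continue
        for s in shape:
            sc = largest_divisor_at_most(s, target_chunk)
            if sc not in bases:
                bases[sc] = dct_basis(sc)
    return bases, skipped


# Made-up local shapes: one regular weight and one empty shard.
local_shapes = {"wte.weight": (8, 4), "empty_shard": (0,)}
bases, skipped = build_bases(local_shapes, target_chunk=4)
print(sorted(bases), skipped)
```

A guard like this would only cover the constructor. Since _idct also receives empty tensors, the encode and decode paths in the optimizer step would presumably need to skip the same parameters.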
