fix: return float instead of tensor from get_rotary_seq_len #1419

Open · wants to merge 1 commit into base: main

Conversation

jasonchiu-codeium commented:

The error stack trace below occurs because get_rotary_seq_len can return a tensor rather than its float value. Downstream, the torch.arange call in get_freqs_non_repeated then receives a tensor where a plain number is expected.

[...]

  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/_main/third_party/megatron_lm/Megatron-LM/megatron/core/distributed/data_parallel_base.py", line 22, in forward
    return self.module(*inputs, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/_main/third_party/megatron_lm/Megatron-LM/megatron/core/transformer/module.py", line 178, in forward
    outputs = self.module(*inputs, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/_main/third_party/megatron_lm/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 265, in forward
    rotary_pos_emb = self.rotary_pos_emb(
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/local_pip_torch/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/_main/third_party/megatron_lm/Megatron-LM/megatron/core/models/common/embeddings/rotary_pos_embedding.py", line 172, in forward
    freqs = self.get_freqs_non_repeated(max_seq_len, offset)
  File "/ephemeral/devcontainer/jasonchiu/cache/bazel/_bazel_jasonchiu/996a28cb1c2af162dca7531bd6a2de53/execroot/_main/bazel-out/k8-opt/bin/exa/trainer/megatron/megatron_trainer_test.runfiles/_main/third_party/megatron_lm/Megatron-LM/megatron/core/models/common/embeddings/rotary_pos_embedding.py", line 137, in get_freqs_non_repeated
    torch.arange(max_seq_len, device=self.inv_freq.device, dtype=self.inv_freq.dtype)

This PR fixes that by returning a float from get_rotary_seq_len instead of a tensor.
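
A minimal sketch of the kind of conversion this change implies. The helper name `to_python_float` and its shape are illustrative assumptions, not the actual diff; the only fact taken from this PR is that get_rotary_seq_len should return a float rather than a tensor.

```python
import torch

def to_python_float(value):
    # Hypothetical helper mirroring the intent of this PR (not its exact
    # patch): unwrap a 0-dim tensor into a plain Python float so callers
    # such as torch.arange(max_seq_len, ...) receive a number, not a tensor.
    if isinstance(value, torch.Tensor):
        return float(value.item())
    return float(value)
```

Converting once at the source keeps every consumer of the sequence length, such as the torch.arange call in get_freqs_non_repeated shown in the trace above, working with an ordinary Python number.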
