Add DeepSeek V2 config #1293

RissyRan · 2025-02-21T04:01:31Z

Description

Add DeepSeek V2 config for fast development:

V2-16b: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite
V2-236b: https://huggingface.co/deepseek-ai/DeepSeek-V2

Tests

V2-16b:

python3 MaxText/train.py MaxText/configs/base.yml base_output_directory=/tmp/ run_name=deepsee_training per_device_batch_size=4 enable_checkpointing=false model_name=deepseek2-16b ici_fsdp_parallelism=4 steps=5 async_checkpointing=false tokenizer_path=deepseek-ai/DeepSeek-V2-Lite attention=dot_product dtype=bfloat16 weight_dtype=bfloat16 dataset_type=synthetic sparse_matmul=True megablox=True

Small config of V2-236b:

python3 MaxText/train.py MaxText/configs/base.yml base_output_directory=/tmp/ run_name=deepsee_training per_device_batch_size=4 enable_checkpointing=false model_name=deepseek2-236b ici_fsdp_parallelism=4 steps=5 async_checkpointing=false tokenizer_path=deepseek-ai/DeepSeek-V2 attention=dot_product dtype=bfloat16 weight_dtype=bfloat16 dataset_type=synthetic sparse_matmul=True megablox=True
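The two test commands above differ only in model_name and tokenizer_path. As a minimal sketch (the helper name build_train_command and the COMMON_OVERRIDES dict are hypothetical and not part of this PR), the shared key=value overrides could be factored out as below. It assumes the order of the overrides does not matter to train.py.

```python
import shlex

# Overrides shared by both test commands, copied from the PR description.
COMMON_OVERRIDES = {
    "base_output_directory": "/tmp/",
    "run_name": "deepsee_training",
    "per_device_batch_size": "4",
    "enable_checkpointing": "false",
    "ici_fsdp_parallelism": "4",
    "steps": "5",
    "async_checkpointing": "false",
    "attention": "dot_product",
    "dtype": "bfloat16",
    "weight_dtype": "bfloat16",
    "dataset_type": "synthetic",
    "sparse_matmul": "True",
    "megablox": "True",
}


def build_train_command(model_name, tokenizer_path):
    """Hypothetical helper: return the train.py command line as one string."""
    overrides = dict(COMMON_OVERRIDES)
    overrides["model_name"] = model_name
    overrides["tokenizer_path"] = tokenizer_path
    args = ["python3", "MaxText/train.py", "MaxText/configs/base.yml"]
    args += [f"{key}={value}" for key, value in overrides.items()]
    return shlex.join(args)


if __name__ == "__main__":
    print(build_train_command("deepseek2-16b", "deepseek-ai/DeepSeek-V2-Lite"))
    print(build_train_command("deepseek2-236b", "deepseek-ai/DeepSeek-V2"))
```

Running the script prints one command per model, which can be pasted into a shell.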

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

RissyRan · 2025-02-21T04:05:50Z

Hi @gagika, I hit an error running the v2-lite config because q_lora_rank=0. It reports that the embedding dim differs from H (from here): embedding_dim=64, while the inputs' last dim is 0. Full logs. Did I miss anything here?

If this is expected in the current code, we may need a workaround; otherwise, this branch could be useless.

gagika · 2025-02-21T06:57:50Z

Hi Ran, that branch wasn't tested and there is a bug. Could you please change features=(self.num_query_heads, self.head_dim), to features=(self.num_query_heads, self.qk_head_dim),

    if self.q_lora_rank == 0:
      # Standard Q projection (without LoRA).
      self.query_proj = DenseGeneral(
          features=(self.num_query_heads, self.qk_head_dim),
          axis=-1,
          kernel_init=self.kernel_init,
          kernel_axes=("embed", "q_heads", "kv"),
          dtype=self.dtype,
          weight_dtype=self.weight_dtype,
          name="query",
          quant=self.quant,
          matmul_precision=self.config.matmul_precision,
      )

Please add the fix in your PR, or I can send a PR with that one-line fix if you prefer.
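One plausible reading of the reported error is sketched below in plain Python. It assumes that each query head is later split into a non-rotary part of width qk_nope_head_dim and a rotary part of width qk_rope_head_dim, and that qk_head_dim is their sum. The sizes 128 and 128 are illustrative assumptions; only the value 64 appears in this thread. Under those assumptions, projecting to head_dim leaves nothing for the rotary slice, which would match "inputs's last dim is 0".

```python
# Illustrative sizes. Only 64 (the rotary embedding dim) comes from the thread;
# the other values are assumptions made for this sketch.
QK_NOPE_HEAD_DIM = 128  # assumed width of the non-rotary part of each query head
QK_ROPE_HEAD_DIM = 64   # rotary width, matching embedding_dim=64 in the error
HEAD_DIM = 128          # assumed value of head_dim used by the untested branch
QK_HEAD_DIM = QK_NOPE_HEAD_DIM + QK_ROPE_HEAD_DIM  # assumed definition


def split_query(q_head, qk_nope_head_dim):
    """Split one head's query vector into (non-rotary, rotary) slices."""
    return q_head[:qk_nope_head_dim], q_head[qk_nope_head_dim:]


def rope_input_dim(projected_dim):
    """Width of the slice that would reach the rotary embedding."""
    q_head = [0.0] * projected_dim  # stand-in for one projected query head
    _, q_rope = split_query(q_head, QK_NOPE_HEAD_DIM)
    return len(q_rope)


buggy_dim = rope_input_dim(HEAD_DIM)     # projection sized with head_dim
fixed_dim = rope_input_dim(QK_HEAD_DIM)  # projection sized with qk_head_dim

print("rotary input width with head_dim:", buggy_dim)
print("rotary input width with qk_head_dim:", fixed_dim)
```

With the projection sized by qk_head_dim, the rotary slice has the width the rotary embedding expects, which is consistent with the one-line fix suggested above.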

RissyRan · 2025-02-21T17:41:13Z

Cool! Let me add this one-line change here.

RissyRan · 2025-02-21T18:04:02Z

Both tests are working now: V2-Lite (16b) with q_lora_rank=0, and V2-236b with q_lora_rank=1536.

gagika

SGTM, Thanks Ran!

RissyRan requested review from gobbleturk, khatwanimohit, bvandermoon and vipannalla as code owners February 21, 2025 04:01

RissyRan assigned gagika Feb 21, 2025

RissyRan force-pushed the deepseek_v2 branch from c99d1bc to e384138 Compare February 21, 2025 17:56

Add DeepSeek V2 config

a1ac837

RissyRan force-pushed the deepseek_v2 branch from e384138 to a1ac837 Compare February 21, 2025 17:57

RissyRan assigned gobbleturk and khatwanimohit Feb 21, 2025

gagika approved these changes Feb 21, 2025

github-actions bot added the pull ready label Feb 21, 2025

gobbleturk approved these changes Feb 21, 2025

RissyRan unassigned gagika, gobbleturk and khatwanimohit Feb 21, 2025

copybara-service bot merged commit 8632dcb into main Feb 21, 2025
18 checks passed

copybara-service bot deleted the deepseek_v2 branch February 21, 2025 19:39
