Add DeepSeek-V2 support #2763

Open
wants to merge 20 commits into
base: main

Conversation

lzhangzz
Collaborator

@lzhangzz lzhangzz commented Nov 17, 2024

  • Initial support for DeepSeek-V2 & DeepSeek-V2-Lite in turbomind for sm80+
  • No MLA-specific optimizations yet; MLA is currently implemented as MHA
  • Tested with KV16/KV8
  • Greatly improved initialization time for large models

Known issue

  • Loading of remote safetensors models is slow (existing, not introduced by this PR)

@lzhangzz lzhangzz added the WIP label Nov 17, 2024
@silentlustre

!!! ❤️
excited for this!

@lvhan028 lvhan028 added the enhancement New feature or request label Nov 19, 2024
@lzhangzz lzhangzz removed the WIP label Nov 20, 2024
@lvhan028 lvhan028 self-requested a review November 20, 2024 14:53
@zhulinJulia24
Collaborator

@lzhangzz
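image
Some of the accuracy metrics are clearly abnormal. I checked the MMLU inference results: the model does not follow the instructions correctly, so the answers cannot be extracted.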

@zhulinJulia24
Collaborator

@lzhangzz Mixtral-8x7B-Instruct-v0.1 crashes with a core dump!

lmdeploy chat /nvme/qa_test_models/mistralai/Mixtral-8x7B-Instruct-v0.1 --tp 2
chat_template_config:
ChatTemplateConfig(model_name='mistral', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
engine_cfg:
TurbomindEngineConfig(dtype='auto', model_format=None, tp=2, session_len=32768, max_batch_size=1, cache_max_entry_count=0.8, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo                                                                                                                                                    
[WARNING] gemm_config.in is not found; using default GEMM algo
Segmentation fault (core dumped)

@zhulinJulia24
Collaborator

@lzhangzz The evaluation output of Mixtral-8x7B-Instruct-v0.1 does not look correct: it contains many repeated words.

image

The input is:
Follow the given examples and answer the question.\nDetermine which of two sentences is sarcastic.\n\nAccording to Cambridge University Dictionary, sarcasm is \"the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone's feelings or to criticize something in a humorous way.\" Sarcastic sentences often contain satirical or ironic utterances, hyperboles, ambivalent or witty remarks.\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Yes, because having interests and actively researching them is a huge waste\n(B) Yes, because having interests and actively researching them is a huge deal\nA: Let's think step by step.\nIf we look at (A), it says that having interests and actively researching them is a huge waste, implying that it is a useless effort. However, we know that having interests and actively researching them is typically not a waste but rather is beneficial to the individual. The presence of such a juxtaposition in (A) suggests that it contains a taste of irony and sarcasm.\nIf we look at (B), it says that having interests and actively researching them is a huge deal, implying that it is an important and consequential effort. This is arguably a neutral and correct statement.\nAbove the above, the sarcastic option is (A). So the answer is (A).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) No one is going to disagree with you on this. Avoiding ad hominem attacks really help your case\n(B) No one is going to disagree with you on this. Ad hominem attacks really help your case\nA: Let's think step by step.\nIf we look at (A), it says that avoiding ad hominem attacks really help your case, implying that ad hominem attacks are adverse and injurious. 
Because ad hominem attacks are adressed at a person rather than an idea, it is indeed true that avoiding them is often useful and helpful; so, (A) is a neutral (valid and agreeable) statement.\nIf we look at (B), it says that ad hominem attacks really help your case, implying that ad hominem attacks are a positive thing. However, we stated previously that ad hominem attacks are often not useful or constructive. The speaker in this sentence therefore seems to mean the opposite of what they are saying; so, there appears to have a taste of irony and sarcasm in (B).\nAbove the above, the sarcastic option is (B). So the answer is (B).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Consistency in the league's punishments? What do you think this is supposed to be, politics?\n(B) Consistency in the league's punishments? What do you think this is supposed to be, moral?\nA: Let's think step by step.\nIf we look at (A), it likens the consistency in the league's punishments with that in politics. Because politics or political affairs are often not considered to be consistent or dependable, this sentence appears to be satirical.\nIf we look at (B), it likens the consistency in the league's punishments with that in morality. Discussing the consistency of the league's punishments in the context of morality, ethics, or law makes sense and does not appear to make a satirical point about anything.\nAbove the above, the sarcastic option is (A). So the answer is (A).\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Yeah, I've noticed he's been drinking every day recently, what a great transformation\n(B) Yeah, I've noticed he's been exercising every day recently, what a great transformation\nA: Let's think step by step.\n
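
One rough way to put a number on the "many repeated words" symptom is to measure how many word n-grams in an output repeat an earlier n-gram. The sketch below is only an illustration: the helper name `repeated_ngram_fraction` and the choice of n = 3 are assumptions, not part of lmdeploy or of the evaluation harness used here.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Return the fraction of word n-grams that duplicate an earlier n-gram.

    0.0 means no n-gram occurs twice; values close to 1.0 mean the text
    is dominated by repetition. Hypothetical helper for triage only.
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        # Text is shorter than n words, so nothing can repeat.
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence after the first one counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


if __name__ == "__main__":
    looping = "So the answer is (A). " * 20
    normal = "The speaker means the opposite of what is said, so (B) is sarcastic."
    print(repeated_ngram_fraction(looping))
    print(repeated_ngram_fraction(normal))
```

Comparing this score for the same prompts between the PR branch and main could help tell a real regression from ordinary model verbosity.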

@irexyc
Collaborator

irexyc commented Nov 27, 2024

info['moe_norm_topk_prob'] = cfg['norm_topk_prob']
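
For context, here is a minimal sketch of what a flag such as `norm_topk_prob` usually controls in MoE routing. It assumes the common Hugging Face config semantics (renormalize the selected experts' probabilities so they sum to 1) and is not taken from this PR's code. The function name `route_topk` is hypothetical.

```python
import math


def route_topk(logits, top_k, norm_topk_prob):
    """Pick the top_k experts for one token and return (indices, weights).

    Assumed semantics: softmax over all experts, then keep the top_k
    probabilities; if norm_topk_prob is true, rescale them to sum to 1.
    """
    # Numerically stable softmax over all expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the top_k experts, highest probability first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    weights = [probs[i] for i in top]

    if norm_topk_prob:
        # Renormalize so the selected experts' weights sum to 1.
        s = sum(weights)
        weights = [w / s for w in weights]
    return top, weights


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]
    print(route_topk(logits, top_k=2, norm_topk_prob=True))
    print(route_topk(logits, top_k=2, norm_topk_prob=False))
```

Under these assumptions, a model that expects normalized weights but runs without normalization (or the reverse) gets expert outputs at the wrong scale, which is why it matters that the setting is read from the model config.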

Labels
enhancement New feature or request

5 participants