# Supported Models and Datasets

## Table of Contents

- [Models](#models)

## Models

The tables below describe the models integrated into swift (a brief usage sketch follows this list):

- Model List: the list of model_type values under which the models are registered in swift.
- Default Lora Target Modules: the default lora_target_modules for the corresponding model.
- Default Template: the default template for the corresponding model.
- Support Flash Attn: whether the model supports flash attention for accelerated inference and fine-tuning.
- Support vLLM: whether the model supports vLLM for accelerated inference and deployment.
- Requires: additional dependencies required by the corresponding model.
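
As a quick orientation, the sketch below shows how a registered model_type from the tables is typically used. It assumes the ms-swift 2.x Python API (`SftArguments`/`sft_main`, `InferArguments`/`infer_main`); the exact argument names may vary between versions, and the dataset name is only illustrative.

```python
# Minimal sketch, assuming the ms-swift 2.x Python API; names may vary by version.
from swift.llm import SftArguments, InferArguments, sft_main, infer_main

# LoRA fine-tuning: model_type comes from the "Model Type" column; the default
# lora_target_modules and template listed in the table are applied automatically.
sft_main(SftArguments(
    model_type='qwen-7b-chat',   # a registered model_type from the table
    sft_type='lora',
    dataset=['alpaca-zh'],       # illustrative dataset name
))

# Inference: infer_backend='vllm' is only valid for models marked as supporting vLLM.
infer_main(InferArguments(
    model_type='qwen-7b-chat',
    infer_backend='vllm',
))
```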

### Large Language Models

Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support vLLM Support LMDeploy Support Megatron Requires Tags HF Model ID
qwen-1_8b qwen/Qwen-1_8B c_attn default-generation - Qwen/Qwen-1_8B
qwen-1_8b-chat qwen/Qwen-1_8B-Chat c_attn qwen - Qwen/Qwen-1_8B-Chat
qwen-1_8b-chat-int4 qwen/Qwen-1_8B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int4
qwen-1_8b-chat-int8 qwen/Qwen-1_8B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-1_8B-Chat-Int8
qwen-7b qwen/Qwen-7B c_attn default-generation - Qwen/Qwen-7B
qwen-7b-chat qwen/Qwen-7B-Chat c_attn qwen - Qwen/Qwen-7B-Chat
qwen-7b-chat-int4 qwen/Qwen-7B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int4
qwen-7b-chat-int8 qwen/Qwen-7B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-7B-Chat-Int8
qwen-14b qwen/Qwen-14B c_attn default-generation - Qwen/Qwen-14B
qwen-14b-chat qwen/Qwen-14B-Chat c_attn qwen - Qwen/Qwen-14B-Chat
qwen-14b-chat-int4 qwen/Qwen-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int4
qwen-14b-chat-int8 qwen/Qwen-14B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-14B-Chat-Int8
qwen-72b qwen/Qwen-72B c_attn default-generation - Qwen/Qwen-72B
qwen-72b-chat qwen/Qwen-72B-Chat c_attn qwen - Qwen/Qwen-72B-Chat
qwen-72b-chat-int4 qwen/Qwen-72B-Chat-Int4 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int4
qwen-72b-chat-int8 qwen/Qwen-72B-Chat-Int8 c_attn qwen auto_gptq>=0.5 - Qwen/Qwen-72B-Chat-Int8
modelscope-agent-7b iic/ModelScope-Agent-7B c_attn modelscope-agent - -
modelscope-agent-14b iic/ModelScope-Agent-14B c_attn modelscope-agent - -
qwen1half-0_5b qwen/Qwen1.5-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-0.5B
qwen1half-1_8b qwen/Qwen1.5-1.8B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-1.8B
qwen1half-4b qwen/Qwen1.5-4B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-4B
qwen1half-7b qwen/Qwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-7B
qwen1half-14b qwen/Qwen1.5-14B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-14B
qwen1half-32b qwen/Qwen1.5-32B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-32B
qwen1half-72b qwen/Qwen1.5-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-72B
qwen1half-110b qwen/Qwen1.5-110B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen1.5-110B
codeqwen1half-7b qwen/CodeQwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/CodeQwen1.5-7B
qwen1half-moe-a2_7b qwen/Qwen1.5-MoE-A2.7B q_proj, k_proj, v_proj default-generation transformers>=4.40 moe Qwen/Qwen1.5-MoE-A2.7B
qwen1half-0_5b-chat qwen/Qwen1.5-0.5B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat
qwen1half-1_8b-chat qwen/Qwen1.5-1.8B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat
qwen1half-4b-chat qwen/Qwen1.5-4B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-4B-Chat
qwen1half-7b-chat qwen/Qwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-7B-Chat
qwen1half-14b-chat qwen/Qwen1.5-14B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-14B-Chat
qwen1half-32b-chat qwen/Qwen1.5-32B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-32B-Chat
qwen1half-72b-chat qwen/Qwen1.5-72B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-72B-Chat
qwen1half-110b-chat qwen/Qwen1.5-110B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen1.5-110B-Chat
qwen1half-moe-a2_7b-chat qwen/Qwen1.5-MoE-A2.7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.40 moe Qwen/Qwen1.5-MoE-A2.7B-Chat
codeqwen1half-7b-chat qwen/CodeQwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/CodeQwen1.5-7B-Chat
qwen1half-0_5b-chat-int4 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
qwen1half-1_8b-chat-int4 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4
qwen1half-4b-chat-int4 qwen/Qwen1.5-4B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int4
qwen1half-7b-chat-int4 qwen/Qwen1.5-7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
qwen1half-14b-chat-int4 qwen/Qwen1.5-14B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int4
qwen1half-32b-chat-int4 qwen/Qwen1.5-32B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
qwen1half-72b-chat-int4 qwen/Qwen1.5-72B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
qwen1half-110b-chat-int4 qwen/Qwen1.5-110B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-110B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-int8 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8
qwen1half-1_8b-chat-int8 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8
qwen1half-4b-chat-int8 qwen/Qwen1.5-4B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-4B-Chat-GPTQ-Int8
qwen1half-7b-chat-int8 qwen/Qwen1.5-7B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-7B-Chat-GPTQ-Int8
qwen1half-14b-chat-int8 qwen/Qwen1.5-14B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-14B-Chat-GPTQ-Int8
qwen1half-72b-chat-int8 qwen/Qwen1.5-72B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen1.5-72B-Chat-GPTQ-Int8
qwen1half-moe-a2_7b-chat-int4 qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.40 moe Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-awq qwen/Qwen1.5-0.5B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-0.5B-Chat-AWQ
qwen1half-1_8b-chat-awq qwen/Qwen1.5-1.8B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-1.8B-Chat-AWQ
qwen1half-4b-chat-awq qwen/Qwen1.5-4B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-4B-Chat-AWQ
qwen1half-7b-chat-awq qwen/Qwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-7B-Chat-AWQ
qwen1half-14b-chat-awq qwen/Qwen1.5-14B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-14B-Chat-AWQ
qwen1half-32b-chat-awq qwen/Qwen1.5-32B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-32B-Chat-AWQ
qwen1half-72b-chat-awq qwen/Qwen1.5-72B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-72B-Chat-AWQ
qwen1half-110b-chat-awq qwen/Qwen1.5-110B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen1.5-110B-Chat-AWQ
codeqwen1half-7b-chat-awq qwen/CodeQwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/CodeQwen1.5-7B-Chat-AWQ
qwen2-0_5b qwen/Qwen2-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-0.5B
qwen2-0_5b-instruct qwen/Qwen2-0.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct
qwen2-0_5b-instruct-int4 qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4
qwen2-0_5b-instruct-int8 qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8
qwen2-0_5b-instruct-awq qwen/Qwen2-0.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-0.5B-Instruct-AWQ
qwen2-1_5b qwen/Qwen2-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-1.5B
qwen2-1_5b-instruct qwen/Qwen2-1.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct
qwen2-1_5b-instruct-int4 qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4
qwen2-1_5b-instruct-int8 qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8
qwen2-1_5b-instruct-awq qwen/Qwen2-1.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-1.5B-Instruct-AWQ
qwen2-7b qwen/Qwen2-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-7B
qwen2-7b-instruct qwen/Qwen2-7B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-7B-Instruct
qwen2-7b-instruct-int4 qwen/Qwen2-7B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-7B-Instruct-GPTQ-Int4
qwen2-7b-instruct-int8 qwen/Qwen2-7B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-7B-Instruct-GPTQ-Int8
qwen2-7b-instruct-awq qwen/Qwen2-7B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-7B-Instruct-AWQ
qwen2-72b qwen/Qwen2-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2-72B
qwen2-72b-instruct qwen/Qwen2-72B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-72B-Instruct
qwen2-72b-instruct-int4 qwen/Qwen2-72B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-72B-Instruct-GPTQ-Int4
qwen2-72b-instruct-int8 qwen/Qwen2-72B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2-72B-Instruct-GPTQ-Int8
qwen2-72b-instruct-awq qwen/Qwen2-72B-Instruct-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq - Qwen/Qwen2-72B-Instruct-AWQ
qwen2-57b-a14b qwen/Qwen2-57B-A14B q_proj, k_proj, v_proj default-generation transformers>=4.40 moe Qwen/Qwen2-57B-A14B
qwen2-57b-a14b-instruct qwen/Qwen2-57B-A14B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.40 moe Qwen/Qwen2-57B-A14B-Instruct
qwen2-57b-a14b-instruct-int4 qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.40 moe Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4
qwen2-math-1_5b qwen/Qwen2-Math-1.5B q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-1.5B
qwen2-math-1_5b-instruct qwen/Qwen2-Math-1.5B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-1.5B-Instruct
qwen2-math-7b qwen/Qwen2-Math-7B q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-7B
qwen2-math-7b-instruct qwen/Qwen2-Math-7B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-7B-Instruct
qwen2-math-72b qwen/Qwen2-Math-72B q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-72B
qwen2-math-72b-instruct qwen/Qwen2-Math-72B-Instruct q_proj, k_proj, v_proj qwen transformers>=4.37 - Qwen/Qwen2-Math-72B-Instruct
qwen2_5-0_5b qwen/Qwen2.5-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-0.5B
qwen2_5-1_5b qwen/Qwen2.5-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-1.5B
qwen2_5-3b qwen/Qwen2.5-3B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-3B
qwen2_5-7b qwen/Qwen2.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-7B
qwen2_5-14b qwen/Qwen2.5-14B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-14B
qwen2_5-32b qwen/Qwen2.5-32B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-32B
qwen2_5-72b qwen/Qwen2.5-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-72B
qwen2_5-0_5b-instruct qwen/Qwen2.5-0.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-0.5B-Instruct
qwen2_5-1_5b-instruct qwen/Qwen2.5-1.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-1.5B-Instruct
qwen2_5-3b-instruct qwen/Qwen2.5-3B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-3B-Instruct
qwen2_5-7b-instruct qwen/Qwen2.5-7B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-7B-Instruct
qwen2_5-14b-instruct qwen/Qwen2.5-14B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-14B-Instruct
qwen2_5-32b-instruct qwen/Qwen2.5-32B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-32B-Instruct
qwen2_5-72b-instruct qwen/Qwen2.5-72B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-72B-Instruct
qwen2_5-0_5b-instruct-gptq-int4 qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4
qwen2_5-1_5b-instruct-gptq-int4 qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4
qwen2_5-3b-instruct-gptq-int4 qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4
qwen2_5-7b-instruct-gptq-int4 qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4
qwen2_5-14b-instruct-gptq-int4 qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4
qwen2_5-32b-instruct-gptq-int4 qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4
qwen2_5-72b-instruct-gptq-int4 qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
qwen2_5-0_5b-instruct-gptq-int8 qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8
qwen2_5-1_5b-instruct-gptq-int8 qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8
qwen2_5-3b-instruct-gptq-int8 qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8
qwen2_5-7b-instruct-gptq-int8 qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8
qwen2_5-14b-instruct-gptq-int8 qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8
qwen2_5-32b-instruct-gptq-int8 qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8
qwen2_5-72b-instruct-gptq-int8 qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj qwen2_5 auto_gptq>=0.5, transformers>=4.37 - Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8
qwen2_5-0_5b-instruct-awq qwen/Qwen2.5-0.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-0.5B-Instruct-AWQ
qwen2_5-1_5b-instruct-awq qwen/Qwen2.5-1.5B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-1.5B-Instruct-AWQ
qwen2_5-3b-instruct-awq qwen/Qwen2.5-3B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-3B-Instruct-AWQ
qwen2_5-7b-instruct-awq qwen/Qwen2.5-7B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-7B-Instruct-AWQ
qwen2_5-14b-instruct-awq qwen/Qwen2.5-14B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-14B-Instruct-AWQ
qwen2_5-32b-instruct-awq qwen/Qwen2.5-32B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-32B-Instruct-AWQ
qwen2_5-72b-instruct-awq qwen/Qwen2.5-72B-Instruct-AWQ q_proj, k_proj, v_proj qwen2_5 transformers>=4.37, autoawq - Qwen/Qwen2.5-72B-Instruct-AWQ
qwen2_5-math-1_5b qwen/Qwen2.5-Math-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Math-1.5B
qwen2_5-math-7b qwen/Qwen2.5-Math-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Math-7B
qwen2_5-math-72b qwen/Qwen2.5-Math-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Math-72B
qwen2_5-math-1_5b-instruct qwen/Qwen2.5-Math-1.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Math-1.5B-Instruct
qwen2_5-math-7b-instruct qwen/Qwen2.5-Math-7B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Math-7B-Instruct
qwen2_5-math-72b-instruct qwen/Qwen2.5-Math-72B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Math-72B-Instruct
qwen2_5-coder-1_5b qwen/Qwen2.5-Coder-1.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Coder-1.5B
qwen2_5-coder-1_5b-instruct qwen/Qwen2.5-Coder-1.5B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Coder-1.5B-Instruct
qwen2_5-coder-7b qwen/Qwen2.5-Coder-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 - Qwen/Qwen2.5-Coder-7B
qwen2_5-coder-7b-instruct qwen/Qwen2.5-Coder-7B-Instruct q_proj, k_proj, v_proj qwen2_5 transformers>=4.37 - Qwen/Qwen2.5-Coder-7B-Instruct
chatglm2-6b ZhipuAI/chatglm2-6b query_key_value chatglm2 transformers<4.42 - THUDM/chatglm2-6b
chatglm2-6b-32k ZhipuAI/chatglm2-6b-32k query_key_value chatglm2 transformers<4.42 - THUDM/chatglm2-6b-32k
chatglm3-6b-base ZhipuAI/chatglm3-6b-base query_key_value chatglm-generation transformers<4.42 - THUDM/chatglm3-6b-base
chatglm3-6b ZhipuAI/chatglm3-6b query_key_value chatglm3 transformers<4.42 - THUDM/chatglm3-6b
chatglm3-6b-32k ZhipuAI/chatglm3-6b-32k query_key_value chatglm3 transformers<4.42 - THUDM/chatglm3-6b-32k
chatglm3-6b-128k ZhipuAI/chatglm3-6b-128k query_key_value chatglm3 transformers<4.42 - THUDM/chatglm3-6b-128k
codegeex2-6b ZhipuAI/codegeex2-6b query_key_value chatglm-generation transformers<4.34 coding THUDM/codegeex2-6b
glm4-9b ZhipuAI/glm-4-9b query_key_value chatglm-generation transformers>=4.42 - THUDM/glm-4-9b
glm4-9b-chat ZhipuAI/glm-4-9b-chat query_key_value chatglm4 transformers>=4.42 - THUDM/glm-4-9b-chat
glm4-9b-chat-1m ZhipuAI/glm-4-9b-chat-1m query_key_value chatglm4 transformers>=4.42 - THUDM/glm-4-9b-chat-1m
codegeex4-9b-chat ZhipuAI/codegeex4-all-9b query_key_value codegeex4 transformers<4.42 coding THUDM/codegeex4-all-9b
llama2-7b modelscope/Llama-2-7b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-7b-hf
llama2-7b-chat modelscope/Llama-2-7b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-7b-chat-hf
llama2-13b modelscope/Llama-2-13b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-13b-hf
llama2-13b-chat modelscope/Llama-2-13b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-13b-chat-hf
llama2-70b modelscope/Llama-2-70b-ms q_proj, k_proj, v_proj default-generation - meta-llama/Llama-2-70b-hf
llama2-70b-chat modelscope/Llama-2-70b-chat-ms q_proj, k_proj, v_proj llama - meta-llama/Llama-2-70b-chat-hf
llama2-7b-aqlm-2bit-1x16 AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation transformers>=4.38, aqlm, torch>=2.2.0 - ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf
llama3-8b LLM-Research/Meta-Llama-3-8B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-8B
llama3-8b-instruct LLM-Research/Meta-Llama-3-8B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-8B-Instruct
llama3-8b-instruct-int4 swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4
llama3-8b-instruct-int8 swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8
llama3-8b-instruct-awq swift/Meta-Llama-3-8B-Instruct-AWQ q_proj, k_proj, v_proj llama3 autoawq - study-hjt/Meta-Llama-3-8B-Instruct-AWQ
llama3-70b LLM-Research/Meta-Llama-3-70B q_proj, k_proj, v_proj default-generation - meta-llama/Meta-Llama-3-70B
llama3-70b-instruct LLM-Research/Meta-Llama-3-70B-Instruct q_proj, k_proj, v_proj llama3 - meta-llama/Meta-Llama-3-70B-Instruct
llama3-70b-instruct-int4 swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4
llama3-70b-instruct-int8 swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8 q_proj, k_proj, v_proj llama3 auto_gptq - study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8
llama3-70b-instruct-awq swift/Meta-Llama-3-70B-Instruct-AWQ q_proj, k_proj, v_proj llama3 autoawq - study-hjt/Meta-Llama-3-70B-Instruct-AWQ
llama3_1-8b LLM-Research/Meta-Llama-3.1-8B q_proj, k_proj, v_proj default-generation transformers>=4.43 - meta-llama/Meta-Llama-3.1-8B
llama3_1-8b-instruct LLM-Research/Meta-Llama-3.1-8B-Instruct q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-8B-Instruct
llama3_1-8b-instruct-awq LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, autoawq - hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
llama3_1-8b-instruct-gptq-int4 LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, auto_gptq - hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
llama3_1-8b-instruct-bnb LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4 q_proj, k_proj, v_proj llama3 transformers>=4.43, bitsandbytes - hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4
llama3_1-70b LLM-Research/Meta-Llama-3.1-70B q_proj, k_proj, v_proj default-generation transformers>=4.43 - meta-llama/Meta-Llama-3.1-70B
llama3_1-70b-instruct LLM-Research/Meta-Llama-3.1-70B-Instruct q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-70B-Instruct
llama3_1-70b-instruct-fp8 LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8 q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-70B-Instruct-FP8
llama3_1-70b-instruct-awq LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, autoawq - hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
llama3_1-70b-instruct-gptq-int4 LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, auto_gptq - hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4
llama3_1-70b-instruct-bnb LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit q_proj, k_proj, v_proj llama3 transformers>=4.43, bitsandbytes - unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit
llama3_1-405b LLM-Research/Meta-Llama-3.1-405B q_proj, k_proj, v_proj default-generation transformers>=4.43 - meta-llama/Meta-Llama-3.1-405B
llama3_1-405b-instruct LLM-Research/Meta-Llama-3.1-405B-Instruct q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-405B-Instruct
llama3_1-405b-instruct-fp8 LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8 q_proj, k_proj, v_proj llama3 transformers>=4.43 - meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
llama3_1-405b-instruct-awq LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, autoawq - hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4
llama3_1-405b-instruct-gptq-int4 LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 q_proj, k_proj, v_proj llama3 transformers>=4.43, auto_gptq - hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4
llama3_1-405b-instruct-bnb LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4 q_proj, k_proj, v_proj llama3 transformers>=4.43, bitsandbytes - hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4
llama-3.1-nemotron-70B-instruct-hf AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF q_proj, k_proj, v_proj llama3 transformers>=4.43 - nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
llama3_2-1b LLM-Research/Llama-3.2-1B q_proj, k_proj, v_proj default-generation transformers>=4.45 - meta-llama/Llama-3.2-1B
llama3_2-1b-instruct LLM-Research/Llama-3.2-1B-Instruct q_proj, k_proj, v_proj llama3_2 transformers>=4.45 - meta-llama/Llama-3.2-1B-Instruct
llama3_2-3b LLM-Research/Llama-3.2-3B q_proj, k_proj, v_proj default-generation transformers>=4.45 - meta-llama/Llama-3.2-3B
llama3_2-3b-instruct LLM-Research/Llama-3.2-3B-Instruct q_proj, k_proj, v_proj llama3_2 transformers>=4.45 - meta-llama/Llama-3.2-3B-Instruct
reflection-llama_3_1-70b LLM-Research/Reflection-Llama-3.1-70B q_proj, k_proj, v_proj reflection transformers>=4.43 - mattshumer/Reflection-Llama-3.1-70B
longwriter-glm4-9b ZhipuAI/LongWriter-glm4-9b query_key_value chatglm4 transformers>=4.42 - THUDM/LongWriter-glm4-9b
longwriter-llama3_1-8b ZhipuAI/LongWriter-llama3.1-8b q_proj, k_proj, v_proj longwriter-llama3 transformers>=4.43 - THUDM/LongWriter-llama3.1-8b
chinese-llama-2-1_3b AI-ModelScope/chinese-llama-2-1.3b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-1.3b
chinese-llama-2-7b AI-ModelScope/chinese-llama-2-7b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b
chinese-llama-2-7b-16k AI-ModelScope/chinese-llama-2-7b-16k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b-16k
chinese-llama-2-7b-64k AI-ModelScope/chinese-llama-2-7b-64k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-7b-64k
chinese-llama-2-13b AI-ModelScope/chinese-llama-2-13b q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-13b
chinese-llama-2-13b-16k AI-ModelScope/chinese-llama-2-13b-16k q_proj, k_proj, v_proj default-generation - hfl/chinese-llama-2-13b-16k
chinese-alpaca-2-1_3b AI-ModelScope/chinese-alpaca-2-1.3b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-1.3b
chinese-alpaca-2-7b AI-ModelScope/chinese-alpaca-2-7b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b
chinese-alpaca-2-7b-16k AI-ModelScope/chinese-alpaca-2-7b-16k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b-16k
chinese-alpaca-2-7b-64k AI-ModelScope/chinese-alpaca-2-7b-64k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-7b-64k
chinese-alpaca-2-13b AI-ModelScope/chinese-alpaca-2-13b q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-13b
chinese-alpaca-2-13b-16k AI-ModelScope/chinese-alpaca-2-13b-16k q_proj, k_proj, v_proj llama - hfl/chinese-alpaca-2-13b-16k
llama-3-chinese-8b ChineseAlpacaGroup/llama-3-chinese-8b q_proj, k_proj, v_proj default-generation - hfl/llama-3-chinese-8b
llama-3-chinese-8b-instruct ChineseAlpacaGroup/llama-3-chinese-8b-instruct q_proj, k_proj, v_proj llama3 - hfl/llama-3-chinese-8b-instruct
atom-7b FlagAlpha/Atom-7B q_proj, k_proj, v_proj default-generation - FlagAlpha/Atom-7B
atom-7b-chat FlagAlpha/Atom-7B-Chat q_proj, k_proj, v_proj atom - FlagAlpha/Atom-7B-Chat
yi-6b 01ai/Yi-6B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B
yi-6b-200k 01ai/Yi-6B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-6B-200K
yi-6b-chat 01ai/Yi-6B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-6B-Chat
yi-6b-chat-awq 01ai/Yi-6B-Chat-4bits q_proj, k_proj, v_proj chatml autoawq - 01-ai/Yi-6B-Chat-4bits
yi-6b-chat-int8 01ai/Yi-6B-Chat-8bits q_proj, k_proj, v_proj chatml auto_gptq - 01-ai/Yi-6B-Chat-8bits
yi-9b 01ai/Yi-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B
yi-9b-200k 01ai/Yi-9B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-9B-200K
yi-34b 01ai/Yi-34B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B
yi-34b-200k 01ai/Yi-34B-200K q_proj, k_proj, v_proj default-generation - 01-ai/Yi-34B-200K
yi-34b-chat 01ai/Yi-34B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-34B-Chat
yi-34b-chat-awq 01ai/Yi-34B-Chat-4bits q_proj, k_proj, v_proj chatml autoawq - 01-ai/Yi-34B-Chat-4bits
yi-34b-chat-int8 01ai/Yi-34B-Chat-8bits q_proj, k_proj, v_proj chatml auto_gptq - 01-ai/Yi-34B-Chat-8bits
yi-1_5-6b 01ai/Yi-1.5-6B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-6B
yi-1_5-6b-chat 01ai/Yi-1.5-6B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-6B-Chat
yi-1_5-9b 01ai/Yi-1.5-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-9B
yi-1_5-9b-chat 01ai/Yi-1.5-9B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-9B-Chat
yi-1_5-9b-chat-16k 01ai/Yi-1.5-9B-Chat-16K q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-9B-Chat-16K
yi-1_5-34b 01ai/Yi-1.5-34B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-1.5-34B
yi-1_5-34b-chat 01ai/Yi-1.5-34B-Chat q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-34B-Chat
yi-1_5-34b-chat-16k 01ai/Yi-1.5-34B-Chat-16K q_proj, k_proj, v_proj chatml - 01-ai/Yi-1.5-34B-Chat-16K
yi-1_5-6b-chat-awq-int4 AI-ModelScope/Yi-1.5-6B-Chat-AWQ q_proj, k_proj, v_proj chatml autoawq - modelscope/Yi-1.5-6B-Chat-AWQ
yi-1_5-6b-chat-gptq-int4 AI-ModelScope/Yi-1.5-6B-Chat-GPTQ q_proj, k_proj, v_proj chatml auto_gptq>=0.5 - modelscope/Yi-1.5-6B-Chat-GPTQ
yi-1_5-9b-chat-awq-int4 AI-ModelScope/Yi-1.5-9B-Chat-AWQ q_proj, k_proj, v_proj chatml autoawq - modelscope/Yi-1.5-9B-Chat-AWQ
yi-1_5-9b-chat-gptq-int4 AI-ModelScope/Yi-1.5-9B-Chat-GPTQ q_proj, k_proj, v_proj chatml auto_gptq>=0.5 - modelscope/Yi-1.5-9B-Chat-GPTQ
yi-1_5-34b-chat-awq-int4 AI-ModelScope/Yi-1.5-34B-Chat-AWQ q_proj, k_proj, v_proj chatml autoawq - modelscope/Yi-1.5-34B-Chat-AWQ
yi-1_5-34b-chat-gptq-int4 AI-ModelScope/Yi-1.5-34B-Chat-GPTQ q_proj, k_proj, v_proj chatml auto_gptq>=0.5 - modelscope/Yi-1.5-34B-Chat-GPTQ
yi-coder-1_5b 01ai/Yi-Coder-1.5B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-Coder-1.5B
yi-coder-1_5b-chat 01ai/Yi-Coder-1.5B-Chat q_proj, k_proj, v_proj yi-coder - 01-ai/Yi-Coder-1.5B-Chat
yi-coder-9b 01ai/Yi-Coder-9B q_proj, k_proj, v_proj default-generation - 01-ai/Yi-Coder-9B
yi-coder-9b-chat 01ai/Yi-Coder-9B-Chat q_proj, k_proj, v_proj yi-coder - 01-ai/Yi-Coder-9B-Chat
internlm-7b Shanghai_AI_Laboratory/internlm-7b q_proj, k_proj, v_proj default-generation - internlm/internlm-7b
internlm-7b-chat Shanghai_AI_Laboratory/internlm-chat-7b q_proj, k_proj, v_proj internlm - internlm/internlm-chat-7b
internlm-7b-chat-8k Shanghai_AI_Laboratory/internlm-chat-7b-8k q_proj, k_proj, v_proj internlm - -
internlm-20b Shanghai_AI_Laboratory/internlm-20b q_proj, k_proj, v_proj default-generation - internlm/internlm-20b
internlm-20b-chat Shanghai_AI_Laboratory/internlm-chat-20b q_proj, k_proj, v_proj internlm - internlm/internlm-chat-20b
internlm2-1_8b Shanghai_AI_Laboratory/internlm2-1_8b wqkv default-generation transformers>=4.38 - internlm/internlm2-1_8b
internlm2-1_8b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-1_8b-sft
internlm2-1_8b-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-1_8b
internlm2-7b-base Shanghai_AI_Laboratory/internlm2-base-7b wqkv default-generation transformers>=4.38 - internlm/internlm2-base-7b
internlm2-7b Shanghai_AI_Laboratory/internlm2-7b wqkv default-generation transformers>=4.38 - internlm/internlm2-7b
internlm2-7b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-7b-sft wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-7b-sft
internlm2-7b-chat Shanghai_AI_Laboratory/internlm2-chat-7b wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-7b
internlm2-20b-base Shanghai_AI_Laboratory/internlm2-base-20b wqkv default-generation transformers>=4.38 - internlm/internlm2-base-20b
internlm2-20b Shanghai_AI_Laboratory/internlm2-20b wqkv default-generation transformers>=4.38 - internlm/internlm2-20b
internlm2-20b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-20b-sft wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-20b-sft
internlm2-20b-chat Shanghai_AI_Laboratory/internlm2-chat-20b wqkv internlm2 transformers>=4.38 - internlm/internlm2-chat-20b
internlm2_5-1_8b Shanghai_AI_Laboratory/internlm2_5-1_8b wqkv default-generation transformers>=4.38 - internlm/internlm2_5-1_8b
internlm2_5-1_8b-chat Shanghai_AI_Laboratory/internlm2_5-1_8b-chat wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-1_8b-chat
internlm2_5-7b Shanghai_AI_Laboratory/internlm2_5-7b wqkv default-generation transformers>=4.38 - internlm/internlm2_5-7b
internlm2_5-7b-chat Shanghai_AI_Laboratory/internlm2_5-7b-chat wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-7b-chat
internlm2_5-7b-chat-1m Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-7b-chat-1m
internlm2_5-20b Shanghai_AI_Laboratory/internlm2_5-20b wqkv default-generation transformers>=4.38 - internlm/internlm2_5-20b
internlm2_5-20b-chat Shanghai_AI_Laboratory/internlm2_5-20b-chat wqkv internlm2 transformers>=4.38 - internlm/internlm2_5-20b-chat
internlm2-math-7b Shanghai_AI_Laboratory/internlm2-math-base-7b wqkv default-generation transformers>=4.38 math internlm/internlm2-math-base-7b
internlm2-math-7b-chat Shanghai_AI_Laboratory/internlm2-math-7b wqkv internlm2 transformers>=4.38 math internlm/internlm2-math-7b
internlm2-math-20b Shanghai_AI_Laboratory/internlm2-math-base-20b wqkv default-generation transformers>=4.38 math internlm/internlm2-math-base-20b
internlm2-math-20b-chat Shanghai_AI_Laboratory/internlm2-math-20b wqkv internlm2 transformers>=4.38 math internlm/internlm2-math-20b
deepseek-7b deepseek-ai/deepseek-llm-7b-base q_proj, k_proj, v_proj default-generation - deepseek-ai/deepseek-llm-7b-base
deepseek-7b-chat deepseek-ai/deepseek-llm-7b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-7b-chat
deepseek-moe-16b deepseek-ai/deepseek-moe-16b-base q_proj, k_proj, v_proj default-generation moe deepseek-ai/deepseek-moe-16b-base
deepseek-moe-16b-chat deepseek-ai/deepseek-moe-16b-chat q_proj, k_proj, v_proj deepseek moe deepseek-ai/deepseek-moe-16b-chat
deepseek-67b deepseek-ai/deepseek-llm-67b-base q_proj, k_proj, v_proj default-generation - deepseek-ai/deepseek-llm-67b-base
deepseek-67b-chat deepseek-ai/deepseek-llm-67b-chat q_proj, k_proj, v_proj deepseek - deepseek-ai/deepseek-llm-67b-chat
deepseek-coder-1_3b deepseek-ai/deepseek-coder-1.3b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-1.3b-base
deepseek-coder-1_3b-instruct deepseek-ai/deepseek-coder-1.3b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-1.3b-instruct
deepseek-coder-6_7b deepseek-ai/deepseek-coder-6.7b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-6.7b-base
deepseek-coder-6_7b-instruct deepseek-ai/deepseek-coder-6.7b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-6.7b-instruct
deepseek-coder-33b deepseek-ai/deepseek-coder-33b-base q_proj, k_proj, v_proj default-generation coding deepseek-ai/deepseek-coder-33b-base
deepseek-coder-33b-instruct deepseek-ai/deepseek-coder-33b-instruct q_proj, k_proj, v_proj deepseek-coder coding deepseek-ai/deepseek-coder-33b-instruct
deepseek-coder-v2-instruct deepseek-ai/DeepSeek-Coder-V2-Instruct q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-coder-v2-lite-instruct deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
deepseek-coder-v2 deepseek-ai/DeepSeek-Coder-V2-Base q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Base
deepseek-coder-v2-lite deepseek-ai/DeepSeek-Coder-V2-Lite-Base q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 coding, moe deepseek-ai/DeepSeek-Coder-V2-Lite-Base
deepseek-math-7b deepseek-ai/deepseek-math-7b-base q_proj, k_proj, v_proj default-generation math deepseek-ai/deepseek-math-7b-base
deepseek-math-7b-instruct deepseek-ai/deepseek-math-7b-instruct q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-instruct
deepseek-math-7b-chat deepseek-ai/deepseek-math-7b-rl q_proj, k_proj, v_proj deepseek math deepseek-ai/deepseek-math-7b-rl
numina-math-7b AI-ModelScope/NuminaMath-7B-TIR q_proj, k_proj, v_proj numina-math math AI-MO/NuminaMath-7B-TIR
deepseek-v2 deepseek-ai/DeepSeek-V2 q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2
deepseek-v2-chat deepseek-ai/DeepSeek-V2-Chat q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2-Chat
deepseek-v2-lite deepseek-ai/DeepSeek-V2-Lite q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj default-generation transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2-Lite
deepseek-v2-lite-chat deepseek-ai/DeepSeek-V2-Lite-Chat q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2 transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2-Lite-Chat
deepseek-v2_5 deepseek-ai/DeepSeek-V2.5 q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj deepseek2_5 transformers>=4.39.3 moe deepseek-ai/DeepSeek-V2.5
gemma-2b AI-ModelScope/gemma-2b q_proj, k_proj, v_proj default-generation transformers>=4.38 - google/gemma-2b
gemma-7b AI-ModelScope/gemma-7b q_proj, k_proj, v_proj default-generation transformers>=4.38 - google/gemma-7b
gemma-2b-instruct AI-ModelScope/gemma-2b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-2b-it
gemma-7b-instruct AI-ModelScope/gemma-7b-it q_proj, k_proj, v_proj gemma transformers>=4.38 - google/gemma-7b-it
gemma2-2b LLM-Research/gemma-2-2b q_proj, k_proj, v_proj default-generation transformers>=4.42 - google/gemma-2-2b
gemma2-9b LLM-Research/gemma-2-9b q_proj, k_proj, v_proj default-generation transformers>=4.42 - google/gemma-2-9b
gemma2-27b LLM-Research/gemma-2-27b q_proj, k_proj, v_proj default-generation transformers>=4.42 - google/gemma-2-27b
gemma2-2b-instruct LLM-Research/gemma-2-2b-it q_proj, k_proj, v_proj gemma transformers>=4.42 - google/gemma-2-2b-it
gemma2-9b-instruct LLM-Research/gemma-2-9b-it q_proj, k_proj, v_proj gemma transformers>=4.42 - google/gemma-2-9b-it
gemma2-27b-instruct LLM-Research/gemma-2-27b-it q_proj, k_proj, v_proj gemma transformers>=4.42 - google/gemma-2-27b-it
minicpm-1b-sft-chat OpenBMB/MiniCPM-1B-sft-bf16 q_proj, k_proj, v_proj minicpm transformers>=4.36.0 - openbmb/MiniCPM-1B-sft-bf16
minicpm-2b-sft-chat OpenBMB/MiniCPM-2B-sft-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-sft-fp32
minicpm-2b-chat OpenBMB/MiniCPM-2B-dpo-fp32 q_proj, k_proj, v_proj minicpm - openbmb/MiniCPM-2B-dpo-fp32
minicpm-2b-128k OpenBMB/MiniCPM-2B-128k q_proj, k_proj, v_proj chatml transformers>=4.36.0 - openbmb/MiniCPM-2B-128k
minicpm-moe-8x2b OpenBMB/MiniCPM-MoE-8x2B q_proj, k_proj, v_proj minicpm transformers>=4.36.0 moe openbmb/MiniCPM-MoE-8x2B
minicpm3-4b OpenBMB/MiniCPM3-4B q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj chatml transformers>=4.36 - openbmb/MiniCPM3-4B
openbuddy-llama-65b-chat OpenBuddy/openbuddy-llama-65b-v8-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama-65b-v8-bf16
openbuddy-llama2-13b-chat OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
openbuddy-llama2-70b-chat OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
openbuddy-llama3-8b-chat OpenBuddy/openbuddy-llama3-8b-v21.1-8k q_proj, k_proj, v_proj openbuddy2 - OpenBuddy/openbuddy-llama3-8b-v21.1-8k
openbuddy-llama3-70b-chat OpenBuddy/openbuddy-llama3-70b-v21.1-8k q_proj, k_proj, v_proj openbuddy2 - OpenBuddy/openbuddy-llama3-70b-v21.1-8k
openbuddy-mistral-7b-chat OpenBuddy/openbuddy-mistral-7b-v17.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-mistral-7b-v17.1-32k
openbuddy-zephyr-7b-chat OpenBuddy/openbuddy-zephyr-7b-v14.1 q_proj, k_proj, v_proj openbuddy transformers>=4.34 - OpenBuddy/openbuddy-zephyr-7b-v14.1
openbuddy-deepseek-67b-chat OpenBuddy/openbuddy-deepseek-67b-v15.2 q_proj, k_proj, v_proj openbuddy - OpenBuddy/openbuddy-deepseek-67b-v15.2
openbuddy-mixtral-moe-7b-chat OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.36 moe OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k
openbuddy-llama3_1-8b-chat OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k q_proj, k_proj, v_proj openbuddy2 transformers>=4.43 - OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
mistral-7b AI-ModelScope/Mistral-7B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.34 - mistralai/Mistral-7B-v0.1
mistral-7b-v2 AI-ModelScope/Mistral-7B-v0.2-hf q_proj, k_proj, v_proj default-generation transformers>=4.34 - alpindale/Mistral-7B-v0.2-hf
mistral-7b-instruct AI-ModelScope/Mistral-7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.1
mistral-7b-instruct-v2 AI-ModelScope/Mistral-7B-Instruct-v0.2 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.2
mistral-7b-instruct-v3 LLM-Research/Mistral-7B-Instruct-v0.3 q_proj, k_proj, v_proj llama transformers>=4.34 - mistralai/Mistral-7B-Instruct-v0.3
mistral-nemo-base-2407 AI-ModelScope/Mistral-Nemo-Base-2407 q_proj, k_proj, v_proj default-generation transformers>=4.43 - mistralai/Mistral-Nemo-Base-2407
mistral-nemo-instruct-2407 AI-ModelScope/Mistral-Nemo-Instruct-2407 q_proj, k_proj, v_proj mistral-nemo transformers>=4.43 - mistralai/Mistral-Nemo-Instruct-2407
mistral-large-instruct-2407 LLM-Research/Mistral-Large-Instruct-2407 q_proj, k_proj, v_proj mistral-nemo transformers>=4.43 - mistralai/Mistral-Large-Instruct-2407
mistral-small-instruct-2409 AI-ModelScope/Mistral-Small-Instruct-2409 q_proj, k_proj, v_proj mistral-nemo transformers>=4.43 - mistralai/Mistral-Small-Instruct-2409
mixtral-moe-7b AI-ModelScope/Mixtral-8x7B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.36 moe mistralai/Mixtral-8x7B-v0.1
mixtral-moe-7b-instruct AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.36 moe mistralai/Mixtral-8x7B-Instruct-v0.1
mixtral-moe-7b-aqlm-2bit-1x16 AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation transformers>=4.38, aqlm, torch>=2.2.0 moe ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf
mixtral-moe-8x22b-v1 AI-ModelScope/Mixtral-8x22B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.36 moe mistral-community/Mixtral-8x22B-v0.1
ministral-8b-instruct-2410 AI-ModelScope/Ministral-8B-Instruct-2410 q_proj, k_proj, v_proj mistral-nemo transformers>=4.46 - mistralai/Ministral-8B-Instruct-2410
wizardlm2-7b-awq AI-ModelScope/WizardLM-2-7B-AWQ q_proj, k_proj, v_proj wizardlm2-awq transformers>=4.34 - MaziyarPanahi/WizardLM-2-7B-AWQ
wizardlm2-8x22b AI-ModelScope/WizardLM-2-8x22B q_proj, k_proj, v_proj wizardlm2 transformers>=4.36 - alpindale/WizardLM-2-8x22B
baichuan-7b baichuan-inc/baichuan-7B W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-7B
baichuan-13b baichuan-inc/Baichuan-13B-Base W_pack default-generation transformers<4.34 - baichuan-inc/Baichuan-13B-Base
baichuan-13b-chat baichuan-inc/Baichuan-13B-Chat W_pack baichuan transformers<4.34 - baichuan-inc/Baichuan-13B-Chat
baichuan2-7b baichuan-inc/Baichuan2-7B-Base W_pack default-generation - baichuan-inc/Baichuan2-7B-Base
baichuan2-7b-chat baichuan-inc/Baichuan2-7B-Chat W_pack baichuan - baichuan-inc/Baichuan2-7B-Chat
baichuan2-7b-chat-int4 baichuan-inc/Baichuan2-7B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-7B-Chat-4bits
baichuan2-13b baichuan-inc/Baichuan2-13B-Base W_pack default-generation - baichuan-inc/Baichuan2-13B-Base
baichuan2-13b-chat baichuan-inc/Baichuan2-13B-Chat W_pack baichuan - baichuan-inc/Baichuan2-13B-Chat
baichuan2-13b-chat-int4 baichuan-inc/Baichuan2-13B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 - baichuan-inc/Baichuan2-13B-Chat-4bits
yuan2-2b-instruct YuanLLM/Yuan2.0-2B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-hf
yuan2-2b-janus-instruct YuanLLM/Yuan2-2B-Janus-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-2B-Janus-hf
yuan2-51b-instruct YuanLLM/Yuan2.0-51B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-51B-hf
yuan2-102b-instruct YuanLLM/Yuan2.0-102B-hf q_proj, k_proj, v_proj yuan - IEITYuan/Yuan2-102B-hf
yuan2-m32 YuanLLM/Yuan2-M32-hf q_proj, k_proj, v_proj yuan moe IEITYuan/Yuan2-M32-hf
xverse-7b xverse/XVERSE-7B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-7B
xverse-7b-chat xverse/XVERSE-7B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-7B-Chat
xverse-13b xverse/XVERSE-13B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B
xverse-13b-chat xverse/XVERSE-13B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-13B-Chat
xverse-65b xverse/XVERSE-65B q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B
xverse-65b-v2 xverse/XVERSE-65B-2 q_proj, k_proj, v_proj default-generation - xverse/XVERSE-65B-2
xverse-65b-chat xverse/XVERSE-65B-Chat q_proj, k_proj, v_proj xverse - xverse/XVERSE-65B-Chat
xverse-13b-256k xverse/XVERSE-13B-256K q_proj, k_proj, v_proj default-generation - xverse/XVERSE-13B-256K
xverse-moe-a4_2b xverse/XVERSE-MoE-A4.2B q_proj, k_proj, v_proj default-generation moe xverse/XVERSE-MoE-A4.2B
orion-14b OrionStarAI/Orion-14B-Base q_proj, k_proj, v_proj default-generation - OrionStarAI/Orion-14B-Base
orion-14b-chat OrionStarAI/Orion-14B-Chat q_proj, k_proj, v_proj orion - OrionStarAI/Orion-14B-Chat
bluelm-7b vivo-ai/BlueLM-7B-Base q_proj, k_proj, v_proj default-generation - vivo-ai/BlueLM-7B-Base
bluelm-7b-32k vivo-ai/BlueLM-7B-Base-32K q_proj, k_proj, v_proj default-generation - vivo-ai/BlueLM-7B-Base-32K
bluelm-7b-chat vivo-ai/BlueLM-7B-Chat q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat
bluelm-7b-chat-32k vivo-ai/BlueLM-7B-Chat-32K q_proj, k_proj, v_proj bluelm - vivo-ai/BlueLM-7B-Chat-32K
ziya2-13b Fengshenbang/Ziya2-13B-Base q_proj, k_proj, v_proj default-generation - IDEA-CCNL/Ziya2-13B-Base
ziya2-13b-chat Fengshenbang/Ziya2-13B-Chat q_proj, k_proj, v_proj ziya - IDEA-CCNL/Ziya2-13B-Chat
skywork-13b skywork/Skywork-13B-base q_proj, k_proj, v_proj default-generation - Skywork/Skywork-13B-base
skywork-13b-chat skywork/Skywork-13B-chat q_proj, k_proj, v_proj skywork - -
zephyr-7b-beta-chat modelscope/zephyr-7b-beta q_proj, k_proj, v_proj zephyr transformers>=4.34 - HuggingFaceH4/zephyr-7b-beta
polylm-13b damo/nlp_polylm_13b_text_generation c_attn default-generation - DAMO-NLP-MT/polylm-13b
seqgpt-560m damo/nlp_seqgpt-560m query_key_value default-generation - DAMO-NLP/SeqGPT-560M
sus-34b-chat SUSTC/SUS-Chat-34B q_proj, k_proj, v_proj sus - SUSTech/SUS-Chat-34B
tongyi-finance-14b TongyiFinance/Tongyi-Finance-14B c_attn default-generation financial -
tongyi-finance-14b-chat TongyiFinance/Tongyi-Finance-14B-Chat c_attn qwen financial jxy/Tongyi-Finance-14B-Chat
tongyi-finance-14b-chat-int4 TongyiFinance/Tongyi-Finance-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 financial jxy/Tongyi-Finance-14B-Chat-Int4
codefuse-codellama-34b-chat codefuse-ai/CodeFuse-CodeLlama-34B q_proj, k_proj, v_proj codefuse-codellama coding codefuse-ai/CodeFuse-CodeLlama-34B
codefuse-codegeex2-6b-chat codefuse-ai/CodeFuse-CodeGeeX2-6B query_key_value codefuse transformers<4.34 coding codefuse-ai/CodeFuse-CodeGeeX2-6B
codefuse-qwen-14b-chat codefuse-ai/CodeFuse-QWen-14B c_attn codefuse coding codefuse-ai/CodeFuse-QWen-14B
phi2-3b AI-ModelScope/phi-2 Wqkv default-generation coding microsoft/phi-2
phi3-4b-4k-instruct LLM-Research/Phi-3-mini-4k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-mini-4k-instruct
phi3-4b-128k-instruct LLM-Research/Phi-3-mini-128k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-mini-128k-instruct
phi3-small-8k-instruct LLM-Research/Phi-3-small-8k-instruct query_key_value phi3 transformers>=4.36 - microsoft/Phi-3-small-8k-instruct
phi3-medium-4k-instruct LLM-Research/Phi-3-medium-4k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-medium-4k-instruct
phi3-small-128k-instruct LLM-Research/Phi-3-small-128k-instruct query_key_value phi3 transformers>=4.36 - microsoft/Phi-3-small-128k-instruct
phi3-medium-128k-instruct LLM-Research/Phi-3-medium-128k-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3-medium-128k-instruct
phi3_5-mini-instruct LLM-Research/Phi-3.5-mini-instruct qkv_proj phi3 transformers>=4.36 - microsoft/Phi-3.5-mini-instruct
phi3_5-moe-instruct LLM-Research/Phi-3.5-MoE-instruct q_proj, k_proj, v_proj phi3 transformers>=4.36 moe microsoft/Phi-3.5-MoE-instruct
mamba-130m AI-ModelScope/mamba-130m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-130m-hf
mamba-370m AI-ModelScope/mamba-370m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-370m-hf
mamba-390m AI-ModelScope/mamba-390m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-390m-hf
mamba-790m AI-ModelScope/mamba-790m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-790m-hf
mamba-1.4b AI-ModelScope/mamba-1.4b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-1.4b-hf
mamba-2.8b AI-ModelScope/mamba-2.8b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 - state-spaces/mamba-2.8b-hf
telechat-7b TeleAI/TeleChat-7B key_value, query telechat - Tele-AI/telechat-7B
telechat-12b TeleAI/TeleChat-12B key_value, query telechat - Tele-AI/TeleChat-12B
telechat-12b-v2 TeleAI/TeleChat-12B-v2 key_value, query telechat - Tele-AI/TeleChat-12B-v2
telechat-12b-v2-gptq-int4 swift/TeleChat-12B-V2-GPTQ-Int4 key_value, query telechat auto_gptq>=0.5 - -
telechat2-115b TeleAI/TeleChat2-115B key_value, query telechat2 - Tele-AI/TeleChat2-115B
grok-1 colossalai/grok-1-pytorch q_proj, k_proj, v_proj default-generation - hpcai-tech/grok-1
dbrx-instruct AI-ModelScope/dbrx-instruct attn.Wqkv dbrx transformers>=4.36 moe databricks/dbrx-instruct
dbrx-base AI-ModelScope/dbrx-base attn.Wqkv dbrx transformers>=4.36 moe databricks/dbrx-base
mengzi3-13b-base langboat/Mengzi3-13B-Base q_proj, k_proj, v_proj mengzi - Langboat/Mengzi3-13B-Base
c4ai-command-r-v01 AI-ModelScope/c4ai-command-r-v01 q_proj, k_proj, v_proj c4ai transformers>=4.39.1 - CohereForAI/c4ai-command-r-v01
c4ai-command-r-plus AI-ModelScope/c4ai-command-r-plus q_proj, k_proj, v_proj c4ai transformers>4.39 - CohereForAI/c4ai-command-r-plus
aya-expanse-8b AI-ModelScope/aya-expanse-8b q_proj, k_proj, v_proj aya transformers>=4.44.0 - CohereForAI/aya-expanse-8b
aya-expanse-32b AI-ModelScope/aya-expanse-32b q_proj, k_proj, v_proj aya transformers>=4.44.0 - CohereForAI/aya-expanse-32b
codestral-22b swift/Codestral-22B-v0.1 q_proj, k_proj, v_proj default-generation transformers>=4.34 - mistralai/Codestral-22B-v0.1

### Multimodal Large Models

Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support vLLM Support LMDeploy Support Megatron Requires Tags HF Model ID
qwen-vl qwen/Qwen-VL ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-vl-generation vision Qwen/Qwen-VL
qwen-vl-chat qwen/Qwen-VL-Chat ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-vl vision Qwen/Qwen-VL-Chat
qwen-vl-chat-int4 qwen/Qwen-VL-Chat-Int4 ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-vl auto_gptq>=0.5 vision Qwen/Qwen-VL-Chat-Int4
qwen-audio qwen/Qwen-Audio ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-audio-generation audio Qwen/Qwen-Audio
qwen-audio-chat qwen/Qwen-Audio-Chat ^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* qwen-audio audio Qwen/Qwen-Audio-Chat
qwen2-audio-7b qwen/Qwen2-Audio-7B ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-audio-generation librosa, transformers>=4.45 audio Qwen/Qwen2-Audio-7B
qwen2-audio-7b-instruct qwen/Qwen2-Audio-7B-Instruct ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-audio librosa, transformers>=4.45 audio Qwen/Qwen2-Audio-7B-Instruct
qwen2-vl-2b qwen/Qwen2-VL-2B ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl-generation transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-2B
qwen2-vl-2b-instruct qwen/Qwen2-VL-2B-Instruct ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-2B-Instruct
qwen2-vl-2b-instruct-gptq-int4 qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4
qwen2-vl-2b-instruct-gptq-int8 qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
qwen2-vl-2b-instruct-awq qwen/Qwen2-VL-2B-Instruct-AWQ ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, autoawq vision, video Qwen/Qwen2-VL-2B-Instruct-AWQ
qwen2-vl-7b qwen/Qwen2-VL-7B ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl-generation transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-7B
qwen2-vl-7b-instruct qwen/Qwen2-VL-7B-Instruct ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-7B-Instruct
qwen2-vl-7b-instruct-gptq-int4 qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4
qwen2-vl-7b-instruct-gptq-int8 qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8
qwen2-vl-7b-instruct-awq qwen/Qwen2-VL-7B-Instruct-AWQ ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, autoawq vision, video Qwen/Qwen2-VL-7B-Instruct-AWQ
qwen2-vl-72b qwen/Qwen2-VL-72B ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl-generation transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-72B
qwen2-vl-72b-instruct qwen/Qwen2-VL-72B-Instruct ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils vision, video Qwen/Qwen2-VL-72B-Instruct
qwen2-vl-72b-instruct-gptq-int4 qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4
qwen2-vl-72b-instruct-gptq-int8 qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8 ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 vision, video Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8
qwen2-vl-72b-instruct-awq qwen/Qwen2-VL-72B-Instruct-AWQ ^(model)(?!.*(lm_head|output|emb|wte|shared)).* qwen2-vl transformers>=4.45.dev.0, qwen_vl_utils, autoawq vision, video Qwen/Qwen2-VL-72B-Instruct-AWQ
glm4v-9b-chat ZhipuAI/glm-4v-9b ^(transformer.encoder)(?!.*(lm_head|output|emb|wte|shared)).* glm4v transformers>=4.42 vision THUDM/glm-4v-9b
llama3_2-11b-vision LLM-Research/Llama-3.2-11B-Vision ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision-generation transformers>=4.45 vision meta-llama/Llama-3.2-11B-Vision
llama3_2-11b-vision-instruct LLM-Research/Llama-3.2-11B-Vision-Instruct ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision transformers>=4.45 vision meta-llama/Llama-3.2-11B-Vision-Instruct
llama3_2-90b-vision LLM-Research/Llama-3.2-90B-Vision ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision-generation transformers>=4.45 vision meta-llama/Llama-3.2-90B-Vision
llama3_2-90b-vision-instruct LLM-Research/Llama-3.2-90B-Vision-Instruct ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_2-vision transformers>=4.45 vision meta-llama/Llama-3.2-90B-Vision-Instruct
llama3_1-8b-omni ICTNLP/Llama-3.1-8B-Omni ^(model.layers|model.speech_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3_1-omni whisper, openai-whisper audio ICTNLP/Llama-3.1-8B-Omni
idefics3-8b-llama3 AI-ModelScope/Idefics3-8B-Llama3 ^(model.text_model|model.connector)(?!.*(lm_head|output|emb|wte|shared)).* idefics3 transformers>=4.45 vision HuggingFaceM4/Idefics3-8B-Llama3
llava1_5-7b-instruct swift/llava-1.5-7b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava1_5 transformers>=4.36 vision llava-hf/llava-1.5-7b-hf
llava1_5-13b-instruct swift/llava-1.5-13b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava1_5 transformers>=4.36 vision llava-hf/llava-1.5-13b-hf
llava1_6-mistral-7b-instruct swift/llava-v1.6-mistral-7b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-mistral transformers>=4.39 vision llava-hf/llava-v1.6-mistral-7b-hf
llava1_6-vicuna-7b-instruct swift/llava-v1.6-vicuna-7b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-vicuna transformers>=4.39 vision llava-hf/llava-v1.6-vicuna-7b-hf
llava1_6-vicuna-13b-instruct swift/llava-v1.6-vicuna-13b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-vicuna transformers>=4.39 vision llava-hf/llava-v1.6-vicuna-13b-hf
llava1_6-llama3_1-8b-instruct DaozeZhang/llava-llama3.1-8b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-llama3 transformers>=4.41 vision -
llava1_6-yi-34b-instruct swift/llava-v1.6-34b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-yi transformers>=4.39 vision llava-hf/llava-v1.6-34b-hf
llama3-llava-next-8b-hf swift/llama3-llava-next-8b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama-llava-next-hf transformers>=4.39 vision llava-hf/llama3-llava-next-8b-hf
llava-next-72b-hf AI-ModelScope/llava-next-72b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama-qwen-hf transformers>=4.39 vision llava-hf/llava-next-72b-hf
llava-next-110b-hf AI-ModelScope/llava-next-110b-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama-qwen-hf transformers>=4.39 vision llava-hf/llava-next-110b-hf
llava-onevision-qwen2-0_5b-ov AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-onevision-qwen transformers>=4.45 vision, video llava-hf/llava-onevision-qwen2-0.5b-ov-hf
llava-onevision-qwen2-7b-ov AI-ModelScope/llava-onevision-qwen2-7b-ov-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-onevision-qwen transformers>=4.45 vision, video llava-hf/llava-onevision-qwen2-7b-ov-hf
llava-onevision-qwen2-72b-ov AI-ModelScope/llava-onevision-qwen2-72b-ov-hf ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-onevision-qwen transformers>=4.45 vision, video llava-hf/llava-onevision-qwen2-72b-ov-hf
llama3-llava-next-8b AI-Modelscope/llama3-llava-next-8b ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* llama3-llava-next vision lmms-lab/llama3-llava-next-8b
llava-next-72b AI-Modelscope/llava-next-72b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-qwen vision lmms-lab/llava-next-72b
llava-next-110b AI-Modelscope/llava-next-110b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-qwen vision lmms-lab/llava-next-110b
llava-next-video-7b-instruct swift/LLaVA-NeXT-Video-7B-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-7B-hf
llava-next-video-7b-32k-instruct swift/LLaVA-NeXT-Video-7B-32K-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-7B-32K-hf
llava-next-video-7b-dpo-instruct swift/LLaVA-NeXT-Video-7B-DPO-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-7B-DPO-hf
llava-next-video-34b-instruct swift/LLaVA-NeXT-Video-34B-hf ^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* llava-next-video-yi transformers>=4.42, av video llava-hf/LLaVA-NeXT-Video-34B-hf
yi-vl-6b-chat 01ai/Yi-VL-6B ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* yi-vl transformers>=4.34 vision 01-ai/Yi-VL-6B
yi-vl-34b-chat 01ai/Yi-VL-34B ^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* yi-vl transformers>=4.34 vision 01-ai/Yi-VL-34B
llava-llama3-8b-v1_1 AI-ModelScope/llava-llama-3-8b-v1_1-transformers ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* llava-llama-instruct transformers>=4.36 vision xtuner/llava-llama-3-8b-v1_1-transformers
internlm-xcomposer2-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2-7b attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 internlm-xcomposer2 vision internlm/internlm-xcomposer2-7b
internlm-xcomposer2-4khd-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 internlm-xcomposer2-4khd vision internlm/internlm-xcomposer2-4khd-7b
internlm-xcomposer2_5-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 internlm-xcomposer2_5 vision internlm/internlm-xcomposer2d5-7b
internvl-chat-v1_5 AI-ModelScope/InternVL-Chat-V1-5 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl transformers>=4.35, timm vision OpenGVLab/InternVL-Chat-V1-5
internvl-chat-v1_5-int8 AI-ModelScope/InternVL-Chat-V1-5-int8 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl transformers>=4.35, timm vision OpenGVLab/InternVL-Chat-V1-5-int8
mini-internvl-chat-2b-v1_5 OpenGVLab/Mini-InternVL-Chat-2B-V1-5 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl transformers>=4.35, timm vision OpenGVLab/Mini-InternVL-Chat-2B-V1-5
mini-internvl-chat-4b-v1_5 OpenGVLab/Mini-InternVL-Chat-4B-V1-5 ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl-phi3 transformers>=4.35,<4.42, timm vision OpenGVLab/Mini-InternVL-Chat-4B-V1-5
internvl2-1b OpenGVLab/InternVL2-1B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-1B
internvl2-2b OpenGVLab/InternVL2-2B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-2B
internvl2-4b OpenGVLab/InternVL2-4B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2-phi3 transformers>=4.36,<4.42, timm vision, video OpenGVLab/InternVL2-4B
internvl2-8b OpenGVLab/InternVL2-8B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-8B
internvl2-26b OpenGVLab/InternVL2-26B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-26B
internvl2-40b OpenGVLab/InternVL2-40B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-40B
internvl2-llama3-76b OpenGVLab/InternVL2-Llama3-76B ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-Llama3-76B
internvl2-2b-awq OpenGVLab/InternVL2-2B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-2B-AWQ
internvl2-8b-awq OpenGVLab/InternVL2-8B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-8B-AWQ
internvl2-26b-awq OpenGVLab/InternVL2-26B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-26B-AWQ
internvl2-40b-awq OpenGVLab/InternVL2-40B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-40B-AWQ
internvl2-llama3-76b-awq OpenGVLab/InternVL2-Llama3-76B-AWQ ^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* internvl2 transformers>=4.36, timm vision, video OpenGVLab/InternVL2-Llama3-76B-AWQ
deepseek-janus-1_3b deepseek-ai/Janus-1.3B ^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* deepseek-janus vision deepseek-ai/Janus-1.3B
deepseek-vl-1_3b-chat deepseek-ai/deepseek-vl-1.3b-chat ^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* deepseek-vl vision deepseek-ai/deepseek-vl-1.3b-chat
deepseek-vl-7b-chat deepseek-ai/deepseek-vl-7b-chat ^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* deepseek-vl vision deepseek-ai/deepseek-vl-7b-chat
ovis1_6-gemma2-9b AIDC-AI/Ovis1.6-Gemma2-9B ^(llm)(?!.*(lm_head|output|emb|wte|shared)).* ovis1_6 transformers>=4.42 vision AIDC-AI/Ovis1.6-Gemma2-9B
paligemma-3b-pt-224 AI-ModelScope/paligemma-3b-pt-224 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-pt-224
paligemma-3b-pt-448 AI-ModelScope/paligemma-3b-pt-448 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-pt-448
paligemma-3b-pt-896 AI-ModelScope/paligemma-3b-pt-896 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-pt-896
paligemma-3b-mix-224 AI-ModelScope/paligemma-3b-mix-224 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-mix-224
paligemma-3b-mix-448 AI-ModelScope/paligemma-3b-mix-448 ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* paligemma transformers>=4.41 vision google/paligemma-3b-mix-448
minicpm-v-3b-chat OpenBMB/MiniCPM-V ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v timm, transformers<4.42 vision openbmb/MiniCPM-V
minicpm-v-v2-chat OpenBMB/MiniCPM-V-2 ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v timm, transformers<4.42 vision openbmb/MiniCPM-V-2
minicpm-v-v2_5-chat OpenBMB/MiniCPM-Llama3-V-2_5 ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v-v2_5 timm, transformers>=4.36 vision openbmb/MiniCPM-Llama3-V-2_5
minicpm-v-v2_6-chat OpenBMB/MiniCPM-V-2_6 ^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* minicpm-v-v2_6 timm, transformers>=4.36 vision, video openbmb/MiniCPM-V-2_6
pixtral-12b AI-ModelScope/pixtral-12b ^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* pixtral transformers>=4.45 vision mistral-community/pixtral-12b
mplug-owl2-chat iic/mPLUG-Owl2 q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 mplug-owl2 transformers<4.35, icecream vision MAGAer13/mplug-owl2-llama2-7b
mplug-owl2_1-chat iic/mPLUG-Owl2.1 c_attn.multiway.0, c_attn.multiway.1 mplug-owl2 transformers<4.35, icecream vision Mizukiluke/mplug_owl_2_1
mplug-owl3-1b-chat iic/mPLUG-Owl3-1B-241014 ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* mplug_owl3 transformers>=4.36, icecream vision, video mPLUG/mPLUG-Owl3-1B-241014
mplug-owl3-2b-chat iic/mPLUG-Owl3-2B-241014 ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* mplug_owl3 transformers>=4.36, icecream vision, video mPLUG/mPLUG-Owl3-2B-241014
mplug-owl3-7b-chat iic/mPLUG-Owl3-7B-240728 ^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* mplug_owl3 transformers>=4.36, icecream vision, video mPLUG/mPLUG-Owl3-7B-240728
phi3-vision-128k-instruct LLM-Research/Phi-3-vision-128k-instruct ^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* phi3-vl transformers>=4.36 vision microsoft/Phi-3-vision-128k-instruct
phi3_5-vision-instruct LLM-Research/Phi-3.5-vision-instruct ^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* phi3-vl transformers>=4.36 vision microsoft/Phi-3.5-vision-instruct
cogvlm-17b-chat ZhipuAI/cogvlm-chat ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm transformers<4.42 vision THUDM/cogvlm-chat-hf
cogvlm2-19b-chat ZhipuAI/cogvlm2-llama3-chinese-chat-19B ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm transformers<4.42 vision THUDM/cogvlm2-llama3-chinese-chat-19B
cogvlm2-en-19b-chat ZhipuAI/cogvlm2-llama3-chat-19B ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm transformers<4.42 vision THUDM/cogvlm2-llama3-chat-19B
cogvlm2-video-13b-chat ZhipuAI/cogvlm2-video-llama3-chat ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogvlm2-video decord, pytorchvideo, transformers>=4.42 vision, video THUDM/cogvlm2-video-llama3-chat
cogagent-18b-chat ZhipuAI/cogagent-chat ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogagent-chat timm vision THUDM/cogagent-chat-hf
cogagent-18b-instruct ZhipuAI/cogagent-vqa ^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* cogagent-instruct timm vision THUDM/cogagent-vqa-hf
molmoe-1b LLM-Research/MolmoE-1B-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/MolmoE-1B-0924
molmo-7b-o LLM-Research/Molmo-7B-O-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/Molmo-7B-O-0924
molmo-7b-d LLM-Research/Molmo-7B-D-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/Molmo-7B-D-0924
molmo-72b LLM-Research/Molmo-72B-0924 ^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* molmo transformers>=4.45.0 vision allenai/Molmo-72B-0924
emu3-chat BAAI/Emu3-Chat ^(model)(?!.*(lm_head|output|emb|wte|shared)).* emu3-chat transformers>=4.44.0 vision BAAI/Emu3-Chat
florence-2-base AI-ModelScope/Florence-2-base ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-base
florence-2-base-ft AI-ModelScope/Florence-2-base-ft ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-base-ft
florence-2-large AI-ModelScope/Florence-2-large ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-large
florence-2-large-ft AI-ModelScope/Florence-2-large-ft ^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* florence vision microsoft/Florence-2-large-ft
got-ocr2 stepfun-ai/GOT-OCR2_0 ^(model.layers|model.mm_projector_vary)(?!.*(lm_head|output|emb|wte|shared)).* got_ocr2 vision stepfun-ai/GOT-OCR2_0
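
The regex-style entries in the Default Lora Target Modules column are matched against the model's module names to pick the layers that receive LoRA adapters (rows that instead list plain comma-separated names, such as attention.wqkv, name the target modules directly). The snippet below is a minimal illustration of how one of the patterns from the table filters module names; the module names are hypothetical examples, not read from a real checkpoint.

```python
import re

# Pattern taken from the qwen2-vl rows above: target every submodule under `model`
# whose name does not contain lm_head / output / emb / wte / shared.
pattern = r"^(model)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical module names, similar to what model.named_modules() would yield.
module_names = [
    "model.layers.0.self_attn.q_proj",  # matched -> LoRA target
    "model.layers.0.mlp.down_proj",     # matched -> LoRA target
    "model.embed_tokens",               # skipped: contains "emb"
    "lm_head",                          # skipped: does not start with "model"
    "visual.blocks.0.attn.qkv",         # skipped: outside the `model` prefix
]

for name in module_names:
    matched = re.match(pattern, name) is not None
    print(f"{name}: {'LoRA target' if matched else 'skipped'}")
```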

Datasets

The table below describes the datasets integrated into swift:

- Dataset Name: the dataset_name registered in swift.
- Dataset ID: the dataset_id of the dataset on ModelScope.
- Size: the number of samples in the dataset.
- Statistic: statistics of the dataset. We use the number of tokens as the statistic, which is helpful for tuning the max_length hyperparameter. The train and validation splits are concatenated before the statistics are computed, and the dataset is tokenized with qwen's tokenizer. Different tokenizers produce different statistics; if you need token statistics for another model's tokenizer, you can compute them yourself with a script (see the sketch after this list).
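
A minimal sketch of such a script, assuming the transformers and numpy packages are installed; the tokenizer id and the sample texts are placeholders to replace with the tokenizer and dataset you actually want to measure.

```python
from transformers import AutoTokenizer
import numpy as np

# Placeholder tokenizer: swap in the tokenizer of the model you plan to train.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# Placeholder samples: in practice, load the concatenated train + validation splits
# of the dataset and render each sample the same way it is fed to the model.
texts = [
    "Give three tips for staying healthy.",
    "将下面的句子翻译成法语: 你好, 世界.",
]

lengths = np.array([len(tokenizer(text)["input_ids"]) for text in texts])
print(f"{lengths.mean():.1f}±{lengths.std():.1f}, min={lengths.min()}, max={lengths.max()}")
```

This prints the statistics in the same mean±std, min, max format used in the Statistic column below.
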
Dataset Name Dataset ID Subsets Dataset Size Statistic (token) Tags HF Dataset ID
🔥ms-bench iic/ms_bench 316820 346.9±443.2, min=22, max=30960 chat, general, multi-round -
🔥alpaca-en AI-ModelScope/alpaca-gpt4-data-en 52002 176.2±125.8, min=26, max=740 chat, general vicgalle/alpaca-gpt4
🔥alpaca-zh AI-ModelScope/alpaca-gpt4-data-zh 48818 162.1±93.9, min=26, max=856 chat, general llm-wizard/alpaca-gpt4-data-zh
multi-alpaca damo/nlp_polylm_multialpaca_sft ar, de, es, fr, id, ja, ko, pt, ru, th, vi 131867 112.9±50.6, min=26, max=1226 chat, general, multilingual -
instinwild wyj123456/instinwild default, subset 103695 145.4±60.7, min=28, max=1434 - -
cot-en YorickHe/CoT 74771 122.7±64.8, min=51, max=8320 chat, general -
cot-zh YorickHe/CoT_zh 74771 117.5±70.8, min=43, max=9636 chat, general -
instruct-en wyj123456/instruct 888970 269.1±331.5, min=26, max=7254 chat, general -
firefly-zh AI-ModelScope/firefly-train-1.1M 1649399 178.1±260.4, min=26, max=12516 chat, general YeungNLP/firefly-train-1.1M
gpt4all-en wyj123456/GPT4all 806199 302.7±384.5, min=27, max=7391 chat, general -
sharegpt swift/sharegpt common-zh, computer-zh, unknow-zh, common-en, computer-en 96566 933.3±864.8, min=21, max=66412 chat, general, multi-round -
tulu-v2-sft-mixture AI-ModelScope/tulu-v2-sft-mixture 5119 520.7±437.6, min=68, max=2549 chat, multilingual, general, multi-round allenai/tulu-v2-sft-mixture
wikipedia-zh AI-ModelScope/wikipedia-cn-20230720-filtered 254547 568.4±713.2, min=37, max=78678 text-generation, general, pretrained pleisto/wikipedia-cn-20230720-filtered
open-orca AI-ModelScope/OpenOrca 994896 382.3±417.4, min=31, max=8740 chat, multilingual, general -
🔥sharegpt-gpt4 AI-ModelScope/sharegpt_gpt4 default, V3_format, zh_38K_format 72684 1047.6±1313.1, min=22, max=66412 chat, multilingual, general, multi-round, gpt4 -
deepctrl-sft AI-ModelScope/deepctrl-sft-data default, en 14149024 389.8±628.6, min=21, max=626237 chat, general, sft, multi-round -
🔥coig-cqia AI-ModelScope/COIG-CQIA chinese_traditional, coig_pc, exam, finance, douban, human_value, logi_qa, ruozhiba, segmentfault, wiki, wikihow, xhs, zhihu 44694 703.8±654.2, min=33, max=19288 general -
🔥ruozhiba AI-ModelScope/ruozhiba post-annual, title-good, title-norm 85658 39.9±13.1, min=21, max=559 pretrain -
long-alpaca-12k AI-ModelScope/LongAlpaca-12k 11998 9619.0±8295.8, min=36, max=78925 longlora, QA Yukang/LongAlpaca-12k
lmsys-chat-1m AI-ModelScope/lmsys-chat-1m - Dataset is too huge, please click the original link to view the dataset stat. chat, en lmsys/lmsys-chat-1m
🔥ms-agent iic/ms_agent 26336 650.9±217.2, min=209, max=2740 chat, agent, multi-round -
🔥ms-agent-for-agentfabric AI-ModelScope/ms_agent_for_agentfabric default, addition 30000 617.8±199.1, min=251, max=2657 chat, agent, multi-round -
ms-agent-multirole iic/MSAgent-MultiRole 9500 447.6±84.9, min=145, max=1101 chat, agent, multi-round, role-play, multi-agent -
🔥toolbench-for-alpha-umi shenweizhou/alpha-umi-toolbench-processed-v2 backbone, caller, planner, summarizer 1448337 1439.7±853.9, min=123, max=18467 chat, agent -
damo-agent-zh damo/MSAgent-Bench 386984 956.5±407.3, min=326, max=19001 chat, agent, multi-round -
damo-agent-zh-mini damo/MSAgent-Bench 20845 1326.4±329.6, min=571, max=4304 chat, agent, multi-round -
agent-instruct-all-en huangjintao/AgentInstruct_copy alfworld, db, kg, mind2web, os, webshop 1866 1144.3±635.5, min=206, max=6412 chat, agent, multi-round -
🔥msagent-pro iic/MSAgent-Pro 21905 1524.5±921.3, min=64, max=16770 chat, agent, multi-round -
toolbench swift/ToolBench 124345 3669.5±1600.9, min=1047, max=22581 chat, agent, multi-round -
code-alpaca-en wyj123456/code_alpaca_en 20016 100.2±60.1, min=29, max=1776 - sahil2801/CodeAlpaca-20k
🔥leetcode-python-en AI-ModelScope/leetcode-solutions-python 2359 727.1±235.9, min=259, max=2146 chat, coding -
🔥codefuse-python-en codefuse-ai/CodeExercise-Python-27k 27224 483.6±193.9, min=45, max=3082 chat, coding -
🔥codefuse-evol-instruction-zh codefuse-ai/Evol-instruction-66k 66862 439.6±206.3, min=37, max=2983 chat, coding -
medical-en swift/medical_zh en 117617 257.4±89.1, min=36, max=2564 chat, medical -
medical-zh swift/medical_zh zh 1950972 167.2±219.7, min=26, max=27351 chat, medical -
🔥disc-med-sft-zh AI-ModelScope/DISC-Med-SFT 441767 354.1±193.1, min=25, max=2231 chat, medical Flmc/DISC-Med-SFT
lawyer-llama-zh AI-ModelScope/lawyer_llama_data 21476 194.4±91.7, min=27, max=924 chat, law Skepsun/lawyer_llama_data
tigerbot-law-zh AI-ModelScope/tigerbot-law-plugin 55895 109.9±126.4, min=37, max=18878 text-generation, law, pretrained TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh AI-ModelScope/DISC-Law-SFT 166758 533.7±495.4, min=30, max=15169 chat, law ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh AI-ModelScope/blossom-math-v2 10000 169.3±58.7, min=35, max=563 chat, math Azure99/blossom-math-v2
school-math-zh AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 chat, math, quality BelleGroup/school_math_0.25M
open-platypus-en AI-ModelScope/Open-Platypus 24926 367.9±254.8, min=30, max=3951 chat, math, quality garage-bAInd/Open-Platypus
text2sql-en AI-ModelScope/texttosqlv2_25000_v2 25000 274.6±326.4, min=38, max=1975 chat, sql Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en AI-ModelScope/sql-create-context 78577 80.2±17.8, min=36, max=456 chat, sql b-mc2/sql-create-context
synthetic-text-to-sql AI-ModelScope/synthetic_text_to_sql default 100000 283.4±115.8, min=61, max=1356 nl2sql, en gretelai/synthetic_text_to_sql
🔥advertise-gen-zh lvjianjin/AdvertiseGen 98399 130.6±21.7, min=51, max=241 text-generation shibing624/AdvertiseGen
🔥dureader-robust-zh modelscope/DuReader_robust-QG 17899 241.1±137.4, min=60, max=1416 text-generation -
cmnli-zh modelscope/clue cmnli 404024 82.6±16.6, min=51, max=199 text-generation, classification clue
🔥jd-sentiment-zh DAMO_NLP/jd 50000 66.0±83.2, min=39, max=4039 text-generation, classification -
🔥hc3-zh simpleai/HC3-Chinese baike, open_qa, nlpcc_dbqa, finance, medicine, law, psychology 39781 176.8±81.5, min=57, max=3051 text-generation, classification Hello-SimpleAI/HC3-Chinese
🔥hc3-en simpleai/HC3 finance, medicine 11021 298.3±138.7, min=65, max=2267 text-generation, classification Hello-SimpleAI/HC3
dolly-15k AI-ModelScope/databricks-dolly-15k default 15011 199.2±267.8, min=22, max=8615 multi-task, en, quality databricks/databricks-dolly-15k
zhihu-kol OmniData/Zhihu-KOL default - Dataset is too huge, please click the original link to view the dataset stat. zhihu, qa wangrui6/Zhihu-KOL
zhihu-kol-filtered OmniData/Zhihu-KOL-More-Than-100-Upvotes default 271261 952.0±1727.2, min=25, max=98658 zhihu, qa bzb2023/Zhihu-KOL-More-Than-100-Upvotes
finance-en wyj123456/finance_en 68911 135.6±134.3, min=26, max=3525 chat, financial ssbuild/alpaca_finance_en
poetry-zh modelscope/chinese-poetry-collection 390309 55.2±9.4, min=23, max=83 text-generation, poetry -
webnovel-zh AI-ModelScope/webnovel_cn 50000 1478.9±11526.1, min=100, max=490484 chat, novel zxbsmk/webnovel_cn
generated-chat-zh AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 chat, character-dialogue BelleGroup/generated_chat_0.4M
🔥self-cognition swift/self-cognition 134 53.6±18.6, min=29, max=121 chat, self-cognition modelscope/self-cognition
🔥swift-mix swift/swift-sft-mixture sharegpt, firefly, codefuse, metamathqa - Dataset is too huge, please click the original link to view the dataset stat. chat, sft, general -
cls-fudan-news-zh damo/zh_cls_fudan-news 4959 3234.4±2547.5, min=91, max=19548 chat, classification -
ner-jave-zh damo/zh_ner-JAVE 1266 118.3±45.5, min=44, max=223 chat, ner -
coco-en modelscope/coco_2014_caption coco_2014_caption 454617 299.8±2.8, min=295, max=352 chat, multi-modal, vision -
🔥coco-en-mini modelscope/coco_2014_caption coco_2014_caption 40504 299.8±2.6, min=295, max=338 chat, multi-modal, vision -
coco-en-2 modelscope/coco_2014_caption coco_2014_caption 454617 36.8±2.8, min=32, max=89 chat, multi-modal, vision -
🔥coco-en-2-mini modelscope/coco_2014_caption coco_2014_caption 40504 36.8±2.6, min=32, max=75 chat, multi-modal, vision -
capcha-images AI-ModelScope/captcha-images 8000 31.0±0.0, min=31, max=31 chat, multi-modal, vision -
latex-ocr-print AI-ModelScope/LaTeX_OCR full 17918 362.7±34.8, min=294, max=528 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
latex-ocr-handwrite AI-ModelScope/LaTeX_OCR synthetic_handwrite 95424 375.1±59.4, min=292, max=2115 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
aishell1-zh speech_asr/speech_asr_aishell1_trainsets 141600 152.2±36.8, min=63, max=419 chat, multi-modal, audio -
🔥aishell1-zh-mini speech_asr/speech_asr_aishell1_trainsets 14526 152.2±35.6, min=74, max=359 chat, multi-modal, audio -
🔥video-chatgpt swift/VideoChatGPT Generic, Temporal, Consistency 3206 88.4±48.3, min=32, max=399 chat, multi-modal, video lmms-lab/VideoChatGPT
egoschema AI-ModelScope/egoschema Subset 101 191.6±80.7, min=96, max=435 chat, multi-modal, video lmms-lab/egoschema
llava-video-178k lmms-lab/LLaVA-Video-178K 0_30_s_academic_v0_1, 0_30_s_youtube_v0_1, 1_2_m_academic_v0_1, 1_2_m_youtube_v0_1, 2_3_m_academic_v0_1, 2_3_m_youtube_v0_1, 30_60_s_academic_v0_1, 30_60_s_youtube_v0_1 - Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, video lmms-lab/LLaVA-Video-178K
moviechat-1k-test AI-ModelScope/MovieChat-1K-test 486 36.1±4.3, min=27, max=42 chat, multi-modal, video Enxin/MovieChat-1K-test
hh-rlhf AI-ModelScope/hh-rlhf harmless-base, helpful-base, helpful-online, helpful-rejection-sampled 127459 245.4±190.7, min=22, max=1999 rlhf, dpo, pairwise -
🔥hh-rlhf-cn AI-ModelScope/hh_rlhf_cn hh_rlhf, harmless_base_cn, harmless_base_en, helpful_base_cn, helpful_base_en 355920 171.2±122.7, min=22, max=3078 rlhf, dpo, pairwise -
orpo-dpo-mix-40k AI-ModelScope/orpo-dpo-mix-40k default 43666 548.3±397.4, min=28, max=8483 dpo, orpo, en, quality mlabonne/orpo-dpo-mix-40k
stack-exchange-paired AI-ModelScope/stack-exchange-paired 4483004 534.5±594.6, min=31, max=56588 hfrl, dpo, pairwise lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji hjh0119/shareAI-Llama3-DPO-zh-en-emoji default 2449 334.0±162.8, min=36, max=1801 rlhf, dpo, pairwise -
ultrafeedback-kto AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto default 230720 11.0±0.0, min=11, max=11 rlhf, kto -
rlaif-v swift/RLAIF-V-Dataset default 83132 119.8±52.6, min=28, max=556 rlhf, dpo, multi-modal, en openbmb/RLAIF-V-Dataset
pileval swift/pile-val-backup 214670 1612.3±8856.2, min=11, max=1208955 text-generation, awq mit-han-lab/pile-val-backup
mantis-instruct swift/Mantis-Instruct birds-to-words, chartqa, coinstruct, contrastive_caption, docvqa, dreamsim, dvqa, iconqa, imagecode, llava_665k_multi, lrv_multi, multi_vqa, nextqa, nlvr2, spot-the-diff, star, visual_story_telling 655351 825.7±812.5, min=284, max=13563 chat, multi-modal, vision, quality TIGER-Lab/Mantis-Instruct
llava-data-instruct swift/llava-data llava_instruct 364100 189.0±142.1, min=33, max=5183 sft, multi-modal, quality TIGER-Lab/llava-data
midefics swift/MideficsDataset 3800 201.3±70.2, min=60, max=454 medical, en, vqa WinterSchool/MideficsDataset
gqa None train_all_instructions - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, vqa, quality lmms-lab/GQA
text-caps swift/TextCaps 18145 38.2±4.4, min=31, max=73 multi-modal, en, caption, quality HuggingFaceM4/TextCaps
refcoco-unofficial-caption swift/refcoco 46215 44.7±3.2, min=36, max=71 multi-modal, en, caption jxu124/refcoco
refcoco-unofficial-grounding swift/refcoco 46215 45.2±3.1, min=37, max=69 multi-modal, en, grounding jxu124/refcoco
refcocog-unofficial-caption swift/refcocog 44799 49.7±4.7, min=37, max=88 multi-modal, en, caption jxu124/refcocog
refcocog-unofficial-grounding swift/refcocog 44799 50.1±4.7, min=37, max=90 multi-modal, en, grounding jxu124/refcocog
a-okvqa swift/A-OKVQA 18201 45.8±7.9, min=32, max=100 multi-modal, en, vqa, quality HuggingFaceM4/A-OKVQA
okvqa swift/OK-VQA_train 9009 34.4±3.3, min=28, max=59 multi-modal, en, vqa, quality Multimodal-Fatima/OK-VQA_train
ocr-vqa swift/OCR-VQA 186753 35.6±6.6, min=29, max=193 multi-modal, en, ocr-vqa howard-hou/OCR-VQA
grit swift/GRIT - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, caption-grounding, quality zzliang/GRIT
llava-instruct-mix swift/llava-instruct-mix-vsft 13640 179.8±120.2, min=30, max=962 multi-modal, en, vqa, quality HuggingFaceH4/llava-instruct-mix-vsft
lnqa swift/lnqa - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, ocr-vqa, quality vikhyatk/lnqa
science-qa swift/ScienceQA 8315 100.3±59.5, min=38, max=638 multi-modal, science, vqa, quality derek-thomas/ScienceQA
guanaco AI-ModelScope/GuanacoDataset default 31561 250.1±70.3, min=89, max=1436 chat, zh JosephusCheung/GuanacoDataset
mind2web swift/Multimodal-Mind2Web 1009 297522.4±325496.2, min=8592, max=3499715 agent, multi-modal osunlp/Multimodal-Mind2Web
sharegpt-4o-image AI-ModelScope/ShareGPT-4o image_caption 57289 638.7±157.9, min=47, max=4640 vqa, multi-modal OpenGVLab/ShareGPT-4o
pixelprose swift/pixelprose - Dataset is too huge, please click the original link to view the dataset stat. caption, multi-modal, vision tomg-group-umd/pixelprose
m3it AI-ModelScope/M3IT coco, vqa-v2, shapes, shapes-rephrased, coco-goi-rephrased, snli-ve, snli-ve-rephrased, okvqa, a-okvqa, viquae, textcap, docvqa, science-qa, imagenet, imagenet-open-ended, imagenet-rephrased, coco-goi, clevr, clevr-rephrased, nlvr, coco-itm, coco-itm-rephrased, vsr, vsr-rephrased, mocheg, mocheg-rephrased, coco-text, fm-iqa, activitynet-qa, msrvtt, ss, coco-cn, refcoco, refcoco-rephrased, multi30k, image-paragraph-captioning, visual-dialog, visual-dialog-rephrased, iqa, vcr, visual-mrc, ivqa, msrvtt-qa, msvd-qa, gqa, text-vqa, ocr-vqa, st-vqa, flickr8k-cn - Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
sharegpt4v AI-ModelScope/ShareGPT4V ShareGPT4V, ShareGPT4V-PT - Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
llava-instruct-150k AI-ModelScope/LLaVA-Instruct-150K 624610 490.4±180.2, min=288, max=5438 chat, multi-modal, vision -
llava-pretrain AI-ModelScope/LLaVA-Pretrain default - Dataset is too huge, please click the original link to view the dataset stat. vqa, multi-modal, quality liuhaotian/LLaVA-Pretrain
sa1b-dense-caption Tongyi-DataEngine/SA1B-Dense-Caption - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
sa1b-paired-caption Tongyi-DataEngine/SA1B-Paired-Captions-Images - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
alpaca-cleaned AI-ModelScope/alpaca-cleaned 51760 177.9±126.4, min=26, max=1044 chat, general, bench, quality yahma/alpaca-cleaned
aya-collection swift/aya_collection aya_dataset 202364 494.0±6911.3, min=21, max=3044268 multi-lingual, qa CohereForAI/aya_collection
belle-generated-chat-0.4M AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 common, zh BelleGroup/generated_chat_0.4M
belle-math-0.25M AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 math, zh BelleGroup/school_math_0.25M
belle-train-0.5M-CN AI-ModelScope/train_0.5M_CN 519255 129.1±91.5, min=27, max=6507 common, zh, quality BelleGroup/train_0.5M_CN
belle-train-1M-CN AI-ModelScope/train_1M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_1M_CN
belle-train-2M-CN AI-ModelScope/train_2M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_2M_CN
belle-train-3.5M-CN swift/train_3.5M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_3.5M_CN
c4 None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/c4
chart-qa swift/ChartQA 28299 43.1±5.5, min=29, max=77 en, vqa, quality HuggingFaceM4/ChartQA
chinese-c4 swift/chinese-c4 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality shjwudp/chinese-c4
cinepile swift/cinepile - Dataset is too huge, please click the original link to view the dataset stat. vqa, en, youtube, video tomg-group-umd/cinepile
classical-chinese-translate swift/classical_chinese_translate 6655 344.0±76.4, min=61, max=815 chat, play-ground -
codealpaca-20k AI-ModelScope/CodeAlpaca-20k 20016 100.2±60.1, min=29, max=1776 code, en HuggingFaceH4/CodeAlpaca_20K
cosmopedia None auto_math_text, khanacademy, openstax, stanford, stories, web_samples_v1, web_samples_v2, wikihow - Dataset is too huge, please click the original link to view the dataset stat. multi-domain, en, qa HuggingFaceTB/cosmopedia
cosmopedia-100k swift/cosmopedia-100k 100000 1024.5±243.1, min=239, max=2981 multi-domain, en, qa HuggingFaceTB/cosmopedia-100k
dolma swift/dolma v1_7 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/dolma
dolphin swift/dolphin flan1m-alpaca-uncensored, flan5m-alpaca-uncensored - Dataset is too huge, please click the original link to view the dataset stat. en cognitivecomputations/dolphin
duet AI-ModelScope/Duet-v0.5 5000 1157.4±189.3, min=657, max=2344 CoT, en G-reen/Duet-v0.5
evol-instruct-v2 AI-ModelScope/WizardLM_evol_instruct_V2_196k 109184 480.9±333.1, min=26, max=4942 chat, en WizardLM/WizardLM_evol_instruct_V2_196k
fineweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality HuggingFaceFW/fineweb
gen-qa swift/GenQA - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task tomg-group-umd/GenQA
github-code swift/github-code - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality codeparrot/github-code
gpt4v-dataset swift/gpt4v-dataset 12356 217.9±68.3, min=35, max=596 en, caption, multi-modal, quality laion/gpt4v-dataset
guanaco-belle-merge AI-ModelScope/guanaco_belle_merge_v1.0 693987 134.2±92.0, min=24, max=6507 QA, zh Chinese-Vicuna/guanaco_belle_merge_v1.0
infinity-instruct swift/Infinity-Instruct - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task BAAI/Infinity-Instruct
llava-med-zh-instruct swift/llava-med-zh-instruct-60k 56649 207.7±67.6, min=37, max=657 zh, medical, vqa BUAADreamer/llava-med-zh-instruct-60k
🔥longwriter-6k ZhipuAI/LongWriter-6k 6000 4887.2±2879.2, min=117, max=30354 long, chat, sft THUDM/LongWriter-6k
🔥longwriter-6k-filtered swift/longwriter-6k-filtered 666 4108.9±2636.9, min=1190, max=17050 long, chat, sft -
math-instruct AI-ModelScope/MathInstruct 262283 254.4±183.5, min=11, max=4383 math, cot, en, quality TIGER-Lab/MathInstruct
math-plus TIGER-Lab/MATH-plus train 893929 287.1±158.7, min=24, max=2919 qa, math, en, quality TIGER-Lab/MATH-plus
moondream2-coyo-5M swift/moondream2-coyo-5M-captions - Dataset is too huge, please click the original link to view the dataset stat. caption, pretrain, quality isidentical/moondream2-coyo-5M-captions
no-robots swift/no_robots 9485 298.7±246.4, min=40, max=6739 multi-task, quality, human-annotated HuggingFaceH4/no_robots
open-hermes swift/OpenHermes-2.5 - Dataset is too huge, please click the original link to view the dataset stat. cot, en, quality teknium/OpenHermes-2.5
open-orca-chinese AI-ModelScope/OpenOrca-Chinese - Dataset is too huge, please click the original link to view the dataset stat. QA, zh, general, quality yys/OpenOrca-Chinese
orca_dpo_pairs swift/orca_dpo_pairs 12859 366.9±251.9, min=30, max=2010 rlhf, quality Intel/orca_dpo_pairs
path-vqa swift/path-vqa 19654 34.8±7.3, min=27, max=85 multi-modal, vqa, medical flaviagiammarino/path-vqa
pile AI-ModelScope/pile - Dataset is too huge, please click the original link to view the dataset stat. pretrain EleutherAI/pile
poison-mpts iic/100PoisonMpts 906 150.6±80.8, min=39, max=656 poison-management, zh -
🔥qwen2-pro-en AI-ModelScope/Magpie-Qwen2-Pro-200K-English 200000 605.4±287.3, min=221, max=4267 chat, sft, en Magpie-Align/Magpie-Qwen2-Pro-200K-English
🔥qwen2-pro-filtered AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered 300000 555.8±286.6, min=148, max=4267 chat, sft Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
🔥qwen2-pro-zh AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese 200000 446.2±246.4, min=74, max=4101 chat, sft, zh Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
redpajama-data-1t swift/RedPajama-Data-1T - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-1T
redpajama-data-v2 swift/RedPajama-Data-V2 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-V2
refinedweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality tiiuae/falcon-refinedweb
rwkv-pretrain-web mapjack/openwebtext_dataset - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality -
sft-nectar AI-ModelScope/SFT-Nectar 131192 396.4±272.1, min=44, max=10732 cot, en, quality AstraMindAI/SFT-Nectar
skypile AI-ModelScope/SkyPile-150B - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality, zh Skywork/SkyPile-150B
slim-orca swift/SlimOrca 517982 399.1±370.2, min=35, max=8756 quality, en Open-Orca/SlimOrca
slim-pajama-627b None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality cerebras/SlimPajama-627B
starcoder AI-ModelScope/starcoderdata - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/starcoderdata
tagengo-gpt4 swift/tagengo-gpt4 78057 472.3±292.9, min=22, max=3521 chat, multi-lingual, quality lightblue/tagengo-gpt4
the-stack AI-ModelScope/the-stack - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/the-stack
ultrachat-200k swift/ultrachat_200k 207865 1195.4±573.7, min=76, max=4470 chat, en, quality HuggingFaceH4/ultrachat_200k
vqa-v2 swift/VQAv2 443757 31.8±2.2, min=27, max=58 en, vqa, quality HuggingFaceM4/VQAv2
web-instruct-sub swift/WebInstructSub - Dataset is too huge, please click the original link to view the dataset stat. qa, en, math, quality, multi-domain, science TIGER-Lab/WebInstructSub
wikipedia swift/wikipedia - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality wikipedia
wikipedia-cn-filtered AI-ModelScope/wikipedia-cn-20230720-filtered - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf AI-ModelScope/zhihu_rlhf_3k 3460 594.5±365.9, min=31, max=1716 rlhf, dpo, zh liyucheng/zhihu_rlhf_3k