Update model_loader deps and qqq quantization deps #2220

HandH1998 · 2024-11-27T09:27:00Z

Motivation

Update the model_loader deps and qqq quantization deps for SGLang.

Modifications

We modified the relevant code primarily by following vLLM. Thanks to the vLLM team for their significant contributions. The main modifications are listed below.

We took the model_loader code from https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/model_loader and adapted it for SGLang. The updated model_loader code is located at python/sglang/srt/model_loader.
We added registry.py at python/sglang/srt/models/registry.py and registered all the models into class ModelRegistry. Consequently, we removed all monkey patches in python/sglang/srt/model_executor/model_runner.py.
We adapted the qqq quantization from https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/qqq.py and replaced the gemm with the marlin_qqq_gemm from torchao. For more details on qqq, please refer to our paper and our code repo.
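To illustrate why a registry removes the need for monkey patches, here is a minimal sketch of the idea. It is not the code in python/sglang/srt/models/registry.py: the method names `register`, `resolve`, and `supported_archs`, and the `DummyLlama` class, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Type


class ModelRegistry:
    """Hypothetical, simplified registry: architecture name -> model class."""

    def __init__(self) -> None:
        self._models: Dict[str, Type] = {}

    def register(self, arch: str, model_cls: Type) -> None:
        # Later registrations for the same name overwrite earlier ones.
        self._models[arch] = model_cls

    def resolve(self, architectures: List[str]) -> Type:
        # Return the class for the first architecture name that is registered.
        for arch in architectures:
            if arch in self._models:
                return self._models[arch]
        raise ValueError(
            f"None of {architectures} is supported; "
            f"known architectures: {self.supported_archs()}"
        )

    def supported_archs(self) -> List[str]:
        return sorted(self._models)


class DummyLlama:
    """Stand-in for a real model class (assumption, for illustration only)."""


registry = ModelRegistry()
registry.register("LlamaForCausalLM", DummyLlama)

# The loader asks the registry instead of patching another library's lookup.
model_cls = registry.resolve(["LlamaForCausalLM"])
```

With a lookup like this owned by the project, the loader can resolve the `architectures` field of a model config directly, so nothing in another library has to be patched at runtime.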

HandH1998 · 2024-11-27T09:51:29Z

Some CI runs fail with the error: cannot import name 'marlin_qqq_gemm' from 'torchao.ops' (/usr/local/lib/python3.10/dist-packages/torchao/ops.py). This happens because the installed version of torchao is v0.6.1, which does not support marlin_qqq_gemm. Although our marlin_qqq_gemm has been merged into the main branch of torchao, the torchao team has not yet released a version that includes it.
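One possible stopgap, sketched here and not part of this PR, is to guard the import so that a torchao release without marlin_qqq_gemm does not break module import. The helper name `try_import` and the flag `QQQ_AVAILABLE` are hypothetical.

```python
import importlib
from typing import Any, Optional


def try_import(module_name: str, attr: str) -> Optional[Any]:
    """Return module_name.attr, or None if the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all.
        return None
    # The package is installed, but this release may predate the attribute.
    return getattr(module, attr, None)


# None on torchao releases that do not ship the kernel (e.g. v0.6.1).
marlin_qqq_gemm = try_import("torchao.ops", "marlin_qqq_gemm")
QQQ_AVAILABLE = marlin_qqq_gemm is not None
```

Code that needs the kernel could then check `QQQ_AVAILABLE` and raise a clear error only when qqq quantization is actually requested, instead of failing for every import of the module.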

zhyncs

Overall LGTM; I left some comments.
Except for rope, vllm.distributed and quant, everything else related to vllm needs to be removed, such as some utils.
BTW, python/sglang/srt/models/phi3_small.py should also be handled.

zhyncs · 2024-11-27T09:46:57Z

python/sglang/srt/layers/quantization/qqq.py

+
+import torch
+from torch.nn.parameter import Parameter
+from torchao.ops import marlin_qqq_gemm


ImportError: cannot import name 'marlin_qqq_gemm' from 'torchao.ops'

This is probably a version issue: the current release of torchao (v0.6.1) does not include qqq.

Perhaps we can introduce qqq in the next PR, after torchao releases a new version. How about that, @HandH1998?

zhyncs · 2024-11-27T09:47:23Z

python/sglang/srt/layers/quantization/qqq.py

+from torchao.ops import marlin_qqq_gemm
+from torchao.quantization.utils import dynamically_quantize_per_channel
+from vllm.model_executor.layers.linear import LinearBase, LinearMethodBase
+from vllm.model_executor.parameter import (


This part also needs to be migrated.

zhyncs · 2024-11-27T09:47:43Z

python/sglang/srt/model_loader/__init__.py

+from typing import Optional
+
+from torch import nn
+from vllm.config import (


This part also needs to be migrated.

zhyncs · 2024-11-27T09:47:54Z

python/sglang/srt/model_loader/loader.py

+from torch import nn
+from transformers import AutoModelForCausalLM, PretrainedConfig
+from transformers.utils import SAFE_WEIGHTS_INDEX_NAME
+from vllm.config import (


This part also needs to be migrated.

zhyncs · 2024-11-27T09:49:05Z

python/sglang/srt/model_loader/loader.py

+    get_tensor_model_parallel_rank,
+    get_tensor_model_parallel_world_size,
+)
+from vllm.envs import VLLM_USE_MODELSCOPE


L39:46 This part also needs to be migrated.
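As a rough sketch of what migrating the vllm.envs import could look like, the flag can be read from the environment locally. The variable name `SGLANG_USE_MODELSCOPE` and the helper `env_flag` are assumptions for illustration and are not taken from this PR.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment ("1", "true", "yes" are truthy)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes")


# Hypothetical local replacement for `from vllm.envs import VLLM_USE_MODELSCOPE`.
USE_MODELSCOPE = env_flag("SGLANG_USE_MODELSCOPE")
```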

zhyncs · 2024-11-27T09:49:25Z

python/sglang/srt/model_loader/utils.py

+
+import torch
+from torch import nn
+from vllm.config import ModelConfig


This part also needs to be migrated.

zhyncs · 2024-11-27T09:49:59Z

python/sglang/srt/model_loader/weight_utils.py

+from huggingface_hub import HfFileSystem, hf_hub_download, snapshot_download
+from safetensors.torch import load_file, safe_open, save_file
+from tqdm.auto import tqdm
+from vllm.config import LoadConfig, ModelConfig


L22:L25 This part also needs to be migrated (except vllm.distributed).

remove vllm model_loader deps and support qqq quantization

b17b685

HandH1998 requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners November 27, 2024 09:27

zhyncs assigned zhyncs and ispobock Nov 27, 2024

zhyncs added the high priority label Nov 27, 2024

zhyncs assigned Ying1123 Nov 27, 2024

zhyncs reviewed Nov 27, 2024
