
Update model_loader deps and qqq quantization deps #2220

Open · wants to merge 1 commit into main

Conversation

HandH1998

Motivation

Update the model_loader deps and qqq quantization deps for SGLang.

Modifications

We modified the relevant code primarily by following vLLM. We thank the vLLM team for their significant contributions. Here we list the main modifications.

@HandH1998 (Author):

There are some CI failures due to `cannot import name 'marlin_qqq_gemm' from 'torchao.ops' (/usr/local/lib/python3.10/dist-packages/torchao/ops.py)`. This issue arises because the installed version of torchao is v0.6.1, which does not support marlin_qqq_gemm. Although our marlin_qqq_gemm has been merged into the main branch of torchao, the torchao team has not yet released a new version that includes it.
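One way to keep the build working until a torchao release ships the kernel would be a version-gated import. The sketch below is an illustration, not code from this PR; the `(0, 7)` threshold is an assumption about which future torchao release will first include marlin_qqq_gemm, not a confirmed version number.

```python
# Hypothetical guard: only import marlin_qqq_gemm when the installed torchao
# is new enough to provide it; otherwise leave QQQ support disabled.
from importlib.metadata import PackageNotFoundError, version

# Assumed first release containing marlin_qqq_gemm (v0.6.1 does not have it).
MARLIN_QQQ_MIN_TORCHAO = (0, 7)


def has_marlin_qqq() -> bool:
    """Return True only when the installed torchao should expose marlin_qqq_gemm."""
    try:
        major, minor = (int(p) for p in version("torchao").split(".")[:2])
    except (PackageNotFoundError, ValueError):
        # torchao missing entirely, or an unparseable version string.
        return False
    return (major, minor) >= MARLIN_QQQ_MIN_TORCHAO


if has_marlin_qqq():
    from torchao.ops import marlin_qqq_gemm  # noqa: F401
else:
    # QQQ quantization stays disabled; callers must check for None.
    marlin_qqq_gemm = None
```

Callers would then raise a clear error ("please upgrade torchao") when `marlin_qqq_gemm` is `None`, instead of failing with a bare ImportError at module load time.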

@zhyncs (Member) left a comment:

Overall LGTM; I left some comments.
Except for rope, vllm.distributed, and quant, everything else related to vllm needs to be removed, such as some utils.
BTW, python/sglang/srt/models/phi3_small.py should also be handled.


import torch
from torch.nn.parameter import Parameter
from torchao.ops import marlin_qqq_gemm
Member:

ImportError: cannot import name 'marlin_qqq_gemm' from 'torchao.ops'

It should be due to the version; the current release of torchao (v0.6.1) does not include qqq.

Member:

Perhaps we can introduce qqq in the next PR after torchao releases a new version. How about that, @HandH1998?

Author:

ok

from torchao.ops import marlin_qqq_gemm
from torchao.quantization.utils import dynamically_quantize_per_channel
from vllm.model_executor.layers.linear import LinearBase, LinearMethodBase
from vllm.model_executor.parameter import (
Member:

This part also needs to be migrated.

from typing import Optional

from torch import nn
from vllm.config import (
Member:

This part also needs to be migrated.

from torch import nn
from transformers import AutoModelForCausalLM, PretrainedConfig
from transformers.utils import SAFE_WEIGHTS_INDEX_NAME
from vllm.config import (
Member:

This part also needs to be migrated.

get_tensor_model_parallel_rank,
get_tensor_model_parallel_world_size,
)
from vllm.envs import VLLM_USE_MODELSCOPE
Member:

L39:46 This part also needs to be migrated.


import torch
from torch import nn
from vllm.config import ModelConfig
Member:

This part also needs to be migrated.

from huggingface_hub import HfFileSystem, hf_hub_download, snapshot_download
from safetensors.torch import load_file, safe_open, save_file
from tqdm.auto import tqdm
from vllm.config import LoadConfig, ModelConfig
Member:

L22:L25 This part also needs to be migrated (except vllm.distributed).
