
Adding support for registration of non-transformer models like SwiftKV in QEfficient #291


Closed
wants to merge 140 commits into from
909189d
added initial version of SwiftKV for AI 100
ochougul Dec 16, 2024
ef47eb9
BUGFIX
ochougul Dec 16, 2024
f43b345
BUGFIX
ochougul Dec 16, 2024
5fbc10b
BUGFIX
ochougul Dec 16, 2024
a6a3727
BUGFIX
ochougul Dec 16, 2024
5c094e2
BUGFIX
ochougul Dec 16, 2024
5259873
BUGFIX
ochougul Dec 16, 2024
39034c8
BUGFIX
ochougul Dec 17, 2024
cd01714
BUGFIX
ochougul Dec 17, 2024
a9539bf
BUGFIX
ochougul Dec 17, 2024
4bafed0
BUGFIX
ochougul Dec 17, 2024
c015d63
BUGFIX
ochougul Dec 17, 2024
b35cdd4
all bugfixes in
ochougul Dec 19, 2024
c5914a5
added init file
ochougul Dec 19, 2024
23df777
all changes except BQA are in with this
ochougul Jan 9, 2025
f7bad4b
more updates
ochougul Feb 5, 2025
2a37e62
Enabling the SwiftKV model in the QEFF Infra
quic-hemagnih Feb 27, 2025
b280225
rebased
ochougul Feb 27, 2025
9f5bca6
moving registration of non transformer models during initialization o…
quic-hemagnih Feb 27, 2025
991e3bf
fixed lint warnings
quic-hemagnih Mar 4, 2025
f384533
enabling faster downloads via hf_transfer (#282)
ochougul Feb 28, 2025
aa79836
upgrading from yanked version (#276)
ochougul Feb 28, 2025
d5c5179
Added example script for InternVL (#269)
quic-dhirajku Feb 28, 2025
6fc7bb6
prompt-lookup decoding example (#235)
eplatero97 Feb 28, 2025
5757301
New format of Documentation (#240)
quic-amitraj Feb 28, 2025
47577f8
Removed warning and override of mxfp6 for internal use (#277)
quic-amitraj Feb 28, 2025
87d8781
Added support of 2qpcs for internvl and llava (#279)
mohiso22 Feb 28, 2025
d1e60b7
Removed onnx_defer_loading flag. (#295)
shubhagr-quic Mar 3, 2025
da1d1da
Code for SDK configs Inclusion (#203)
abukhoy Mar 3, 2025
4b373b8
Fixed the compilation errors
quic-hemagnih Mar 5, 2025
8fbc881
Fixed the lint error
quic-hemagnih Mar 5, 2025
40d921a
fixed ruff errors
quic-hemagnih Mar 5, 2025
7598ec7
fixed ruff errors
quic-hemagnih Mar 5, 2025
a822f39
Address review comments
quic-hemagnih Mar 12, 2025
4e80fe8
added initial version of SwiftKV for AI 100
ochougul Dec 16, 2024
4168b33
BUGFIX
ochougul Dec 16, 2024
180a9e7
BUGFIX
ochougul Dec 16, 2024
2aeded3
BUGFIX
ochougul Dec 16, 2024
95a11a2
BUGFIX
ochougul Dec 16, 2024
0d8f1da
BUGFIX
ochougul Dec 16, 2024
025017b
BUGFIX
ochougul Dec 16, 2024
f559ad9
BUGFIX
ochougul Dec 17, 2024
f4b5d6e
BUGFIX
ochougul Dec 17, 2024
f26842a
BUGFIX
ochougul Dec 17, 2024
1b9b914
BUGFIX
ochougul Dec 17, 2024
39b1dd2
BUGFIX
ochougul Dec 17, 2024
2785355
all bugfixes in
ochougul Dec 19, 2024
2b51947
added init file
ochougul Dec 19, 2024
59c30a9
all changes except BQA are in with this
ochougul Jan 9, 2025
8cac6b9
more updates
ochougul Feb 5, 2025
4e63fac
Enabling the SwiftKV model in the QEFF Infra
quic-hemagnih Feb 27, 2025
78e9257
rebased
ochougul Feb 27, 2025
d33c22e
moving registration of non transformer models during initialization o…
quic-hemagnih Feb 27, 2025
ca2870f
fixed lint warnings
quic-hemagnih Mar 4, 2025
6665a3a
enabling faster downloads via hf_transfer (#282)
ochougul Feb 28, 2025
01d9a87
Fixed the compilation errors
quic-hemagnih Mar 5, 2025
5be5afa
Fixed the lint error
quic-hemagnih Mar 5, 2025
68e92ab
fixed ruff errors
quic-hemagnih Mar 5, 2025
aff64ab
fixed ruff errors
quic-hemagnih Mar 5, 2025
6fa6c9a
Address review comments
quic-hemagnih Mar 12, 2025
0ef8f61
Rebased and fixed the lint errors
quic-hemagnih Mar 12, 2025
29da089
rebased
quic-hemagnih Mar 12, 2025
5217976
Fix the lint errors
quic-hemagnih Mar 12, 2025
faef011
[QEff. Finetune] : Use login_and_download_hf_lm in finetuning path (#…
quic-mamta Jan 22, 2025
680a25b
Installing python package rich to resolve QNN tests failure. (#241)
shubhagr-quic Jan 24, 2025
b5cb9b4
Removed onnx_defer_loading from Immutable Convertor Args. (#230)
shubhagr-quic Jan 27, 2025
892f2a7
Porting hf_token fix (#246)
asmigosw Jan 27, 2025
d6d9a77
[Attention output Reshape] : Issue fixed (#243)
abukhoy Jan 31, 2025
7cc1b34
[QEff. Finetune] Stop fine tuning when loss has converged (#257)
quic-swatia Feb 11, 2025
963986b
Mllama(single + dual) + InternVL(single) + Llava (single) (#267)
ochougul Feb 14, 2025
725a7c1
Migrating HL compile and export to infer APIs (#214)
asmigosw Feb 17, 2025
5e37dfc
Hotfix-1 for Intern model (#270)
quic-amitraj Feb 18, 2025
f68dd8d
[Readme Update] : Deepseek Distills Models Added (#263)
abukhoy Feb 18, 2025
b3bb4be
Enabling FP8 models for `Cloud AI 100` (#248)
ochougul Feb 18, 2025
1d234f5
Added example script to use embedding model (#237)
quic-amitraj Feb 18, 2025
18b0c4d
Add prompt_to_lora_id_mapping adjustment in fix_prompts() (#242)
quic-jouachen Feb 19, 2025
b82c27a
Add support for model ibm-granite/granite-3.1-8b-instruct (#239)
quic-akuruvil Feb 19, 2025
5ea09e5
HOTFIX/fixed replace quantizers (#273)
ochougul Feb 19, 2025
eded9a9
HOTFIX/compiler arguments fix for VLM (#274)
quic-amitraj Feb 21, 2025
546c434
Support for Prefix caching Feature in QNN Compilation Path. (#262)
shubhagr-quic Feb 21, 2025
68e4ab7
HOTFIX/kv_offload fix (#278)
quic-amitraj Feb 24, 2025
c5c5bfd
Onboarding Whisper with Single QPC (#271)
kdulla Feb 24, 2025
6c7157d
Updated mos to optional argument (#281)
asmigosw Feb 25, 2025
5ef6c7e
[QEff Finetune] change default dropout in lora to 0.0 (#284)
vbaddi Feb 26, 2025
6534869
Updating the code owners list (#288)
quic-hemagnih Feb 27, 2025
493a8e2
Revert "Installing python package rich to resolve QNN tests failure."…
shubhagr-quic Feb 27, 2025
43af9f6
enabling faster downloads via hf_transfer (#282)
ochougul Feb 28, 2025
53c3564
upgrading from yanked version (#276)
ochougul Feb 28, 2025
bbfc4de
Added example script for InternVL (#269)
quic-dhirajku Feb 28, 2025
f1aa984
prompt-lookup decoding example (#235)
eplatero97 Feb 28, 2025
46e28a0
New format of Documentation (#240)
quic-amitraj Feb 28, 2025
e7796a4
Removed warning and override of mxfp6 for internal use (#277)
quic-amitraj Feb 28, 2025
b9a74cf
Added support of 2qpcs for internvl and llava (#279)
mohiso22 Feb 28, 2025
756e729
Removed onnx_defer_loading flag. (#295)
shubhagr-quic Mar 3, 2025
5f2bd31
Code for SDK configs Inclusion (#203)
abukhoy Mar 3, 2025
a45b5c4
Docs string added for the Image class and granite models are added in…
abukhoy Mar 6, 2025
a276806
[Bug-Fix :] QEFFAutoModelForCausalLM __repr__() Method Fixed (#307)
abukhoy Mar 6, 2025
d88e124
added initial version of SwiftKV for AI 100
ochougul Dec 16, 2024
860ac4f
BUGFIX
ochougul Dec 16, 2024
d0f7479
BUGFIX
ochougul Dec 16, 2024
c644856
BUGFIX
ochougul Dec 16, 2024
5511107
BUGFIX
ochougul Dec 16, 2024
0089540
BUGFIX
ochougul Dec 16, 2024
02b48ff
BUGFIX
ochougul Dec 16, 2024
757f10a
BUGFIX
ochougul Dec 17, 2024
16cd029
BUGFIX
ochougul Dec 17, 2024
ee36aa1
BUGFIX
ochougul Dec 17, 2024
4203a07
BUGFIX
ochougul Dec 17, 2024
ee2f7e1
BUGFIX
ochougul Dec 17, 2024
80730dd
all bugfixes in
ochougul Dec 19, 2024
4b07373
added init file
ochougul Dec 19, 2024
e34d79a
all changes except BQA are in with this
ochougul Jan 9, 2025
ed909a9
more updates
ochougul Feb 5, 2025
4e4300d
Enabling the SwiftKV model in the QEFF Infra
quic-hemagnih Feb 27, 2025
0684de3
rebased
ochougul Feb 27, 2025
9fa21da
moving registration of non transformer models during initialization o…
quic-hemagnih Feb 27, 2025
cb3b0ba
fixed lint warnings
quic-hemagnih Mar 4, 2025
5d9d1e5
enabling faster downloads via hf_transfer (#282)
ochougul Feb 28, 2025
2ca9360
Fixed the compilation errors
quic-hemagnih Mar 5, 2025
0f5cfcf
Fixed the lint error
quic-hemagnih Mar 5, 2025
c1f8a6b
fixed ruff errors
quic-hemagnih Mar 5, 2025
dc059e4
fixed ruff errors
quic-hemagnih Mar 5, 2025
4fdebc7
Address review comments
quic-hemagnih Mar 12, 2025
5b04f83
Rebased and fixed the lint errors
quic-hemagnih Mar 12, 2025
393c428
added initial version of SwiftKV for AI 100
ochougul Dec 16, 2024
273258e
BUGFIX
ochougul Dec 16, 2024
8d43032
BUGFIX
ochougul Dec 17, 2024
339ce89
Enabling the SwiftKV model in the QEFF Infra
quic-hemagnih Feb 27, 2025
b24399b
rebased
ochougul Feb 27, 2025
98b5b61
moving registration of non transformer models during initialization o…
quic-hemagnih Feb 27, 2025
00abd98
fixed lint warnings
quic-hemagnih Mar 4, 2025
8396903
enabling faster downloads via hf_transfer (#282)
ochougul Feb 28, 2025
4a5cd48
Fixed the compilation errors
quic-hemagnih Mar 5, 2025
6abceb6
Fixed the lint error
quic-hemagnih Mar 5, 2025
d026845
fixed ruff errors
quic-hemagnih Mar 5, 2025
7010a75
fixed ruff errors
quic-hemagnih Mar 5, 2025
582c1d4
Address review comments
quic-hemagnih Mar 12, 2025
cc895b7
Fix the lint errors
quic-hemagnih Mar 12, 2025
0726e75
rebased and fixed lint erros
quic-hemagnih Mar 12, 2025
01ab5bd
Merge branch 'supp_new_model' of github.com:quic-hemagnih/efficient-t…
quic-hemagnih Mar 12, 2025
20 changes: 19 additions & 1 deletion QEfficient/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# -----------------------------------------------------------------------------
#
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# Copyright (c) 2025 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# -----------------------------------------------------------------------------
@@ -12,8 +12,26 @@
# hf_transfer is imported (will happen on line 15 via leading imports)
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from transformers import AutoConfig

from QEfficient.transformers.modeling_utils import (
    MODEL_TYPE_TO_CONFIG_CLS_AND_ARCH_CLS,
    get_auto_model_class,
    get_model_class_type_from_model_type,
)
from QEfficient.utils.logging_utils import logger

# Loop over all model types that are not present in transformers and register them
for model_type, model_cls in MODEL_TYPE_TO_CONFIG_CLS_AND_ARCH_CLS.items():
    # Register the config class for this model type; it is the first element of the tuple
    AutoConfig.register(model_type, model_cls[0])

    model_class_type = get_model_class_type_from_model_type(model_type)
    AutoModelClassName = get_auto_model_class(model_class_type, model_cls[1])

    # Register the non-transformer model class and its config class with the resolved AutoModel class
    AutoModelClassName.register(model_cls[0], model_cls[1])


def check_qaic_sdk():
    """Check if QAIC SDK is installed"""
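The registration loop in `QEfficient/__init__.py` above follows a mapping-driven pattern: one dictionary drives both config and model registration. A minimal stand-alone sketch of that pattern, using plain dicts and hypothetical `Tiny*` classes in place of `AutoConfig`/`AutoModelForCausalLM` registration:

```python
# Minimal sketch of the mapping-driven registration pattern used above.
# TinyConfig, TinyForCausalLM, and the two registries are hypothetical
# stand-ins for transformers' AutoConfig/AutoModel machinery.

CONFIG_REGISTRY = {}  # model_type -> config class
MODEL_REGISTRY = {}   # config class -> model architecture class


class TinyConfig:
    model_type = "tiny_swiftkv"


class TinyForCausalLM:
    def __init__(self, config):
        self.config = config


MODEL_TYPE_TO_CONFIG_CLS_AND_ARCH_CLS = {"tiny_swiftkv": [TinyConfig, TinyForCausalLM]}

# Mirror of the loop in QEfficient/__init__.py: register the config class
# by model_type, then register the architecture class against the config.
for model_type, (config_cls, arch_cls) in MODEL_TYPE_TO_CONFIG_CLS_AND_ARCH_CLS.items():
    CONFIG_REGISTRY[model_type] = config_cls
    MODEL_REGISTRY[config_cls] = arch_cls


def from_pretrained(model_type):
    """Resolve a model instance from its model_type, as AutoModel would."""
    config_cls = CONFIG_REGISTRY[model_type]
    return MODEL_REGISTRY[config_cls](config_cls())


model = from_pretrained("tiny_swiftkv")
print(type(model).__name__)  # -> TinyForCausalLM
```

Because the loop runs at package import time, any model listed in the dictionary is resolvable through the generic `from_pretrained` path without further wiring.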
29 changes: 29 additions & 0 deletions QEfficient/transformers/cache_utils.py
@@ -36,6 +36,35 @@ class QEffDynamicCache(DynamicCache):

"""

    def write_only(self, key_states, value_states, layer_idx, cache_kwargs):
        # Update the cache
        if len(self.key_cache) <= layer_idx:
            self.key_cache.append(key_states)
            self.value_cache.append(value_states)
        else:
            position_ids = cache_kwargs.get("position_ids")
            self.key_cache[layer_idx] = CtxScatterFunc.apply(self.key_cache[layer_idx], position_ids, key_states)
            self.value_cache[layer_idx] = CtxScatterFunc.apply(self.value_cache[layer_idx], position_ids, value_states)

    def read_only(self, layer_idx, **cache_kwargs):
        k_out, v_out = self.key_cache[layer_idx], self.value_cache[layer_idx]
        position_ids = cache_kwargs.get("position_ids")
        ctx_len = k_out.shape[2]
        ctx_indices = torch.arange(ctx_len)[None, None, ...]
        gather_limit = position_ids.max(1, keepdim=True).values.unsqueeze(1)
        invalid_mask = ctx_indices > gather_limit

        if torch.onnx.is_in_onnx_export():
            invalid_idx_value = torch.iinfo(torch.int32).max
        else:
            invalid_idx_value = 0

        ctx_indices = torch.where(invalid_mask, invalid_idx_value, ctx_indices)
        k_out = CtxGatherFunc.apply(k_out, ctx_indices)
        v_out = CtxGatherFunc.apply(v_out, ctx_indices)
        v_out = torch.where(invalid_mask.unsqueeze(-1), torch.tensor(0.0, dtype=torch.float32), v_out)
        return k_out, v_out

    def update(
        self,
        key_states: torch.Tensor,
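The masking trick in `read_only` keeps the gather shape-stable: positions past the largest valid `position_id` are redirected to index 0, gathered anyway, and then zeroed so they contribute nothing. A tensor-free sketch of the same idea, using a plain list in place of the cache tensor (all names here are illustrative, not from the PR):

```python
# Sketch of the masked-gather idea in read_only, with a Python list
# standing in for one cache row along the context dimension.

def masked_gather(cache, max_valid_pos):
    """Gather all ctx_len slots; slots past max_valid_pos read index 0 and are zeroed."""
    ctx_len = len(cache)
    out = []
    for idx in range(ctx_len):
        invalid = idx > max_valid_pos
        src = 0 if invalid else idx  # redirect invalid indices (the CtxGatherFunc step)
        val = cache[src]
        if invalid:
            val = 0.0                # zero out invalid slots (the torch.where step)
        out.append(val)
    return out


cache_row = [1.0, 2.0, 3.0, 4.0]
print(masked_gather(cache_row, 1))  # -> [1.0, 2.0, 0.0, 0.0]
```

The real code additionally swaps the redirect target to `int32` max during ONNX export, so the AI 100 compiler can recognize and clip the out-of-range gather.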
65 changes: 65 additions & 0 deletions QEfficient/transformers/modeling_utils.py
@@ -88,6 +88,12 @@

from QEfficient.customop import CustomRMSNormAIC

# Placeholder for all non-transformer models
from QEfficient.transformers.models.llama_swiftkv.modeling_llama_swiftkv import (
    LlamaSwiftKVConfig,
    LlamaSwiftKVForCausalLM,
)

from .models.codegen.modeling_codegen import (
    QEffCodeGenAttention,
    QeffCodeGenBlock,
@@ -271,6 +277,17 @@
    WhisperForConditionalGeneration: QEffWhisperForConditionalGeneration,
}

# Map of model type to its config class and model architecture class.
# When onboarding new models, make sure to add the new model card names to this dictionary.
# Developers are expected to follow naming conventions like ForCausalLM when defining class names.
MODEL_TYPE_TO_CONFIG_CLS_AND_ARCH_CLS = {"llama_swiftkv": [LlamaSwiftKVConfig, LlamaSwiftKVForCausalLM]}

# Substrings identifying supported non-transformer model types, e.g. "swiftkv" from "llama_swiftkv"
LIST_OF_MODEL_TYPES = {"swiftkv"}

# Map from model-type substring to the marker inside the architecture class name,
# e.g. "SwiftKVFor" inside LlamaSwiftKVForCausalLM
MODEL_TYPE_TO_MODEL_CLASS_TYPE = {"swiftkv": "SwiftKVFor"}


def _prepare_cross_attention_mask(
    cross_attention_mask: torch.Tensor,
@@ -362,3 +379,51 @@ def _create_causal_mask(
    attention_mask = attention_mask.unsqueeze(1)

    return attention_mask


def convert_str_to_class(className):
    """
    Convert a class-name string into the corresponding class from transformers
    ---------
    :className: `str`- Class name string, e.g. "AutoModelForCausalLM".
    Return:
        The class object.
    """
    module = __import__("transformers")
    return getattr(module, className)


def get_auto_model_class(model_type, NonTransformerModelCls):
    """
    Resolve the transformers AutoModel class for a non-transformer model like SwiftKV
    ---------------------------------------
    : model_type: str: marker substring inside the class name, e.g. "SwiftKVFor"
    : NonTransformerModelCls: model class, e.g. LlamaSwiftKVForCausalLM
    """

    # Derive the AutoModel class name from the non-transformer class name,
    # e.g. LlamaSwiftKVForCausalLM -> AutoModelForCausalLM; this keeps the lookup generic.
    nonTransformerModelClsName = NonTransformerModelCls.__name__
    start_index = nonTransformerModelClsName.find(model_type)

    # Index just past model_type, e.g. past "SwiftKVFor"
    substring_start = start_index + len(model_type)

    # Substring after model_type, e.g. "CausalLM"
    nonTransformerModel = nonTransformerModelClsName[substring_start:]

    autoModelName = "AutoModelFor" + nonTransformerModel

    # Convert the string to the actual class
    autoModelClassName = convert_str_to_class(autoModelName)

    return autoModelClassName


def get_model_class_type_from_model_type(model_type):
    for substring in LIST_OF_MODEL_TYPES:
        if substring in model_type:
            model_class_type = substring
            break
    else:
        # Guard against an unbound model_class_type when no known substring matches
        raise ValueError(f"Unsupported model type: {model_type}")

    model_class_name = MODEL_TYPE_TO_MODEL_CLASS_TYPE[model_class_type]
    return model_class_name
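The two helpers above are pure string plumbing, and their effect is easiest to see composed end to end. A self-contained sketch of that pipeline (stopping at the derived name rather than calling `getattr` on transformers, so it needs no imports); only the names shown in the diff are used:

```python
# Worked sketch of the name derivation performed by
# get_model_class_type_from_model_type + get_auto_model_class.

LIST_OF_MODEL_TYPES = {"swiftkv"}
MODEL_TYPE_TO_MODEL_CLASS_TYPE = {"swiftkv": "SwiftKVFor"}


def derive_auto_model_name(model_type, arch_cls_name):
    # 1. Find which known substring the model_type contains ("swiftkv").
    matched = next(s for s in LIST_OF_MODEL_TYPES if s in model_type)
    # 2. Map it to the marker inside the class name ("SwiftKVFor").
    marker = MODEL_TYPE_TO_MODEL_CLASS_TYPE[matched]
    # 3. Take everything after the marker: "CausalLM".
    tail = arch_cls_name[arch_cls_name.find(marker) + len(marker):]
    # 4. Build the transformers auto-class name to resolve via getattr.
    return "AutoModelFor" + tail


name = derive_auto_model_name("llama_swiftkv", "LlamaSwiftKVForCausalLM")
print(name)  # -> AutoModelForCausalLM
```

This derivation is why the naming convention matters: the architecture class name must embed the marker (`SwiftKVFor`) followed by a suffix that transformers already exposes as an `AutoModelFor*` class.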
6 changes: 6 additions & 0 deletions QEfficient/transformers/models/llama_swiftkv/__init__.py
@@ -0,0 +1,6 @@
# -----------------------------------------------------------------------------
#
# Copyright (c) 2025 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# -----------------------------------------------------------------------------