
Transformer migration #334

Closed

Conversation

qcdipankar (Contributor)

Enabled models for transformers==4.50.0

Models Enabled (a loading sketch follows the list):

1. GPT2
2. GPTJ
3. Phi
4. Phi3
5. Granite
6. Whisper
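
A minimal loading sketch, assuming the `QEFFAutoModelForCausalLM` entry point referenced later in this PR; exact arguments may differ:

```python
# Minimal sketch: load one of the enabled models through QEfficient.
# QEFFAutoModelForCausalLM is referenced later in this PR; arguments may differ.
from QEfficient import QEFFAutoModelForCausalLM

model = QEFFAutoModelForCausalLM.from_pretrained("gpt2")  # any enabled model card
print(model)  # exercises the __repr__ touched later in this PR
```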

quic-amitraj and others added 15 commits April 1, 2025 05:51
Compilation fix; enabled MXFP6 for the vision encoder.
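
As a usage illustration (not the actual diff), compiling with MXFP6 enabled could look like the following; the `mxfp6_matmul` flag appears in the config dump later in this conversation, while the other argument values are assumptions:

```python
# Hypothetical compile call with MXFP6 matmuls enabled; values mirror the
# qconfig dump shown later in this PR and are illustrative, not prescriptive.
model.compile(
    prefill_seq_len=32,
    ctx_len=128,
    num_cores=16,
    num_devices=1,
    mxfp6_matmul=True,  # the flag this commit wires up for the vision encoder
)
```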

---------

Signed-off-by: Amit Raj <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
Signed-off-by: Mohit Soni <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
Removing the onnx_defer_loading flag, which was originally removed in
_[Removed onnx_defer_loading from Immutable Convertor Args. PR: 230]_
but was added back later in _[Mllama(single + dual) + InternVL(single) +
Llava (single) PR: 267]_, probably because of rebasing.

Signed-off-by: Shubham Agrawal <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
This will create a config JSON file, which contains all the details
about compilation and SDK versions.

Currently, this code is part of `QEFFAutoModelForCausalLM.compile`.

The generated config looks like this:

```
{
    "huggingface_config": {
        "vocab_size": 50257,
        "n_positions": 1024,
        "n_embd": 768,
        "n_layer": 12,
        "n_head": 12,
        "n_inner": null,
        "activation_function": "gelu_new",
        "resid_pdrop": 0.1,
        "embd_pdrop": 0.1,
        "attn_pdrop": 0.1,
        "layer_norm_epsilon": 1e-05,
        "initializer_range": 0.02,
        "summary_type": "cls_index",
        "summary_use_proj": true,
        "summary_activation": null,
        "summary_first_dropout": 0.1,
        "summary_proj_to_labels": true,
        "scale_attn_weights": true,
        "use_cache": true,
        "scale_attn_by_inverse_layer_idx": false,
        "reorder_and_upcast_attn": false,
        "bos_token_id": 50256,
        "eos_token_id": 50256,
        "return_dict": true,
        "output_hidden_states": false,
        "output_attentions": false,
        "torchscript": false,
        "torch_dtype": null,
        "use_bfloat16": false,
        "tf_legacy_loss": false,
        "pruned_heads": {},
        "tie_word_embeddings": true,
        "chunk_size_feed_forward": 0,
        "is_encoder_decoder": false,
        "is_decoder": false,
        "cross_attention_hidden_size": null,
        "add_cross_attention": false,
        "tie_encoder_decoder": false,
        "max_length": 20,
        "min_length": 0,
        "do_sample": false,
        "early_stopping": false,
        "num_beams": 1,
        "num_beam_groups": 1,
        "diversity_penalty": 0.0,
        "temperature": 1.0,
        "top_k": 50,
        "top_p": 1.0,
        "typical_p": 1.0,
        "repetition_penalty": 1.0,
        "length_penalty": 1.0,
        "no_repeat_ngram_size": 0,
        "encoder_no_repeat_ngram_size": 0,
        "bad_words_ids": null,
        "num_return_sequences": 1,
        "output_scores": false,
        "return_dict_in_generate": false,
        "forced_bos_token_id": null,
        "forced_eos_token_id": null,
        "remove_invalid_values": false,
        "exponential_decay_length_penalty": null,
        "suppress_tokens": null,
        "begin_suppress_tokens": null,
        "architectures": [
            "GPT2LMHeadModel"
        ],
        "finetuning_task": null,
        "id2label": {
            "0": "LABEL_0",
            "1": "LABEL_1"
        },
        "label2id": {
            "LABEL_0": 0,
            "LABEL_1": 1
        },
        "tokenizer_class": null,
        "prefix": null,
        "pad_token_id": null,
        "sep_token_id": null,
        "decoder_start_token_id": null,
        "task_specific_params": {
            "text-generation": {
                "do_sample": true,
                "max_length": 50
            }
        },
        "problem_type": null,
        "_name_or_path": "gpt2",
        "_commit_hash": "607a30d783dfa663caf39e06633721c8d4cfcd7e",
        "_attn_implementation_internal": "eager",
        "transformers_version": null,
        "model_type": "gpt2",
        "n_ctx": 1024
    },
    "qpc_config": {
        "QEff_config": {
            "pytorch_transforms": [
                "AwqToMatmulNbitsTransform",
                "GPTQToMatmulNbitsTransform",
                "CustomOpsTransform",
                "KVCacheTransform"
            ],
            "onnx_transforms": [
                "FP16ClipTransform",
                "SplitTensorsTransform"
            ],
            "onnx_path": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47/GPT2LMHeadModel.onnx"
        },
        "aic_compiler_config": {
            "apps_sdk_version": "1.20.0",
            "compile_dir": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47",
            "specializtions_file_path": "/root/.cache/qeff_models/GPT2LMHeadModel-36f0eca92731bb47/specializations.json",
            "prefill_seq_len": 32,
            "ctx_len": 128,
            "batch_size": 1,
            "full_batch_size": null,
            "num_devices": 1,
            "num_cores": 16,
            "mxfp6_matmul": false,
            "mxint8_kv_cache": false,
            "num_speculative_tokens": null
        },
        "qnn_config": {
            "enable_qnn": true,
            "qnn_config_path": "QEfficient/compile/qnn_config.json",
            "product": "QAIRT",
            "os": {
                "Ubuntu": 22.04,
                "Windows": 11
            },
            "sdk_flavor": [
                "aic"
            ],
            "version": "2.31.0",
            "build_id": "250109072054_3882",
            "qnn_backend_api_version": "2.18.0",
            "tensorflow": "2.10.1",
            "tflite": "2.3.0",
            "torch": "1.13.1",
            "onnx": "1.16.1",
            "onnxruntime": "1.17.1",
            "onnxsimplifier": "0.4.36",
            "android-ndk": "r26c",
            "platform": "AIC.1.20.0.14"
        }
    }
}
```

Note: The code structure may change.
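
For intuition, the dump above could be assembled roughly like this. This is a minimal sketch, not the actual implementation; the function name, the `compile_params` dict, and the `qconfig.json` filename are hypothetical:

```python
import json
import os

def dump_compile_config(model, qpc_path: str, compile_params: dict) -> str:
    """Sketch: persist the HF config plus compilation/SDK details next to the QPC.

    `compile_params` stands in for the compile/SDK details; the real code in
    QEFFAutoModelForCausalLM.compile may gather and name these differently.
    """
    config = {
        "huggingface_config": model.config.to_dict(),  # transformers config dump
        "qpc_config": compile_params,
    }
    config_path = os.path.join(qpc_path, "qconfig.json")  # filename is assumed
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path
```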

---------

Signed-off-by: Abukhoyer Shaik <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
… validation page (quic#303)

Signed-off-by: Abukhoyer Shaik <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
These are just small fixes for printing a `QEFFAutoModelForCausalLM`
instance, made by changing its `__repr__(self)` method.
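
For illustration only, the change is of this shape; the exact fields printed are a guess, not the PR's code:

```python
# Hypothetical sketch of a tidier __repr__; the real method may print other fields.
def __repr__(self) -> str:
    return f"{self.__class__.__name__}(model={self.model.__class__.__name__})"
```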

Signed-off-by: Abukhoyer Shaik <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
… models (quic#286)

Minor fixes to generate and compile to be more consistent with how other
models are called.

---------

Signed-off-by: Kushal Dulla <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
…re computed (quic#233)

1) Added support to resume fine-tuning from the checkpoints of a previous
run that stopped midway (see the sketch after this list).
2) With these changes, checkpoints (both intermediate and end-of-epoch)
are saved for every epoch.
3) There is no need to pass tokenizer_name if a model_name is passed; it
defaults to the same value as model_name. If a tokenizer_name different
from the model_name is required, it can still be passed separately as an
argument in the command.
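
A minimal sketch of the resume flow under these assumptions; the checkpoint layout (epoch_N.pt files holding model/optimizer state and the epoch counter) is hypothetical, not the PR's exact format:

```python
import os
import torch

def maybe_resume(model, optimizer, ckpt_dir: str) -> int:
    """Resume from the newest epoch_N.pt checkpoint, if any; return start epoch."""
    if not os.path.isdir(ckpt_dir):
        return 0
    ckpts = [f for f in os.listdir(ckpt_dir)
             if f.startswith("epoch_") and f.endswith(".pt")]
    if not ckpts:
        return 0  # fresh run
    latest = max(ckpts, key=lambda f: int(f[len("epoch_"):-len(".pt")]))
    state = torch.load(os.path.join(ckpt_dir, latest), map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1  # continue with the next epoch
```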

---------

Signed-off-by: Swati Allabadi <[email protected]>
Co-authored-by: Swati Allabadi <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
BUGFIX: added a patch for InternVL so that the 0th dimension of
vit_embeds is dynamic, based on num_patches.
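
In torch.onnx terms, making the 0th dimension dynamic looks roughly like this; the tiny stand-in module and tensor names below are placeholders, not InternVL's real export code:

```python
import torch

# Placeholder encoder standing in for InternVL's vision tower (illustrative only).
vision_encoder = torch.nn.Linear(16, 8)
pixel_values = torch.randn(3, 16)  # dim 0 plays the role of num_patches

torch.onnx.export(
    vision_encoder,
    (pixel_values,),
    "vision_encoder.onnx",
    input_names=["pixel_values"],
    output_names=["vit_embeds"],
    dynamic_axes={
        "pixel_values": {0: "num_patches"},  # 0th dim marked dynamic
        "vit_embeds": {0: "num_patches"},
    },
)
```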

Signed-off-by: quic-dhirajku <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
Made a few changes in the modeling files of both models so that this
method now works appropriately.

Signed-off-by: quic-dhirajku <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
1. For fine-tuning on QAIC, the torch_qaic GradScaler will be used.
2. Moved back to lora_dropout = 0.05 at the ML Framework team's request.
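
For context, a scaler-based training step with the stock torch.cuda.amp.GradScaler is sketched below; torch_qaic's GradScaler is assumed to be used as an analogue on QAIC devices (its exact API is not shown in this PR):

```python
import torch

# Reference loss-scaling step using torch.cuda.amp.GradScaler; on QAIC the PR
# substitutes torch_qaic's scaler, assumed analogous (an assumption, not verified).
scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, inputs, targets, loss_fn):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda"):
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()  # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales grads, skips step on inf/nan
    scaler.update()                # adapt the scale factor
```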

Signed-off-by: Swati Allabadi <[email protected]>
Co-authored-by: Swati Allabadi <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
The absence of a custom RMSNorm was causing GraniteCausalLM to fail on
AIC with the full model on 4.46.3.

Adding CustomRMSNormAIC fixes the issue.
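
For reference, RMSNorm computes y = x * weight / sqrt(mean(x^2) + eps); a plain PyTorch version of that math is below, with CustomRMSNormAIC assumed to express the same semantics as an op the AIC stack recognizes:

```python
import torch

class RMSNorm(torch.nn.Module):
    """Plain RMSNorm reference: y = x / sqrt(mean(x^2) + eps) * weight.

    CustomRMSNormAIC is assumed to implement the same math as a custom op
    the AIC compiler recognizes; this shows only the reference semantics.
    """
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)
```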

---------

Signed-off-by: Dipankar Sarkar <[email protected]>
Co-authored-by: Dipankar Sarkar <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
1. Changes in the libraries used during context binary generation.
2. Changed the "convertor" spelling to "converter" to align with the
qairt-converter string.

---------

Signed-off-by: Shubham Agrawal <[email protected]>
Signed-off-by: Dipankar Sarkar <[email protected]>
@quic-amitraj (Contributor)

Please Rebase

@qcdipankar qcdipankar closed this Apr 1, 2025
@qcdipankar qcdipankar force-pushed the transformer_migration branch from 917a501 to fc89e8b on April 1, 2025 17:37
@qcdipankar qcdipankar reopened this Apr 1, 2025
@quic-amitraj quic-amitraj marked this pull request as draft April 2, 2025 08:45
@qcdipankar qcdipankar force-pushed the transformer_migration branch 5 times, most recently from e7f885e to 99a2063 on April 3, 2025 03:08
@qcdipankar qcdipankar force-pushed the transformer_migration branch 3 times, most recently from 6eff2ef to e4b1cb7 on April 3, 2025 04:15
@qcdipankar qcdipankar closed this Apr 3, 2025
@qcdipankar qcdipankar reopened this Apr 3, 2025
@qcdipankar qcdipankar closed this Apr 3, 2025