
MatryoshkaKV Cache: Adaptive KV Compression via Trainable Orthogonal Projections

[Figure: architecture overview]

Code for MatryoshkaKV-cache.

This project delivers a LLaMA model equipped with optimized orthogonal projections in modeling_pcallama_trial.py; we conduct our experiments by simply patching the base LLaMA implementation with this file.
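For illustration, here is a minimal sketch of how such a patch could be wired up, assuming modeling_pcallama_trial.py exposes a drop-in attention replacement (the class name PcaLlamaAttention below is a hypothetical placeholder, not the repo's actual API):

```python
# Hypothetical sketch: swap the stock LLaMA attention for the
# projection-equipped variant before the model is instantiated,
# so every decoder layer picks up the patched class.
import transformers.models.llama.modeling_llama as llama_mod
from modeling_pcallama_trial import PcaLlamaAttention  # hypothetical class name

llama_mod.LlamaAttention = PcaLlamaAttention
# Newer transformers versions dispatch attention through a registry dict.
if hasattr(llama_mod, "LLAMA_ATTENTION_CLASSES"):
    for impl in llama_mod.LLAMA_ATTENTION_CLASSES:
        llama_mod.LLAMA_ATTENTION_CLASSES[impl] = PcaLlamaAttention

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```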

Training

We first initialize the orthogonal projections via PCA (Principal Component Analysis) by running cal_pcallama_init.py.
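A minimal sketch of the PCA step, assuming per-head key (or value) states have been cached on a small calibration set; the function name and shapes below are illustrative, not the repo's actual API:

```python
import torch

def pca_projection(states: torch.Tensor) -> torch.Tensor:
    """Return an orthogonal basis for `states` of shape (num_tokens, head_dim),
    with columns sorted by decreasing explained variance."""
    centered = states - states.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; flip the eigenvector
    # columns so the leading ones capture the most variance. Truncating
    # to the leading columns then gives the nested (Matryoshka) projections.
    _, eigvecs = torch.linalg.eigh(cov)
    return eigvecs.flip(-1)  # (head_dim, head_dim), orthogonal

# Example with random calibration states for a head dimension of 128.
proj = pca_projection(torch.randn(10_000, 128))
assert torch.allclose(proj.T @ proj, torch.eye(128), atol=1e-4)
```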

During training, our patches are applied to LLaMA-Factory at:

  • LLaMA-Factory/src/llamafactory/model/custom_model/modeling_pcallama_trial.py

Furthermore, because we use a distillation objective, we provide our custom trainers PcaLlamaDistillationTrainer and PcaLlamaTrainer (a sketch of the objective follows the list) at:

  • LLaMA-Factory/src/llamafactory/train/pt/trainer.py
  • LLaMA-Factory/src/llamafactory/train/sft/trainer.py
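A minimal sketch of the distillation objective, assuming the teacher is the unmodified model and the student is the projection-equipped one; this mirrors the general recipe rather than the exact PcaLlamaDistillationTrainer implementation:

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class SketchDistillationTrainer(Trainer):
    """Adds a KL term between teacher and student token distributions."""

    def __init__(self, *args, teacher=None, temperature=1.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher = teacher.eval()  # assumed on the same device as the student
        self.temperature = temperature

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        with torch.no_grad():
            teacher_logits = self.teacher(**inputs).logits
        t = self.temperature
        kd = F.kl_div(
            F.log_softmax(outputs.logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t * t)
        loss = outputs.loss + kd  # language-modeling loss + distillation term
        return (loss, outputs) if return_outputs else loss
```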

Our training scripts are under LLaMA-Factory/scripts.

Our dataset for continual pre-training is downloaded from RedPajama-Sample.
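The sample is hosted on the Hugging Face Hub, so it can be pulled directly with the datasets library (the dataset ID below is the public RedPajama sample; depending on your datasets version, loading it may require trust_remote_code=True):

```python
from datasets import load_dataset

# Public sample of the RedPajama corpus used for continual pre-training.
data = load_dataset("togethercomputer/RedPajama-Data-1T-Sample", split="train")
print(data[0]["text"][:200])
```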

Evaluation

For evaluation, our patches are applied to opencompass (an example config sketch follows the list) at:

  • opencompass/opencompass/models/custom_model
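For illustration, a minimal sketch of an opencompass model config entry pointing at the patched implementation; the class name PcaLlama and the checkpoint path are hypothetical placeholders for whatever custom_model actually exposes:

```python
from opencompass.models.custom_model import PcaLlama  # hypothetical class name

models = [
    dict(
        type=PcaLlama,
        abbr="matryoshkakv-llama-7b",       # hypothetical label
        path="path/to/matryoshkakv-llama",  # checkpoint with trained projections
        max_out_len=256,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```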

Additionally, modifications are made for loading Hugging Face models in:

  • opencompass/opencompass/models/huggingface_above_v4_33.py

Performance

[Figure: results table]

Visualization

[Figure: compression rate visualization]

TODO

We will release the complete training pipeline in the future.
