
MatryoshkaKV Cache: Adaptive KV Compression via Trainable Orthogonal Projections

[Figure: architecture overview]

Code for MatryoshkaKV-cache.

This project delivers a LLaMA model equipped with optimized orthogonal projections in modeling_pcallama_trial.py; we conduct our experiments by simply patching the base LLaMA implementation with this file.
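For illustration, here is a minimal sketch of how such a patch could be wired up, assuming modeling_pcallama_trial.py exposes a drop-in attention replacement (the class name PcaLlamaAttention below is a hypothetical placeholder, not the repo's actual API):

```python
# Hypothetical sketch: swap the stock LLaMA attention for the
# projection-equipped variant before the model is instantiated,
# so every decoder layer picks up the patched class.
import transformers.models.llama.modeling_llama as llama_mod
from modeling_pcallama_trial import PcaLlamaAttention  # hypothetical class name

llama_mod.LlamaAttention = PcaLlamaAttention
# Newer transformers versions dispatch attention through a registry dict.
if hasattr(llama_mod, "LLAMA_ATTENTION_CLASSES"):
    for impl in llama_mod.LLAMA_ATTENTION_CLASSES:
        llama_mod.LLAMA_ATTENTION_CLASSES[impl] = PcaLlamaAttention

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```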

Training

We first initialize the orthogonal projections via PCA (Principal Component Analysis) by running cal_pcallama_init.py.
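A minimal sketch of the PCA step, assuming per-head key (or value) states have been cached on a small calibration set; the function name and shapes below are illustrative, not the repo's actual API:

```python
import torch

def pca_projection(states: torch.Tensor) -> torch.Tensor:
    """Return an orthogonal basis for `states` of shape (num_tokens, head_dim),
    with columns sorted by decreasing explained variance."""
    centered = states - states.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; flip the eigenvector
    # columns so the leading ones capture the most variance. Truncating
    # to the leading columns then gives the nested (Matryoshka) projections.
    _, eigvecs = torch.linalg.eigh(cov)
    return eigvecs.flip(-1)  # (head_dim, head_dim), orthogonal

# Example with random calibration states for a head dimension of 128.
proj = pca_projection(torch.randn(10_000, 128))
assert torch.allclose(proj.T @ proj, torch.eye(128), atol=1e-4)
```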

During training, our patches are applied to LLaMA-Factory at:

  • LLaMA-Factory/src/llamafactory/model/custom_model/modeling_pcallama_trial.py

Furthermore, because we use a distillation objective, we provide our custom trainers PcaLlamaDistillationTrainer and PcaLlamaTrainer (a sketch of the objective follows the list) at:

  • LLaMA-Factory/src/llamafactory/train/pt/trainer.py
  • LLaMA-Factory/src/llamafactory/train/sft/trainer.py
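A minimal sketch of the distillation objective, assuming the teacher is the unmodified model and the student is the projection-equipped one; this mirrors the general recipe rather than the exact PcaLlamaDistillationTrainer implementation:

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class SketchDistillationTrainer(Trainer):
    """Adds a KL term between teacher and student token distributions."""

    def __init__(self, *args, teacher=None, temperature=1.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher = teacher.eval()  # assumed on the same device as the student
        self.temperature = temperature

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        with torch.no_grad():
            teacher_logits = self.teacher(**inputs).logits
        t = self.temperature
        kd = F.kl_div(
            F.log_softmax(outputs.logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t * t)
        loss = outputs.loss + kd  # language-modeling loss + distillation term
        return (loss, outputs) if return_outputs else loss
```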

Our training scripts are under LLaMA-Factory/scripts.

Our dataset for continual pre-training is downloaded from RedPajama-Sample.
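The sample is hosted on the Hugging Face Hub, so it can be pulled directly with the datasets library (the dataset ID below is the public RedPajama sample; depending on your datasets version, loading it may require trust_remote_code=True):

```python
from datasets import load_dataset

# Public sample of the RedPajama corpus used for continual pre-training.
data = load_dataset("togethercomputer/RedPajama-Data-1T-Sample", split="train")
print(data[0]["text"][:200])
```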

Evaluation

For evaluation, our patches are applied to opencompass (an example config sketch follows the list) at:

  • opencompass/opencompass/models/custom_model
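For illustration, a minimal sketch of an opencompass model config entry pointing at the patched implementation; the class name PcaLlama and the checkpoint path are hypothetical placeholders for whatever custom_model actually exposes:

```python
from opencompass.models.custom_model import PcaLlama  # hypothetical class name

models = [
    dict(
        type=PcaLlama,
        abbr="matryoshkakv-llama-7b",       # hypothetical label
        path="path/to/matryoshkakv-llama",  # checkpoint with trained projections
        max_out_len=256,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```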

Additionally, modifications are made for loading Hugging Face models in:

  • opencompass/opencompass/models/huggingface_above_v4_33.py

Performance

[Figure: results table]

Visualization

[Figure: compression rate visualization]

TODO

We will release the complete training pipeline in the future.
