Code for MatryoshkaKV-cache.
This project delivers LLaMA equipped with optimized orthogonal projections in modeling_pcallama_trial.py, and we conduct experiments by simply patching the base LLaMA implementation with this Python file.
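As an illustration, the patch might be applied by swapping the stock attention class for the one defined in modeling_pcallama_trial.py. This is only a sketch: the class name PcaLlamaAttention is a hypothetical placeholder, and the exact attribute to patch depends on the transformers version.

```python
# Minimal sketch of the patching idea; PcaLlamaAttention is a
# hypothetical placeholder for the class shipped in
# modeling_pcallama_trial.py, not its confirmed name.
import transformers.models.llama.modeling_llama as hf_llama

from modeling_pcallama_trial import PcaLlamaAttention  # hypothetical name

# Replace the stock attention so every LLaMA model constructed
# afterwards runs with the orthogonal-projection variant.
# (Recent transformers versions select attention through a registry,
# so the exact patch point may differ.)
hf_llama.LlamaAttention = PcaLlamaAttention
```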
We first initialize our orthogonal projections via PCA (Principal Component Analysis) by running cal_pcallama_init.py.
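Conceptually, the PCA initialization amounts to collecting key/value states from a calibration corpus and taking their principal directions as an orthogonal basis. The sketch below shows this idea under that assumption; the calibration loop and tensor shapes are illustrative, and the actual script may differ in detail.

```python
# Minimal sketch of PCA-based initialization of an orthogonal projection.
import torch

def pca_orthogonal_projection(states: torch.Tensor) -> torch.Tensor:
    """Compute an orthogonal basis from calibration key/value states.

    states: (num_tokens, head_dim) matrix of cached key or value vectors
    gathered from a calibration corpus (an assumed layout).
    Returns an orthogonal (head_dim, head_dim) matrix whose columns are
    principal directions, sorted by decreasing explained variance.
    """
    centered = states - states.mean(dim=0, keepdim=True)
    # Eigendecomposition of the covariance yields an orthogonal basis.
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)   # ascending eigenvalues
    order = torch.argsort(eigvals, descending=True)
    return eigvecs[:, order]
```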
During training, our patches are applied to LLaMA-Factory at:
LLaMA-Factory/src/llamafactory/model/custom_model/modeling_pcallama_trial.py
Furthermore, because we use a distillation objective, we deliver our custom trainers PcaLlamaDistillationTrainer and PcaLlamaTrainer (a sketch of the distillation loss follows the paths below) at:
LLaMA-Factory/src/llamafactory/train/pt/trainer.py
LLaMA-Factory/src/llamafactory/train/sft/trainer.py
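The sketch below illustrates one way such a distillation trainer can be structured on top of transformers.Trainer. The loss weighting, temperature, and teacher handling are assumptions for illustration, not the exact recipe used by PcaLlamaDistillationTrainer.

```python
# Minimal sketch of a distillation trainer: language-modeling loss plus
# a KL term against a frozen teacher. Assumes labels are present in the
# batch and the teacher sits on the same device as the student.
import torch
import torch.nn.functional as F
from transformers import Trainer

class DistillationTrainerSketch(Trainer):
    def __init__(self, *args, teacher_model=None, temperature=1.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher = teacher_model.eval()  # frozen teacher
        self.temperature = temperature

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)
        with torch.no_grad():
            teacher_logits = self.teacher(**inputs).logits
        # KL divergence between teacher and student token distributions.
        t = self.temperature
        kd_loss = F.kl_div(
            F.log_softmax(outputs.logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t ** 2)
        loss = outputs.loss + kd_loss
        return (loss, outputs) if return_outputs else loss
```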
Our training scripts are under LLaMA-Factory/scripts.
Our dataset for continual pre-training is downloaded from RedPajama-Sample.
For evaluation, our patches are applied to OpenCompass at:
opencompass/opencompass/models/custom_model
Additionally, modifications are made for loading Hugging Face models in:
opencompass/opencompass/models/huggingface_above_v4_33.py
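Conceptually, this modification routes model loading through the patched implementation instead of the stock AutoModelForCausalLM, so the orthogonal projections are active at evaluation time. In the sketch below, PcaLlamaForCausalLM and the use_pca_patch flag are hypothetical names used only for illustration.

```python
# Minimal sketch of routing model loading through the patched class.
from transformers import AutoConfig, AutoModelForCausalLM

def load_model_sketch(path: str, use_pca_patch: bool = False):
    config = AutoConfig.from_pretrained(path)
    if use_pca_patch:
        # Hypothetical class name; the real one lives under
        # opencompass/opencompass/models/custom_model.
        from opencompass.models.custom_model import PcaLlamaForCausalLM
        return PcaLlamaForCausalLM.from_pretrained(path, config=config)
    return AutoModelForCausalLM.from_pretrained(path, config=config)
```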
In the future, we will release the complete training pipeline.