Starbucks: Improved Training for 2D Matryoshka Embeddings
We propose Starbucks, a new fine-tuning and pre-training method for 2D Matryoshka Representation Learning (MRL).
Starbucks consists of two key stages: Starbucks Masked Autoencoding (SMAE) pre-training and Starbucks Representation Learning (SRL) fine-tuning.
In Starbucks, the training loss is computed over a fixed target list of layer-dimension pairs, ranging from small to large, much like how the coffeehouse chain Starbucks offers drinks in cup sizes from Demi to Trenta.
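To make this concrete, below is a minimal sketch of extracting sub-embeddings for a target list of layer-dimension pairs. The exact size list, the CLS pooling, and the averaged task loss are assumptions for illustration, not the official implementation:

```python
# Sketch of the Starbucks loss idea over (layer, dim) pairs (illustrative only).
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed target list of (layer, dimension) pairs, from small to large.
SIZES = [(2, 32), (4, 64), (6, 128), (8, 256), (10, 512), (12, 768)]

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def starbucks_embeddings(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, output_hidden_states=True)
    embs = []
    for layer, dim in SIZES:
        # CLS token of the chosen layer, truncated to its first `dim` dimensions.
        h = out.hidden_states[layer][:, 0, :dim]
        embs.append(torch.nn.functional.normalize(h, dim=-1))
    return embs

# Training would then average a per-pair task loss (e.g., contrastive) over SIZES:
# loss = sum(task_loss(q, d) for q, d in zip(q_embs, d_embs)) / len(SIZES)
```

Because the loss is restricted to this small list of sizes rather than all layer-dimension combinations, training remains efficient while still covering the deployment sizes of interest.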
Our codebase is built on top of torch and transformers.
We recommend installing the required dependencies in a conda environment:
```bash
conda create -n starbucks python=3.10
conda activate starbucks
pip install torch
pip install transformers datasets peft
pip install deepspeed accelerate
```
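To verify the environment, a quick sanity check (assuming the packages above installed cleanly):

```python
# Confirm the core dependencies import and report their versions.
import torch
import transformers
import datasets

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("CUDA available:", torch.cuda.is_available())
```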
For SMAE pre-training, see smae.
For SRL fine-tuning on the retrieval task, see retrieval.
For SRL fine-tuning on the STS task, see sts.
We have released our model checkpoints on the Hugging Face Model Hub:
Pre-trained SMAE: bert-base-uncased-fineweb100bt-smae
Fine-tuned Starbucks_STS: Starbucks_STS
Fine-tuned Starbucks_Retrieval: Starbucks-msmarco
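As an illustration of how a released checkpoint might be used, here is a hedged usage sketch: the model ID is a placeholder for the full hub path linked above, and the chosen (layer, dim) size and CLS pooling are assumptions rather than documented defaults.

```python
# Encode two sentences with a Starbucks checkpoint and compare them at one
# of the supported sub-embedding sizes (illustrative; not the official API).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Starbucks_STS"  # placeholder; use the full hub path from the link above
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

texts = ["a cup of coffee", "an espresso drink"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = model(**batch, output_hidden_states=True)

# Pick one (layer, dim) pair assumed to be in the trained target list.
layer, dim = 6, 128
emb = torch.nn.functional.normalize(out.hidden_states[layer][:, 0, :dim], dim=-1)
print("cosine similarity:", (emb[0] @ emb[1]).item())
```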