feat: added evaluation script #14

Open · Markus28 wants to merge 41 commits into main
Conversation

Markus28

No description provided.

guenthermi and others added 30 commits on October 31, 2023.
from sentence_transformers import SentenceTransformer

# German-language MTEB tasks and the baseline models to evaluate
TASK_LIST = ["MIRACL", "GermanDPR", "PawsX", "GermanSTSBenchmark", "XMarket", "GerDaLIR", "WikiCLIR"]
MODELS = ['intfloat/multilingual-e5-base', 'intfloat/multilingual-e5-large', 'T-Systems-onsite/cross-en-de-roberta-sentence-transformer', 'sentence-transformers/distiluse-base-multilingual-cased-v2']

for model_name in MODELS:
    model = SentenceTransformer(model_name, device='cuda')
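For context, the evaluation script presumably feeds each loaded model into an MTEB run; a minimal sketch of that pattern, reusing TASK_LIST and MODELS from the snippet above and assuming the mteb package (the output_folder path is illustrative):

from mteb import MTEB

for model_name in MODELS:
    model = SentenceTransformer(model_name, device='cuda')
    evaluation = MTEB(tasks=TASK_LIST, task_langs=["de"])
    evaluation.run(model, output_folder=f"results/{model_name}")  # illustrative path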
Member

This automatically caps max_seq_length at 512. If that is intended, then I think the MTEB scores we publish should also be produced with the same max_seq_length of 512, not with 8k.
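To verify the effective limit, one can inspect and override it on the SentenceTransformer object (a minimal sketch; the 8192 value just mirrors the 8k setting mentioned above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('intfloat/multilingual-e5-base', device='cuda')
print(model.max_seq_length)  # reports the automatic 512 cap
# Raising the cap only helps if the positional embeddings actually cover it:
model.max_seq_length = 8192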

Author


The sentence-transformers/distiluse-base-multilingual-cased-v2 model actually uses a sequence length of 128. I'm not sure how large the positional embeddings even are for these models.
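One way to check is to read the positional-embedding size off the underlying Hugging Face config (a sketch, assuming the first module is the usual sentence-transformers Transformer wrapper):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v2')
print(model.max_seq_length)  # 128, as noted above
config = model[0].auto_model.config  # config of the wrapped Hugging Face model
print(getattr(config, 'max_position_embeddings', None))  # size of the positional embedding table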
