Cannot reproduce results on text classification benchmark. #1490

Open
tsWen0309 opened this issue Nov 23, 2024 · 15 comments

@tsWen0309

I am trying to reproduce the performance of the model "jina_v3" (https://huggingface.co/jinaai/jina-embeddings-v3) on the text classification benchmark, using the code below:

import mteb
from sentence_transformers import SentenceTransformer

model_name = "jinaai/jina-embeddings-v3"
# Pass the variable itself, not the literal string 'model_name'.
model = SentenceTransformer(model_name, trust_remote_code=True)
tasks = mteb.get_tasks(
    tasks=[
        "AmazonCounterfactualClassification",
        "AmazonReviewsClassification",
        "Banking77Classification",
        "EmotionClassification",
        "ImdbClassification",
        "MTOPIntentClassification",
        "ToxicConversationsClassification",
        "TweetSentimentExtractionClassification",
    ]
)
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(
    model, eval_splits=["test"], output_folder=f"results/{model_name}"
)

The results differ significantly from those on https://huggingface.co/spaces/mteb/leaderboard.
Any suggestions?

@Samoed
Collaborator

Samoed commented Nov 23, 2024

You should load the model like this:

import mteb

model = mteb.load_model("jinaai/jina-embeddings-v3")
...

@tsWen0309
Author

mteb has no attribute "load_model"? I am using mteb==1.20.0.

@Samoed
Collaborator

Samoed commented Nov 23, 2024

Sorry, this should be

import mteb

model = mteb.get_model("jinaai/jina-embeddings-v3")
...

@tsWen0309
Author

  File "D:\code\mteb-main\mteb\models\overview.py", line 126, in get_model
    model = meta.load_model(**kwargs)
  File "D:\code\mteb-main\mteb\model_meta.py", line 120, in load_model
    model: Encoder = loader(**kwargs) # type: ignore
  File "D:\code\mteb-main\mteb\model_meta.py", line 37, in sentence_transformers_loader
    return SentenceTransformerWrapper(model=model_name, revision=revision, **kwargs)
  File "D:\code\mteb-main\mteb\models\sentence_transformer_wrapper.py", line 48, in __init__
    model_prompts = self.validate_task_to_prompt_name(self.model.prompts)
  File "D:\code\mteb-main\mteb\models\wrapper.py", line 81, in validate_task_to_prompt_name
    task = mteb.get_task(task_name=task_name)
  File "D:\code\mteb-main\mteb\overview.py", line 318, in get_task
    raise KeyError(suggestion)

Any solution?

@Samoed
Collaborator

Samoed commented Nov 23, 2024

Can you provide your code? I ran the tasks with the following code and everything worked:

import mteb

model = mteb.get_model("jinaai/jina-embeddings-v3")
tasks = mteb.get_tasks(
    tasks=['AmazonCounterfactualClassification',
    'AmazonReviewsClassification',
    "Banking77Classification",
    'EmotionClassification',
    'ImdbClassification',
    'MTOPIntentClassification',
    'ToxicConversationsClassification',
    'TweetSentimentExtractionClassification'
    ]
)
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(
    model, 
    eval_splits=["test"],
    output_folder="results"
)

@tsWen0309
Author

I am using exactly your code, except that I replaced model=mteb.get_model("jinaai/jina-embeddings-v3") with model=mteb.get_model("jina_v3"), where "jina_v3" is the local path of the jina-embeddings-v3 model downloaded from https://huggingface.co/jinaai/jina-embeddings-v3. Could this be the problem?

@Samoed
Collaborator

Samoed commented Nov 23, 2024

Yes, I think this is the problem.

@tsWen0309
Author

I ran the code successfully, but the results still don't match. For example, on the Banking77 dataset I got an accuracy of 76.77 vs. 84.08 (reported).

@tsWen0309
Author

I ran the code on several text classification datasets, and none of the results match the performance reported on the leaderboard; they are neither consistently higher nor consistently lower. Do you see the same problem?

@Samoed
Collaborator

Samoed commented Nov 24, 2024

@bwanglzu Do you have any ideas?

Results:

| Task | Leaderboard score | Eval result |
|---|---|---|
| AmazonCounterfactualClassification | 0.925440219900916 | 0.9559220389805099 |
| TweetSentimentExtractionClassification | 0.713978494623656 | 0.7420769666100736 |
| Banking77Classification | 0.8408116883116883 | 0.7678896103896105 |
| ToxicConversationsClassification | 0.912890625 | 0.912548828125 |
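
To make the size and direction of each gap easier to see, here is a minimal sketch that computes the difference between the two columns in percentage points. The scores are copied from the results above; the `score_deltas` helper is a name made up for this illustration and is not part of mteb.

```python
# Scores copied from the results above: (leaderboard score, local eval result).
scores = {
    "AmazonCounterfactualClassification": (0.925440219900916, 0.9559220389805099),
    "TweetSentimentExtractionClassification": (0.713978494623656, 0.7420769666100736),
    "Banking77Classification": (0.8408116883116883, 0.7678896103896105),
    "ToxicConversationsClassification": (0.912890625, 0.912548828125),
}


def score_deltas(scores):
    """Return {task: local - leaderboard}, expressed in percentage points."""
    return {task: (local - board) * 100 for task, (board, local) in scores.items()}


deltas = score_deltas(scores)
# Print the largest drop first.
for task, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{task:<42} {delta:+.2f}")
```

On these four tasks, Banking77Classification comes out roughly 7.3 points below the leaderboard, AmazonCounterfactualClassification and TweetSentimentExtractionClassification roughly 3 points above, and ToxicConversationsClassification is nearly identical.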

@tsWen0309
Author

An update: I randomly selected some other models and tried to reproduce their performance. NV-embed-v2 failed; learning2_model succeeded.

@KennethEnevoldsen
Contributor

Hmm, this seems odd.

I am using exactly your code, except that I replaced model=mteb.get_model("jinaai/jina-embeddings-v3") with model=mteb.get_model("jina_v3"), where "jina_v3" is the local path of the jina-embeddings-v3 model downloaded from https://huggingface.co/jinaai/jina-embeddings-v3. Could this be the problem?

Just want to note that this is indeed an issue: it will call sentence-transformers to load the model instead of our implementation, which also includes prompt handling (see implementation below):

model_prompts={
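
As a rough illustration of the fallback described above, the sketch below shows how a lookup keyed by the exact Hugging Face model name would miss a local path such as "jina_v3" and fall through to a generic loader. This is a simplified, hypothetical model of the behaviour, not mteb's actual code: `REGISTRY`, `load_with_prompts`, and `load_generic` are invented for the example, and the `get_model` here is a toy stand-in for `mteb.get_model`.

```python
# Hypothetical, simplified sketch: NOT mteb's actual implementation.
from typing import Callable, Dict


def load_with_prompts(name: str) -> str:
    # Stand-in for a model-specific loader that also configures task prompts.
    return f"custom loader with prompt handling for {name}"


def load_generic(name: str) -> str:
    # Stand-in for the plain sentence-transformers fallback loader.
    return f"generic sentence-transformers loader for {name}"


# Registry keyed by the exact Hugging Face model name (assumption).
REGISTRY: Dict[str, Callable[[str], str]] = {
    "jinaai/jina-embeddings-v3": load_with_prompts,
}


def get_model(name: str) -> str:
    # A local path does not match any registry key, so the fallback is used.
    loader = REGISTRY.get(name, load_generic)
    return loader(name)


print(get_model("jinaai/jina-embeddings-v3"))
print(get_model("jina_v3"))
```

Under this assumption, the same weights loaded from a local directory would be encoded without the model-specific prompt handling, which could change classification scores.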

A few points to check:

  1. What version of MTEB is used?
  2. Do the revision IDs match?
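
For the first check, one way to report installed versions is the standard-library `importlib.metadata` module, as in the sketch below. The `installed_version` helper and the list of packages are illustrative choices, not something prescribed by mteb. The revision comparison is left out because it depends on how the model was downloaded.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages that plausibly affect the evaluation (illustrative list).
for pkg in ("mteb", "sentence-transformers", "transformers"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```

Posting this output together with the model revision makes it easier for maintainers to compare environments.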

@tsWen0309
Author

I am using MTEB==1.20.0, and the revision id of "jinaai/jina-embeddings-v3" model is 215a6e121fa0183376388ac6b1ae230326bfeaed

@bwanglzu
Contributor

I'll take a look this morning

@KennethEnevoldsen
Contributor

@bwanglzu did you have a chance to look at these?
