
feat: Added Hindi benchmark. #1882

Open

SaileshP97 wants to merge 8 commits into main
Conversation

@SaileshP97 commented Jan 27, 2025

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command (a Python sketch is shown after this checklist).
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
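For reference, a minimal Python sketch of running the two baseline models listed above (equivalent to the mteb -m ... -t ... CLI call); the task name and output folder below are just examples taken from this PR:

import mteb

# The two baseline models named in the checklist above.
for model_name in [
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "intfloat/multilingual-e5-small",
]:
    model = mteb.get_model(model_name)
    # "WikipediaRerankingMultilingual" is one of the tasks referenced in this PR.
    tasks = mteb.get_tasks(tasks=["WikipediaRerankingMultilingual"], languages=["hin"])
    evaluation = mteb.MTEB(tasks=tasks)
    evaluation.run(model, output_folder="results")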

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision) (a short sketch follows this checklist)
  • I have tested the implementation works on a representative set of tasks.
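For example, a quick sanity check of both loaders could look like this (a sketch; the optional revision argument is omitted):

import mteb

# Load the model implementation and its metadata via the two entry points above.
model = mteb.get_model("intfloat/multilingual-e5-small")
meta = mteb.get_model_meta("intfloat/multilingual-e5-small")
print(meta.name, meta.revision)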

@Samoed changed the title from Added Hindi benchmark. to feat: Added Hindi benchmark. on Jan 28, 2025
@x-tabdeveloping (Collaborator) left a comment

I've added a couple of questions and things you should consider, but this is a good start :D

@@ -1276,3 +1276,45 @@ def load_results(
year={2024}
}""",
)

MTEB_INDIC = Benchmark(
Collaborator:
Would this not overwrite the Indic benchmark above? I suppose this should have a different variable name.

Author:
Yes, I didn't see this. I will update it.

name="MTEB(Hindi)",
tasks=get_tasks(
languages=["hin"],
tasks=[
Collaborator:
Are you not adding any novel tasks to the benchmark? How will this benchmark be different from selecting Hindi as the language on MTEB(Indic)?
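One way to inspect that overlap (a sketch; it assumes the existing Indic benchmark is registered under the name "MTEB(Indic)" and mirrors the exclusive_language_filter flag used in this PR):

import mteb

# Tasks selectable with a Hindi-only language filter
hindi_tasks = {t.metadata.name for t in mteb.get_tasks(languages=["hin"], exclusive_language_filter=True)}
# Tasks already covered by the existing Indic benchmark
indic_tasks = {t.metadata.name for t in mteb.get_benchmark("MTEB(Indic)").tasks}

print(sorted(hindi_tasks & indic_tasks))  # Hindi tasks already in MTEB(Indic)
print(sorted(hindi_tasks - indic_tasks))  # Hindi tasks that would be new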

Collaborator:
I would also advise you to think about making the benchmark as zero-shot as possible. Meaning: if many models train on a certain dataset, you should probably avoid adding it to the benchmark, as the performance might not be representative of a model's ability to generalize.
You can get these for a given task by saying:

from collections import defaultdict

import mteb

model_metas = mteb.get_model_metas()
tasks = mteb.get_benchmark("MTEB(Hindi)").tasks
task_names = [task.metadata.name for task in tasks]

models_trained_on_tasks = defaultdict(list)
for model_meta in model_metas:
    if model_meta.training_datasets is not None:
        for training_dataset in model_meta.training_datasets:
            if training_dataset in task_names:
                models_trained_on_tasks[model_meta.name].append(training_dataset)

print(models_trained_on_tasks)
# And then this would print something like:
# {
#     "Model1": ["task_1", "task_2", ...],
#     ...
# }

Author:
Yes, I completely agree with you. I will remove those datasets. Thanks for pointing it out.

"WikipediaRerankingMultilingual",
],
exclusive_language_filter=True,
eval_splits=["test", "validation", "dev"],
Collaborator:
Are you sure you want to use all of these splits for all tasks? Perhaps narrowing it down might be beneficial for certain tasks, if you think model trainers might use the dev and validation splits for validating their models.

Author:
I thought each dataset has either a test, validation, or dev split.

Collaborator:
Yes, and for each task you should select the splits that are needed.
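A sketch of what per-task split selection could look like, by concatenating several get_tasks calls (the second task name is a hypothetical placeholder, and the split choices are only illustrative):

from mteb import get_tasks

tasks = get_tasks(
    tasks=["WikipediaRerankingMultilingual"],
    languages=["hin"],
    eval_splits=["test"],
) + get_tasks(
    tasks=["SomeHindiClassificationTask"],  # hypothetical task name
    eval_splits=["validation"],
)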

eval_splits=["test", "validation", "dev"],
),
description="The Hindi Leaderboard benchmark extends the MTEB framework by incorporating Hindi-specific datasets and tasks derived from existing MTEB data. It evaluates text embedding models on a variety of tasks, including text classification, semantic similarity, and information retrieval, with a focus on Hindi language performance. This benchmark aims to provide a standardized evaluation platform to advance research and innovation in Hindi NLP, leveraging pre-existing datasets to ensure consistency and comparability across models.",
reference=None,
Collaborator:
Do you not have a technical report yet? How are you progressing with the paper?

Author:
Right now I don't have a technical report, because we have not added any new model or dataset for any task. But we are planning to work on a new model, so we decided to describe this benchmark in that paper.

So is it possible to add the citation later?

Collaborator:
If the benchmark is subject to change in the near future, I'm wondering if it's worth it to merge it now. In any case I would add a beta marker in the name to suggest that it will change in the future. (name="MTEB(Hindi, beta)").
What do you think @KennethEnevoldsen ?

Contributor:
def. add the beta marker

exclusive_language_filter=True,
eval_splits=["test"],
) + get_tasks(tasks=["HindiDiscourseClassification",
"SentimentAnalysisHindi"], eval_splits=["train"]),
Collaborator:
Tasks should not use train splits for evaluation. For classification tasks, the train split is automatically used for training.

Author:
When I checked the eval_splits for the task in /tasks/Classification/hin/HindiDiscourseClassification.py, it only had 'train'.
So if I set eval_splits='test', will it still use the train split? Or is validation not possible on this task?

Collaborator:
I think yes, the train split is used, because it seems that this dataset has only one split.
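A quick way to check which splits a task actually declares (a sketch, assuming the current TaskMetadata layout):

import mteb

task = mteb.get_tasks(tasks=["HindiDiscourseClassification"])[0]
print(task.metadata.eval_splits)  # e.g. ["train"] if that is the only split the dataset ships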

@KennethEnevoldsen (Contributor) commented:
@SaileshP97 seems like this PR might have gotten a bit stale - are you still working on this?
