Hi, I just read the paper and really admire its ambition. I have a few quick questions:
In the abstract, the paper claims "it is still under-explored whether English-based foundation LLMs can perform similarly on multilingual tasks compared to English tasks with well-designed instruction tuning and how we can construct the corpora needed for the tuning." However, I did not see the paper define any metrics to evaluate the models or conduct an ablation study to demonstrate the effectiveness of the current data collection mechanism. Are these on the agenda as a next step?
A recent paper from Microsoft, "AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models", suggests using standardized exams to evaluate LLM performance. Although they have also released some college entrance exam questions, these are far from exhaustive. Is open-sourcing the raw exam materials for the research community an option here? Or would it be interesting for BAAI to host a CLUE-style LLM leaderboard consisting of exam questions to evaluate LLM performance?
Thanks for your attention and questions.
The evaluation of COIG will be released in the next version of the paper. We are also working on providing evaluation code so that everyone can use COIG more easily.
We do not have a plan to release the raw exam materials for now, but we will consider the idea seriously. As for an LLM leaderboard, I do not have an answer yet, but I think it should be possible for BAAI to host one.