Hi, I just read the paper and really admire its ambition. I have a few quick questions:
In the abstract, the paper claims "it is still under-explored whether English-based foundation LLMs can perform similarly on multilingual tasks compared to English tasks with well-designed instruction tuning and how we can construct the corpora needed for the tuning." However, I did not see the paper define any metrics to evaluate the models or conduct an ablation study to demonstrate the effectiveness of the current data collection mechanism. Are these on the agenda as a next step?
A recent paper from Microsoft, "AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models", suggests using standardized exams to evaluate LLM performance. Although they have also released some college entrance exam questions, these are far from exhaustive. Is open-sourcing the raw exam materials for the research community an option here? Or would it be interesting for BAAI to host a CLUE-style LLM leaderboard consisting of exam questions to evaluate LLM performance?
Thanks for your attention and questions.
The evaluation of COIG will be released in the next version of the paper. We are also working on providing evaluation code so that everyone can use COIG more easily.
We do not have a plan to release the raw exam materials for now, but we will consider the idea seriously. As for an LLM leaderboard, I do not have an answer yet, but I think it should be possible for BAAI to host one.