Great work! Just a few questions regarding the next step #1

Open
Randolph-zeng opened this issue Apr 19, 2023 · 1 comment

@Randolph-zeng

Hi, I just read the paper and really admire its ambition. I have a few quick questions:

  1. In the abstract, the paper claims "it is still under-explored whether English-based foundation LLMs can perform similarly on multilingual tasks compared to English tasks with well-designed instruction tuning and how we can construct the corpora needed for the tuning." However, I did not see the paper define metrics to evaluate the models or conduct any ablation study to demonstrate the effectiveness of the current data collection mechanism. Are these on the agenda for the next step?
  2. A recent paper from Microsoft, "AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models", suggests using standardized exams to evaluate the performance of LLMs. Though they have also released some college entrance exam questions, these are far from exhaustive. Is open-sourcing the raw exam materials for the research community an option here? Or would it be interesting for BAAI to host a CLUE-style LLM leaderboard consisting of exam questions to evaluate LLM performance?
@shiyemin

Thanks for your attention and questions.

  1. The evaluation of COIG will be released in the next version of the paper. We are also working on providing evaluation code so that everyone can use COIG more easily; a rough sketch of what an exam-style evaluation loop could look like is included after this list.
  2. We do not have a plan to release the raw exam materials for now, but we will consider the idea seriously. As for an LLM leaderboard, I do not have an answer yet, but I think it should be possible for BAAI to host one.
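
As a minimal sketch of the exam-style evaluation discussed above, the snippet below computes accuracy over multiple-choice questions. The `exam_accuracy` helper, the `model_answer` stub, and the sample items are hypothetical placeholders for illustration, not the COIG evaluation code or the AGIEval harness.

```python
# Minimal sketch: accuracy over multiple-choice exam questions.
# All names and sample items here are hypothetical placeholders.

def exam_accuracy(predictions, gold):
    """Return the fraction of questions whose predicted choice matches the gold choice."""
    assert len(predictions) == len(gold)
    correct = sum(p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold))
    return correct / len(gold)

def model_answer(question, choices):
    # Placeholder: a real harness would prompt the LLM with the question and
    # choices, then parse its output into one of the choice letters.
    return "A"

if __name__ == "__main__":
    items = [
        {"question": "2 + 2 = ?", "choices": {"A": "4", "B": "5"}, "answer": "A"},
        {"question": "Which is a prime number?", "choices": {"A": "9", "B": "7"}, "answer": "B"},
    ]
    preds = [model_answer(it["question"], it["choices"]) for it in items]
    golds = [it["answer"] for it in items]
    print(f"accuracy = {exam_accuracy(preds, golds):.2f}")
```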
