Make the arena a competitive arXiv/Wikipedia/StackExchange search engine #44

Open
4 tasks
Muennighoff opened this issue Sep 16, 2024 · 1 comment

Muennighoff commented Sep 16, 2024

The problem is that we're not getting enough votes. People use the LMSys Chatbot Arena because it is useful in itself, e.g. to try models you cannot otherwise access and to get help solving problems. This is harder for our arena: we have fixed corpora and people cannot easily add their own large corpus, so it is more constrained.

@shaoyijia suggested incentivizing more people to vote by making the arena actually useful, e.g. as a research/learning partner. For example, if we pitch it as a better arXiv search than arXiv's native search (which most people think is bad), people may be curious to try it. They would also have more incentive to vote if we showed the top-k results from the winning model they chose, so they can learn more. Currently, people may not have much incentive to vote/play if all they see is the below.
(paraphrasing Yijia's comments here)

image

Some concrete things we can do:

  • Add arXiv abs or pdf links to the search results so people can go read the paper
  • Show top-k results if the user asks for them (maybe some way to expand results in the UI)
  • Maybe improve the interface to make searching easier
  • Does someone have other concrete ideas?
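
As a rough illustration of the first two items, here is a minimal sketch of building abs/pdf links and trimming a ranked list to the top k. It assumes each search result carries an arXiv identifier and a relevance score; the `SearchResult` class, its fields, and both helper functions are hypothetical names, not part of the arena's code. The only external fact relied on is arXiv's public `https://arxiv.org/abs/<id>` and `https://arxiv.org/pdf/<id>` URL layout.

```python
from dataclasses import dataclass

ARXIV_BASE = "https://arxiv.org"


@dataclass
class SearchResult:
    # Hypothetical result record; the arena's real schema may differ.
    arxiv_id: str
    title: str
    score: float


def arxiv_links(arxiv_id: str) -> dict[str, str]:
    """Build abs and pdf URLs from an arXiv identifier."""
    clean = arxiv_id.strip()
    # Tolerate an optional "arXiv:" prefix on the stored identifier.
    if clean.lower().startswith("arxiv:"):
        clean = clean[len("arxiv:"):]
    return {
        "abs": f"{ARXIV_BASE}/abs/{clean}",
        "pdf": f"{ARXIV_BASE}/pdf/{clean}",
    }


def top_k(results: list[SearchResult], k: int = 1) -> list[SearchResult]:
    """Return the k highest-scoring results, best first."""
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    return ranked[: max(k, 0)]


if __name__ == "__main__":
    # Placeholder identifiers and titles, for illustration only.
    results = [
        SearchResult("2401.00001", "Placeholder paper A", 0.42),
        SearchResult("arXiv:2401.00002", "Placeholder paper B", 0.87),
        SearchResult("2401.00003", "Placeholder paper C", 0.65),
    ]
    for r in top_k(results, k=2):
        print(r.title, arxiv_links(r.arxiv_id)["abs"])
```

The UI could call `top_k` with `k=1` by default and a larger `k` when the user expands the results.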
@KennethEnevoldsen

I think this is great! These changes could work for Wikipedia as well.

We could have it highlight the answer in the retrieved document, either by using a specific model (this does introduce a bias) or by embedding segments of the document and seeing which segments are the best match. This is a non-trivial change, though.
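
A minimal sketch of the segment-matching idea, under stated assumptions: segments are sentences, they are matched against the user's query, and the "embedding" is a toy bag-of-words vector standing in for whatever embedding model would actually be used. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_segments(document: str) -> list[str]:
    # Naive sentence split; a real system might use fixed-size windows.
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def best_segment(query: str, document: str, embed=toy_embed):
    """Return (score, segment) for the segment most similar to the query."""
    segments = split_segments(document)
    if not segments:
        return None
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in segments]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    doc = "Transformers use attention. Cats sleep a lot. Bread needs yeast."
    print(best_segment("how does attention work", doc))
```

Passing a different `embed` function would change which segment gets highlighted, which is where the choice of model enters.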
