score: experimental extension novelty in sorting #665
Right now we boost a file whose extension hasn't been seen yet to the 3rd position. This is gated by an environment variable. I want to explore whether there are ways we can turn on this behaviour with the query language.
Test Plan: ZOEKT_NOVELTY=1 go run ./cmd/zoekt foo
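To make the described behaviour concrete, here is a minimal sketch of the reordering step. It is not the code in this PR: the function name `boostNovelExtension`, the use of plain file names instead of zoekt's match types, and the choice to shift the intervening results down by one are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// boostNovelExtension is a hypothetical helper, not zoekt's API.
// It looks at the results ranked above index pos, finds the first
// later result whose file extension differs from all of them, and
// moves that result to index pos. Results in between shift down by
// one so their relative order is preserved.
func boostNovelExtension(files []string, pos int) {
	if pos < 0 || pos >= len(files) {
		return
	}

	// Collect the extensions already present above the target slot.
	seen := make(map[string]bool)
	for _, f := range files[:pos] {
		seen[filepath.Ext(f)] = true
	}

	for i := pos; i < len(files); i++ {
		if seen[filepath.Ext(files[i])] {
			continue
		}
		novel := files[i]
		// copy handles overlapping slices, so this shifts
		// files[pos:i] one slot to the right.
		copy(files[pos+1:i+1], files[pos:i])
		files[pos] = novel
		return
	}
}

func main() {
	files := []string{"a.go", "b.go", "c.go", "d.go", "README.md", "e.go"}
	boostNovelExtension(files, 2) // index 2 is the 3rd position
	fmt.Println(files)            // [a.go b.go README.md c.go d.go e.go]
}
```

If every remaining result shares an extension with the top two, the slice is left untouched, so the boost only changes the ranking when there is something new to surface.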
@jtibshirani @stefanhengl after playing around with this a bunch, I'm really enjoying it. Keen to ship it in sourcegraph. What do you think? There is maybe one potential change I would make before following up. When aggregating results in the frontend with streaming, we may call this multiple times. I suppose we only want this behaviour for the very first call.
@keegancsmith seems like a nice direction! Can you share some examples where it really helps, to help me get a feel for things too? I'm also curious -- if we had a great notion of "file importance", would this still be as helpful? In my work with keyword search, I've noticed the top results can be filled with build files or other noise, but we could try to address that directly.
I don't have concrete examples of it boosting something I wanted; in most of my testing, the result I wanted was already at the top. However, it feels good, and that is what I am going on (sorry for being so non-empirical).
Stefan just reminded me of one real example we came across. A markdown file related to the query was boosted into the third spot, and it was part of what we wanted to see.
Sorry for the slow review on my end! In general it does feel important to balance relevance vs. diversity for broad queries. Another "diversity rule" that could be helpful: in the absence of file filters, at least one file in the top 3 should be a code file (not build, not docs).
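The suggested rule could be sketched roughly as follows. This is only an illustration of the idea, not something proposed or implemented in this PR: `isCodeFile`, the tiny `nonCodeExts` list, and `ensureCodeInTop` are hypothetical names, and a real classifier for "code vs. build vs. docs" would need to be far more careful.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// nonCodeExts is a hypothetical and deliberately tiny list of
// extensions treated as docs, build, or config files.
var nonCodeExts = map[string]bool{
	".md": true, ".txt": true, ".bazel": true,
	".toml": true, ".yaml": true, ".json": true,
}

// isCodeFile is a crude stand-in for a real file classifier.
func isCodeFile(name string) bool {
	base := filepath.Base(name)
	if base == "Makefile" || base == "Dockerfile" {
		return false
	}
	ext := strings.ToLower(filepath.Ext(base))
	return ext != "" && !nonCodeExts[ext]
}

// ensureCodeInTop checks whether any of the first n results is a
// code file. If none is, it promotes the highest-ranked code file
// into the last of those n slots and shifts the rest down by one.
func ensureCodeInTop(files []string, n int) {
	if n > len(files) {
		n = len(files)
	}
	if n <= 0 {
		return
	}
	for _, f := range files[:n] {
		if isCodeFile(f) {
			return // rule already satisfied
		}
	}
	for i := n; i < len(files); i++ {
		if !isCodeFile(files[i]) {
			continue
		}
		code := files[i]
		copy(files[n:i+1], files[n-1:i])
		files[n-1] = code
		return
	}
}

func main() {
	files := []string{"README.md", "BUILD.bazel", "config.yaml", "docs/intro.md", "score.go"}
	ensureCodeInTop(files, 3)
	fmt.Println(files) // [README.md BUILD.bazel score.go config.yaml docs/intro.md]
}
```

The sketch ignores the "in the absence of file filters" condition; a caller would presumably skip this step when the query already restricts file types.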
Diversity is the word I was looking for; it feels like a much better descriptor than novelty. Nice idea, filed https://github.com/sourcegraph/sourcegraph/issues/57975
Right now we boost a file whose extension hasn't been seen yet to the 3rd position. This is gated by an environment variable which defaults to on. I want to explore whether there are ways we can turn on this behaviour with the query language.
Test Plan: go run ./cmd/zoekt foo