search improvements #242

Open
3 tasks
kosarko opened this issue Jun 23, 2015 · 4 comments

Comments

@kosarko
Member

kosarko commented Jun 23, 2015

Following the discussion on lindat-tech...

We want the search to behave more predictably (more like well-known search engines). Having OR as the default operator seems confusing.

Can we analyze what our users search for (sentences, keywords, something else?) and how often they browse to further result pages, change the search string, visit a result, or leave?

======
summary:

  • more predictable query (updated query boosting #647)
  • ?improve tokenization (WordNet and wordnet should produce same results)
  • ?other improvements to the query and results display see below
@kosarko kosarko added this to the good ideas for later milestone Jun 23, 2015
@stranak
Member

stranak commented Jun 25, 2015

My reply to @amirkamran's suggestion by email.

On 25 Jun 2015, at 07:26, Amir Kamran [email protected] wrote:

You were right: by default it was AND, and of course with AND you get fewer results, only those where all the terms in the query are present. That's why we switched to OR, to increase the search hits.
I think what we can do to improve the results, with current settings, is to add the whole query in quotation marks for each search before sending it to Solr. E.g. if someone searches for prague dependency
we can query for prague OR dependency OR "prague dependency"

Yes, that seems like the right direction. In general I think it makes sense to see results ordered as if the evaluation was:
"prague dependency" | (prague AND dependency) | prague | dependency

I.e. use the OR only to enrich query results, but consider those results worse than AND and consider AND worse than full string match.

And ideally, just like Google, I would separate the second half by clearly saying "The following results only match part of your query", or something similar. Currently you get 145 results for prague treebank; it would make sense to me to split them into full and partial matches.
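
The ordering proposed above could be approximated with boosts in a single Solr query. The following is a minimal sketch, not what the repository does: the class name `TieredQuery` and the boost values 4 and 2 are assumptions chosen for illustration, and escaping of query-syntax characters in the user input is omitted.

```java
// Hypothetical helper: builds one query whose clauses are boosted so that
// phrase matches tend to rank above all-terms matches, which in turn tend
// to rank above partial matches.
final class TieredQuery {
    // Illustrative boost values (assumptions, not tuned settings).
    static final int PHRASE_BOOST = 4;
    static final int ALL_TERMS_BOOST = 2;

    static String build(String userInput) {
        String trimmed = userInput.trim();
        String[] terms = trimmed.split("\\s+");
        if (terms.length < 2) {
            // A single term (or empty input) has no phrase/AND/OR tiers.
            return trimmed;
        }
        String phrase = "\"" + String.join(" ", terms) + "\"^" + PHRASE_BOOST;
        String allTerms = "(" + String.join(" AND ", terms) + ")^" + ALL_TERMS_BOOST;
        String anyTerm = String.join(" OR ", terms);
        return phrase + " OR " + allTerms + " OR " + anyTerm;
    }

    public static void main(String[] args) {
        // prints: "prague dependency"^4 OR (prague AND dependency)^2 OR prague OR dependency
        System.out.println(build("prague dependency"));
    }
}
```

Boosting only influences ranking. Showing full and partial matches as separate groups would still need two queries or some post-processing of the result list.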

@kosarko
Member Author

kosarko commented Oct 2, 2015

@vidiecan We should prepare some test set based on our data and (fine) tune the searching; I've noticed for example that search for WordNet gives different results than search for wordnet. The first one is split by WordDelimiterFilter into Word+Net and together with OR as a default operator, the resulting query is equivalent to word OR net. The second doesn't get split in such a way. I wouldn't consider this a bug but it's very unexpected.
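
To make the difference concrete, here is a simplified emulation of the case-change split. It is not Solr's WordDelimiterFilter, only a sketch of the one rule relevant here (split at a lowercase-to-uppercase transition). The class name `CaseChangeSplit` is hypothetical, and lowercasing the parts assumes a lowercase filter later in the analysis chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for splitting on case change.
final class CaseChangeSplit {
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            boolean caseChange = Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(token.charAt(i));
            if (caseChange) {
                parts.add(token.substring(start, i).toLowerCase(Locale.ROOT));
                start = i;
            }
        }
        parts.add(token.substring(start).toLowerCase(Locale.ROOT));
        return parts;
    }

    // With OR as the default operator, the parts are simply OR-ed together.
    static String asDefaultOrQuery(String token) {
        return String.join(" OR ", split(token));
    }

    public static void main(String[] args) {
        System.out.println(asDefaultOrQuery("WordNet")); // prints: word OR net
        System.out.println(asDefaultOrQuery("wordnet")); // prints: wordnet
    }
}
```

Pairs like these (WordNet vs. wordnet) would be natural entries for the test set.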

@stranak
Member

stranak commented Oct 2, 2015

WordNet gives different results than search for wordnet. The first one is split by WordDelimiterFilter into Word+Net and together with OR as a default operator, the resulting query is equivalent to word OR net. The second doesn't get split in such a way. I wouldn't consider this a bug

I agree, this looks like a bug to me too.

@amirkamran

Some of these issues were fixed in #632.
Relevance now works as follows:

"+(" + query + ")"
+ " OR title:(" + query + ")^2"
+ " OR dc.relation.replaces:[* TO *]^2"
+ " OR (dc.relation.replaces:[* TO *] AND -dc.relation.isreplacedby:[* TO *])^2";

Explanation:
The search terms must occur in the resulting documents; if they occur in the title, the score is boosted by 2; if multiple versions of an item exist, the latest version is boosted by 2.
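
For illustration, the expression above can be wrapped in a method to see what is sent to Solr for a concrete search. The class and method names are hypothetical; the field names and boosts are copied from the expression as posted.

```java
// Hypothetical wrapper around the expression quoted in this comment.
final class RelevanceQuery {
    static String boost(String query) {
        return "+(" + query + ")"
                + " OR title:(" + query + ")^2"
                + " OR dc.relation.replaces:[* TO *]^2"
                + " OR (dc.relation.replaces:[* TO *] AND -dc.relation.isreplacedby:[* TO *])^2";
    }

    public static void main(String[] args) {
        // prints (on one line):
        // +(prague treebank) OR title:(prague treebank)^2 OR dc.relation.replaces:[* TO *]^2
        //   OR (dc.relation.replaces:[* TO *] AND -dc.relation.isreplacedby:[* TO *])^2
        System.out.println(boost("prague treebank"));
    }
}
```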
