Analyze the use of Solr for file searches in the context of the Files API extension and define an action plan for its use #9813
Comments
What about a Just Solr option as well?
Since Solr does not index past versions, we would lose the file filtering feature for those versions. That's why I haven't considered the Just Solr option. But we can add such an option if that downside is not considered significant. @qqmyers
I think Solr doesn't have docs for old versions just because we delete them when a new version is published. I think we also delete and recreate the latest published version's Solr doc whenever we reindex the draft version. With a redesign, I don't think we'd have to do that. That would make the Solr indexes bigger, but not deleting and rewriting so often could help Solr performance and not touching the db to render the dataset page could help there. Whether it's better than the other alternatives or not, I don't know, but I think it makes sense to consider it.
@qqmyers I'm wondering why older versions are not indexed and whether the reason is to optimize resources. I also don't know how big the indexes could get and whether big indexes could be problematic. Maybe someone who was involved in the implementation in the early days can provide some information on this. @scolapasta, @pdurbin, @landreev. On the other hand, if we start keeping old versions indexed, versions published after this hypothetical redesign would be indexed while those published before it would not. There may be solutions for this, but it is something to take into account.
The original purpose of search in the app was to allow for ad-hoc queries that were not viable as SQL queries. I was not around Dataverse as much when Solr was introduced, but my understanding is that an important feature it added was faceted search. Both functions are important conveniences but are also fault tolerant; that is to say, if some data didn't get into the index for any number of reasons, it's not a disaster. My main point is that Solr cannot be relied on as a source of truth as to what is in Dataverse. Postgres, as a relational database, is much more reliable as a data store with integrity built in. There are well thought out standards for mapping the data to the application objects in Dataverse. In short, I'm for using the search engine for what it is designed for, and continuing to rely on Postgres for driving the application.
Yes, it's mostly because indexing has always been slow:
After gathering the information and feedback and evaluating the different options in yesterday's frontend meeting, we have reached a consensus. Initially we are going to base all file tab searches and filters on database queries, without Solr for the moment, as developed in the recent API extension PR: #9820. Filters and search will be available for all versions in the same way, which homogenizes behavior, although we lose the possibility of using the Solr grammar. This decision rests on the assumption that Solr may not be required for files tab search, whose search facets are reduced compared to other in-application searches. Therefore, if we find evidence that the assumption is incorrect (potentially when users test the SPA), we will work on extending the search capabilities to support Solr. To keep track of these decisions, and so that they are useful for future UX work and application evolution, a new section has been added to the frontend README, which will include functionality behavior changes like the one we are addressing. Pull request: IQSS/dataverse-frontend#166. At the moment, in the absence of more sophisticated doc tools, we are using the README, although this type of information may be moved to more elaborate documentation in the future (as requested in IQSS/dataverse-frontend#26).
Yes, IQSS/dataverse-frontend#166 seems like a good writeup of the decisions explained above (no Solr for file listing page on dataset page, at least for now). Closing.
Overview of the Feature Request
For the analysis we must consider certain aspects mentioned during the weekly frontend meeting for devs:
* Current behavior (JSF - DatasetPage.java): Solr is queried to obtain the ids of the files that meet the search criteria. The DB is then queried to obtain all the file metadata of the dataset, which is iterated over to build a list of the entries that match the ids returned by Solr.
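
The two-step flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual code in DatasetPage.java: the `FileSearchSketch` class, the `FileMetadata` stand-in, and the `filterByMatchingIds` method are hypothetical names, and both the Solr result and the DB result are hard-coded in `main` rather than fetched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FileSearchSketch {

    /** Hypothetical stand-in for one file metadata row loaded from the database. */
    static final class FileMetadata {
        final long fileId;
        final String label;

        FileMetadata(long fileId, String label) {
            this.fileId = fileId;
            this.label = label;
        }
    }

    /**
     * Second step of the flow: iterate over all file metadata of the dataset
     * (loaded from the DB) and keep only the entries whose id was returned by
     * the search engine in the first step. DB order is preserved.
     */
    static List<FileMetadata> filterByMatchingIds(Set<Long> idsFromSolr,
                                                  List<FileMetadata> allFromDb) {
        List<FileMetadata> matches = new ArrayList<>();
        for (FileMetadata fm : allFromDb) {
            if (idsFromSolr.contains(fm.fileId)) {
                matches.add(fm);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumption: ids that a Solr query for the search text would return.
        Set<Long> idsFromSolr = new HashSet<>(Arrays.asList(2L, 3L));

        // Assumption: all file metadata of the dataset, as a DB query would return it.
        List<FileMetadata> allFromDb = Arrays.asList(
                new FileMetadata(1L, "readme.txt"),
                new FileMetadata(2L, "data.csv"),
                new FileMetadata(3L, "codebook.pdf"));

        for (FileMetadata fm : filterByMatchingIds(idsFromSolr, allFromDb)) {
            System.out.println(fm.fileId + " " + fm.label);
        }
    }
}
```

The sketch makes the cost visible: every file metadata entry of the dataset is loaded and scanned even when Solr matches only a few ids, which is one of the aspects to weigh when comparing implementation options.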
We should evaluate different implementation options:
The action plan should let us decide how to extend the Files API when filtering by search text. As an initial implementation (#9785), this filtering will be done through the DB in all cases (no Solr).
What kind of user is the feature intended for?
Devs, UX
What inspired the request?
What existing behavior do you want changed?
None
Any brand new behavior do you want to add to Dataverse?
N/A
Any open or closed issues related to this feature request?