Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

T474 improve date created search #536

Merged
merged 42 commits into from
Apr 15, 2024
Merged

Conversation

alepbloyd
Copy link
Contributor

Will close #474.

This PR changes:

  • Indexing behavior for works to add a date_created_isim solr field. The GwIndexer (in app/indexers) inherits default indexer behavior from Hyrax::WorkIndexer, and the specific indexers (gw_etd_indexer, gw_journal_issue_indexer and gw_work_indexer) inherit from GwIndexer, as we have all of the same indexing behavior for these. If in the future we need these types of works to be indexed differently, we can do that in each of these indexers individually.

    • The date_created_isim field is generated with adding behavior to generate_solr_document in the GwIndexer. Uses the Fedora object date_created field and parses it into a four digit integer (year) format. Per previous discussions, if there are multiple values in the date_created field, this chooses the earliest valid option.
  • Installs and configures blacklight_range_limit gem to add a "Date Created" field with a slider for selecting ranges. This can be styled further in the future.

  • Adds GEONAMES_USERNAME to the .env configurations and uncomments it in the hyrax.rb initializer. When I was reindexing everything, there was one work in particular that was causing an error due to something with the location field. Error message mentioned "geonames", found this line and uncommented it, made a Geonames account, and inserted my username (no keys/auth needed) and that seemed to fix it on future reindexing attempts?

  • Commented out the sort_catalog_spec tests as they're super flaky and I need to take another pass at them, but they're just messing up CI/CD workflows at the moment.

To test:

  • Run rspec tests - added some tests under spec/features/gw_indexer_spec.rb.
  • If working with existing data, run a reindex job. I did this through opening a rails console and running ActiveFedora::Base.reindex_everything(batch_size: 5) - the default batch size of 50 was causing my dev server to crash, but your mileage may vary.
    • Navigate to the /catalog and verify that there is a "Date Created" facet with a slider bar, like screenshot.
  • Manually deposit a new work with a valid year value in "Date Created" through the UI, and verify it is findable by year in the slider on the catalog.
  • Manually deposit a new work with an INVALID or blank year value in "Date Created" through the UI and ensure it deposits without error. The "Unknown" number beneath the date created slider should increase by one.
image

@alepbloyd alepbloyd requested review from kerchner and dolsysmith April 3, 2024 15:24
Copy link
Contributor

@dolsysmith dolsysmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior in the UI looks great and is as described in the PR. Very nice!

I did pick up one error in the logs, which now seems to occur when doing a search with no date limit set:

F, [2024-04-04T16:42:13.709056 #749] FATAL -- : [ab5c4867-e814-4542-a662-cc52a92380e8]   
F, [2024-04-04T16:42:13.709131 #749] FATAL -- : [ab5c4867-e814-4542-a662-cc52a92380e8] NoMethodError (undefined method `[]' for false:FalseClass):
F, [2024-04-04T16:42:13.709175 #749] FATAL -- : [ab5c4867-e814-4542-a662-cc52a92380e8]   
F, [2024-04-04T16:42:13.709209 #749] FATAL -- : [ab5c4867-e814-4542-a662-cc52a92380e8] app/services/hyrax/search_service_decorator.rb:13:in `search_results'

It doesn't seem to affect anything in the UI, but maybe worth a look?

Also, wondering if it would be a lot of effort to make the results sortable by date created, once they've been filtered by the facet? Maybe for a future enhancement, if it looks like a lot of work.

@alepbloyd
Copy link
Contributor Author

@dolsysmith - Thanks for taking a look! I'll take a look at that error from the logs and see if I can find what's happening there.

Re: sorting - I know that that's something doable via CatalogController through config.add_sort_field.... Might just be a matter of adding two lines, so I can try that out while I'm looking at that error. If it's not quite so simple, my vote is to bump that to another issue.

@alepbloyd
Copy link
Contributor Author

@dolsysmith - Can you explain what this file does? https://github.com/gwu-libraries/scholarspace-hyrax/blob/a2d57cfb4a552131c183150016b043cf8be4f38b/app/services/hyrax/search_service_decorator.rb

I ask because that's where the error seems to be coming from, and I tried just commenting out the entire file for an experiment, and everything still seems to work (including sorting collections) - and that error message seems to not come up.

@dolsysmith
Copy link
Contributor

dolsysmith commented Apr 5, 2024

If the works on a collections page (i.e., a view of the works in a single collection) are still sortable without it, then I guess we don't need it. I can take a look on Monday, too.

The behavior it was intended to fix was that when viewing the works in a collection, changing the sort parameters had no effect.

But I didn't write this patch, so I can't with confidence attest to its logic...

@dolsysmith
Copy link
Contributor

@alepbloyd

Without the SearchServiceDecorator, I'm seeing a different error whenever I remove the date range facet from the search:

F, [2024-04-09T19:37:25.419452 #2030] FATAL -- : [09b15375-1873-4da2-82cc-81f939b4d7f7] blacklight_range_limit (7.0.1) lib/blacklight_range_limit/segment_calculation.rb:21:in `add_range_segments_to_solr!'
[09b15375-1873-4da2-82cc-81f939b4d7f7] blacklight_range_limit (7.0.1) lib/blacklight_range_limit/range_limit_builder.rb:74:in `fetch_specific_range_limit'
[09b15375-1873-4da2-82cc-81f939b4d7f7] blacklight (6.25.0) lib/blacklight/search_builder.rb:147:in `block (2 levels) in processed_parameters'
[09b15375-1873-4da2-82cc-81f939b4d7f7] blacklight (6.25.0) lib/blacklight/search_builder.rb:146:in `each'
[09b15375-1873-4da2-82cc-81f939b4d7f7] blacklight (6.25.0) lib/blacklight/search_builder.rb:146:in `block in processed_parameters'
[09b15375-1873-4da2-82cc-81f939b4d7f7] blacklight (6.25.0) lib/blacklight/search_builder.rb:145:in `tap'
[09b15375-1873-4da2-82cc-81f939b4d7f7] blacklight (6.25.0) lib/blacklight/search_builder.rb:145:in `processed_parameters'

Are you seeing the same?

@alepbloyd
Copy link
Contributor Author

@dolsysmith - interesting! I don't recall seeing that, but am going to look into this more today and will keep you updated.

@dolsysmith
Copy link
Contributor

dolsysmith commented Apr 12, 2024

From the development logs, I think I've deduced that when running a search without a range limit set (or when removing range limit), a second query is run (perhaps to populate the range limit in the facets?), which looks something like this:

/catalog/range_limit?commit=Limit&cq=&locale=en&q=&range_end=2002&range_field=date_created_isim&range_start=1936&search_field=all_fields&sort=&utf8=%E2%9C%93.

It's this second query that seems to trigger the error. When range limits are set, a single query seems to be run, and no error is triggered. Not sure what this means, but just sharing the clues as I find them...

P.S. The range-limit controller code seems to be here.

@dolsysmith
Copy link
Contributor

dolsysmith commented Apr 12, 2024

@alepbloyd Still working on this, but I think the problem may be that the fetch_specific_range_limit method here may be getting called when the blacklight_params object is empty.

Update: Noting the comment on the latter method:

Another processing method, this one is NOT included in default processing chain,
it is specifically swapped in instead of add_range_limit_params for
certain ajax requests that only want to fetch range limit segments for
ONE field.

Maybe the problem is that this method is getting triggered by something in a view? I'm suspecting now that this method shouldn't be getting called, at least not as it is, though I'm still piecing together where that call originates from.

Another update

The error is triggered by a specific route, http://localhost:3000/catalog/range_limit, which is used by the JS in from the gem to add the values to the widget (I'm guessing). When running a search with range limits set, those are passed as parameters to the regular catalog route.

@dolsysmith
Copy link
Contributor

I am now seeing that the search builder that the ranged search uses is an instance of Hyrax::CatalogSearchBuilder. Looking at the code for that class, I see that the recommended approach, when using Blacklight mixins, is to construct our app's search builder from the Hyrax class, not the Blacklight class. Perhaps that's why the blacklight_params object is currently empty when the range_limit route is called?

I'm wrapping up for the day but can try this approach on Monday.

@dolsysmith
Copy link
Contributor

Unfortunately, changing our search builder implementation to match the documentation more closely had no appreciable effect.

@dolsysmith
Copy link
Contributor

@alepbloyd In addition to the patch, I removed a couple of lines from routes.rb that I believe to be extraneous (probably added by the generator).

Please test thoroughly, i.e., in all search/faceting scenarios (collections, dashboard, within a collection, etc.). I didn't see any more errors, but since I had to modify the Hyrax::SearchService behavior, that's a pretty global change.

@alepbloyd
Copy link
Contributor Author

@dolsysmith - thank you! Testing this now.

@alepbloyd
Copy link
Contributor Author

@dolsysmith - Tried searches in every context I could think of, and error seems to be gone! Search still works, new date indexing functionality is still working, and spec tests are still passing, so seems good by me. Thanks for the help in tracking that down.

Should we do a squash 'n' merge on this? I get tripped up on what our process should be after merging in changes from master into an active branch like this.

@dolsysmith
Copy link
Contributor

I think squashing and merging should be fine. It should be mostly up to date with master because of the previous merge(s).

@alepbloyd alepbloyd merged commit 0a71ba4 into master Apr 15, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve searching by Date Created field
4 participants