Spike: Investigate why University of Virginia's Dataverse installation (Libra Data) can't harvest a set from Harvard Dataverse Repository #206

Closed
jggautier opened this issue Dec 8, 2022 · 15 comments

Comments

@jggautier
Collaborator

jggautier commented Dec 8, 2022

Sherry Lake emailed Dataverse support that the University of Virginia repository she manages, Libra Data at https://dataverse.lib.virginia.edu, can't harvest a set from the Harvard Dataverse Repository. When Sherry tries, using the dataverse_json and the ddi metadata formats, no dataset records are harvested into Libra Data.

Libra Data is using Dataverse v5.11.1. Harvard's repository is using 5.12.

The set from Harvard's repository is called UVA_Authored_Datasets.

In the email thread at https://help.hmdc.harvard.edu/Ticket/Display.html?id=331127, Sherry attached a log with more technical information about the harvesting attempt.

Lastly, when I tried to harvest the set from Harvard's repository into Demo Dataverse, some records were harvested but others failed. I'm not sure if it's helpful for this case to know why Demo Dataverse is able to harvest some records but Libra Data can't harvest any.

Definition of done:

  • We know why the Libra Data repository is unable to harvest the set from the Harvard Dataverse and discuss next steps (e.g. suggesting configuration options for Libra Data, recording that this issue will be fixed when work being tracked in a related GitHub issue is done, opening another GitHub issue to describe a new bug or technical limitation)
@jggautier jggautier changed the title Spike: Investigate why UvA's Dataverse can't harvest set from Harvard Dataverse Repository Spike: Investigate why University of Virginia's Dataverse installation can't harvest a set from Harvard Dataverse Repository Dec 8, 2022
@jggautier jggautier changed the title Spike: Investigate why University of Virginia's Dataverse installation can't harvest a set from Harvard Dataverse Repository Spike: Investigate why University of Virginia's Dataverse installation (Libra Data) can't harvest a set from Harvard Dataverse Repository Dec 8, 2022
@shlake

shlake commented Jan 10, 2023

I re-added Harvard as a Harvesting Client and it seemed to work (at least the whole thing did not fail). The client I set up years ago just reported "FAILED".

SUCCESS; 159 harvested, 0 deleted, 1 failed.

@jggautier so I think this issue can be closed.

Solution was to create a new client on the dashboard.

@jggautier
Collaborator Author

jggautier commented Jan 10, 2023

Awesome! Glad the harvesting works now.

Is it helpful to learn why the old client wasn't working, but the new client is? Is that even possible at this point?

Should recreating a failing harvesting client be the kind of troubleshooting method we recommend, the kind where we don't really know why it works (like turning something off and on again)?

@jggautier
Collaborator Author

In a meeting this morning (IQSS/dataverse-pm#24) @landreev said that yes, installation admins should consider recreating a failing harvesting client when troubleshooting harvesting issues like these.

I'll close this issue.

@shlake

shlake commented Jan 19, 2023

@jggautier leaving a note here that my harvest failed today, Jan. 19, when I ran it manually. It used the client I newly created on January 10th, which was a success on that day. I had not set up a scheduled harvest, so today was the first time I had run it since then.

This was not due to a recent upgrade on UVa's Dataverse repo.

So how often am I going to have to re-create the harvesting client?

Running the client created on Jan 10 failed immediately; there was no "in progress".

Running the client I just created (Jan 19), with all the same configurations as before, worked and showed "in progress" until it was completed:

[Screenshot: Screen Shot 2023-01-19 at 3 13 49 PM]

@jggautier
Collaborator Author

jggautier commented Jan 19, 2023

Sorry to hear, @shlake, and thanks for the update and the question. Ideally you'd have to recreate the client just once! Hopefully hearing that the re-created harvesting client fails when trying to harvest, after the first run was "successful", helps with troubleshooting.

Reopening this issue so it's easier to find.

@jggautier jggautier reopened this Jan 19, 2023
@landreev
Collaborator

This is alarming. I do recommend removing and recreating a client that may have sat around for a long time, when the server on the other end may have been upgraded a few times since the last successful harvest, etc., as a one-off remedy that may fix it. But no, you should not be expected to rebuild it from scratch every week!
Could you please send us/attach the log from that last failure? It should be in your logs directory as something like harvest_UVA-Harvard2_2023-01-19T.....

@landreev
Collaborator

(This may or may not need a bug fix issue in the main dev project, but yes, let's look at the log first.)

@shlake

shlake commented Jan 19, 2023

@landreev here is a link where I put 3 harvest log files - https://virginia.box.com/s/osi8ujbgcplxxlkehhmkd2jacxh5zu3u

  • one is the successful harvest from Jan 10
  • one is the unsuccessful harvest from today (Jan 19), which used the same harvest client set up on Jan 10
  • the third is the successful harvest from the new client set up today (Jan 19)

Thanks for looking into this.

@landreev
Collaborator

Thank you @shlake.
This is not looking awesome... We may have broken serving incremental harvests in 5.12.1, at least for pre-5.12.1 clients. That is, "give me everything in the set xyz" works, but "give me what's new in the set xyz since mm-dd-yyyy" is broken. And by "we" I mean I may have.
I'll open a bugfix issue etc.
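
For readers less familiar with harvesting: the difference between the two kinds of request can be sketched in OAI-PMH terms, where an incremental harvest adds a `from` date to the request. This is only an illustration, not the Dataverse implementation. The endpoint URL, the `oai_dc` metadata prefix, the helper name `list_records_url`, and the example date are all assumptions; only the set name comes from this thread.

```python
from urllib.parse import urlencode

# Assumption: OAI-PMH endpoint of the remote installation (not stated in this thread).
BASE_URL = "https://dataverse.harvard.edu/oai"


def list_records_url(base_url, set_name, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords URL (hypothetical helper for illustration).

    Without from_date this asks for everything in the set; with from_date
    (YYYY-MM-DD) it asks only for records changed since that date.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_name,
    }
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# "Give me everything in the set": the kind of request a brand-new client makes.
full_url = list_records_url(BASE_URL, "UVA_Authored_Datasets")

# "Give me what's new since the last harvest": the kind of request that failed.
incremental_url = list_records_url(
    BASE_URL, "UVA_Authored_Datasets", from_date="2023-01-10"
)

print(full_url)
print(incremental_url)
```

A newly created client has no previous harvest date, so it would send the first form, which is consistent with recreating the client appearing to "fix" the problem once.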

@shlake

shlake commented Jan 19, 2023

@landreev just an FYI.... Harvard is still at 5.12 (not 5.12.1), right?

@landreev
Collaborator

Harvard is in fact running 5.12.1 in all but name. (It's running a custom, colorfully named emergency build that was later released as 5.12.1.)

@landreev
Collaborator

I produced a (very simple) fix for this. We'll just need to figure out the logistics of applying it to our prod. I will keep you posted (I need to post about this in the Google group too).

@landreev
Collaborator

@shlake Could you please check and confirm that you can harvest from us again, now that we are running 5.13 (that is, that you can harvest from us more than once)? Thank you!

@shlake

shlake commented Feb 24, 2023

@landreev Yes, it worked. The harvest found 1 new record and no errors. 👍

@landreev
Collaborator

OK, thanks, that's great to hear.
