Investigate error when trying to create dataset links for Dean Karlan's datasets #274

Closed
jggautier opened this issue May 21, 2024 · 9 comments
Labels
bug (Something isn't working), Size: 10 (A percentage of a sprint.)

Comments

@jggautier
Collaborator

jggautier commented May 21, 2024

I used the Dataverse API to try to add links of datasets on Harvard Dataverse authored by Dean Karlan into two collections and the API returned a 403 error for a number of them.

3 dataset links not added to deankarlan collection
In mid-April 2024, I used the Dataverse API to try to add 54 dataset links into the collection at https://dataverse.harvard.edu/dataverse/deankarlan. Links for 51 of those datasets were created.

I can't add links for three of those datasets. When I try, the API returns a 403 error. In the datasetlinkingdataverse table in the Harvard Dataverse database, I can see the IDs of all 54 datasets, including the three that don't appear as links in the UI. Maybe that's related to the 403 error?

The three datasets that I couldn't add as links into the collection are:

  • doi:10.7910/DVN/EJTYHC (link)
  • doi:10.7910/DVN/NHIXNT (link)
  • doi:10.7910/DVN/QT7IXR (link)
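
For readers unfamiliar with the call being described, here is a minimal sketch of how one such link request might be built. It assumes the Native API's dataset-linking endpoint (`PUT /api/datasets/{id}/link/{alias}`) also accepts the `:persistentId` form used by other dataset endpoints. The function name, the server URL, and the token value are placeholders for illustration, not details taken from this issue.

```python
import urllib.parse
import urllib.request


def build_link_request(server: str, pid: str, alias: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that asks Dataverse to link a dataset into a collection.

    Assumption: the link endpoint accepts ':persistentId' plus a 'persistentId' query parameter.
    """
    # Keep ':' and '/' unescaped so the DOI stays readable in the URL.
    query = urllib.parse.urlencode({"persistentId": pid}, safe=":/")
    url = f"{server}/api/datasets/:persistentId/link/{alias}?{query}"
    # The API token is sent in the X-Dataverse-key header.
    return urllib.request.Request(url, method="PUT", headers={"X-Dataverse-key": token})


# Hypothetical values: the server and token below are placeholders.
request = build_link_request(
    "https://demo.dataverse.org", "doi:10.7910/DVN/EJTYHC", "deankarlan", "xxxxxxxx"
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would raise `urllib.error.HTTPError` on a 403 response like the ones reported here.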

12 dataset links not added to DFEEP collection
In early June 2024, I used the Dataverse API to try to add 12 dataset links to the collection at https://dataverse.harvard.edu/dataverse/DFEEP. The API returned 403 errors and the links were not added.

Here are the DOIs for those 12 datasets:

  • doi:10.7910/DVN/EX9FU1 (link)
  • doi:10.7910/DVN/TUXWCQ (link)
  • doi:10.7910/DVN/QT7IXR (link)
  • doi:10.7910/DVN/BTZXQX (link)
  • doi:10.7910/DVN/A2KSIV (link)
  • doi:10.7910/DVN/RNGHDV (link)
  • doi:10.7910/DVN/MA31ZM (link)
  • doi:10.7910/DVN/L0I9HW (link)
  • doi:10.7910/DVN/5NM88Z (link)
  • doi:10.7910/DVN/XXT0GR (link)
  • doi:10.7910/DVN/ULI7IX (link)
  • doi:10.3886/ICPSR02526.v1 (link)
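
A batch attempt like this one can be sketched as a loop that records which DOIs fail and with which status code. This is only an illustration, again assuming the `:persistentId` form of the link endpoint. `link_status` and `collect_failures` are hypothetical helper names, and the `send` parameter exists so the loop can be exercised without contacting a real server.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def link_status(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code, including error codes such as 403."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def collect_failures(
    pids: Iterable[str],
    alias: str,
    server: str,
    token: str,
    send: Callable[[urllib.request.Request], int] = link_status,
) -> Dict[str, int]:
    """Try to link each dataset into the collection; return {pid: status} for non-200 responses."""
    failures: Dict[str, int] = {}
    for pid in pids:
        # Assumed endpoint shape; adjust if the installation expects database IDs instead.
        url = f"{server}/api/datasets/:persistentId/link/{alias}?persistentId={pid}"
        request = urllib.request.Request(
            url, method="PUT", headers={"X-Dataverse-key": token}
        )
        status = send(request)
        if status != 200:
            failures[pid] = status
    return failures
```

Passing a stub for `send` makes it possible to check the bookkeeping before running the loop against a production installation.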

More context
In the email thread at https://help.hmdc.harvard.edu/Ticket/Display.html?id=359122, Dean Karlan and I discussed how to make sure that the datasets he's an author of are linked into his collection and linked into the DFEEP collection, including the 54 datasets published so far and any datasets published in the future.

I also used the Saved Search feature to add links of any datasets he's an author of that are published in the future (see #275 and #277).

@jggautier jggautier added the bug Something isn't working label May 21, 2024
@jggautier jggautier changed the title from "Investigate error when trying to create dataset links" to "Investigate error when trying to create dataset links for Dean Karlan's datasets" May 21, 2024
@jggautier
Collaborator Author

I was able to use the Saved Search feature so that all Dean Karlan-authored datasets are added as links into his collection and the DFEEP collection. @sbarbosadataverse, @scolapasta and I wondered if Dataverse would then add links for the datasets I listed above. I checked today and the links have not been added.

@cmbz
Collaborator

cmbz commented Jun 24, 2024

@scolapasta will investigate several cases to see if reindexing helps, then will perform additional troubleshooting if needed.

@jggautier
Collaborator Author

jggautier commented Jun 24, 2024

As of this writing, links for all but one of the datasets that I listed in this issue's first comment are in the two collections. I'm not sure how or when this happened. @scolapasta wrote in a Slack message that it might have happened during some reindexing.

The unpublished dataset doi:10.7910/DVN/QT7IXR (link) is the only dataset that doesn't have a link in the deankarlan collection and in the DFEEP collection.

It's the only unpublished dataset that we needed to create links for, as of this writing. And I'm not able to create links for unpublished datasets when I try on Demo Dataverse. When I use the API to try, I get a 403 error. As far as I can tell, the User Guides, such as https://guides.dataverse.org/en/6.2/user/dataverse-management.html#dataset-linking, don't mention that links can't be created for unpublished datasets, but what's written on that User Guides page seems UI-focused, and the "Link Dataset" button appears only on published dataset pages.

@sbarbosadataverse sbarbosadataverse moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Jun 26, 2024
@sbarbosadataverse sbarbosadataverse added the Size: 10 A percentage of a sprint. label Jun 26, 2024
@cmbz cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Jul 1, 2024
@stevenwinship
Contributor

stevenwinship commented Nov 4, 2024

@jggautier doi:10.7910/DVN/QT7IXR (link) is missing required metadata fields (Text description and Subject). This is preventing the dataset from being published as well as linked.

You are correct that it does not need to be published to be linked, but it does need to be valid.

@stevenwinship
Contributor

image

@jggautier
Collaborator Author

Thanks for confirming @stevenwinship!

I actually wasn't sure if it needed to be published or not, and I'm not sure why I wasn't able to link an unpublished dataset when I tried on Demo Dataverse back in June. I'll try again today.

The person who opened the GitHub issue at IQSS/dataverse#10134 mentions an error message when they try to link an unpublished dataset. But that issue was written back in Nov. 2023 and maybe it's been fixed since then.

Not being able to link an unpublished dataset that is missing required fields sounds like a bug, right? Or maybe an oversight? I don't think it was intended that people wouldn't be able to create links of these sorts of datasets, and it isn't mentioned in the latest version of the guides.

@jggautier
Collaborator Author

jggautier commented Nov 4, 2024

When I try on Demo Dataverse to create a link of an unpublished dataset into another collection, I get the same error message reported in IQSS/dataverse#10134:

Screenshot 2024-11-04 at 3 21 46 PM

That I'm given an error message like this makes me think that it's intended that unpublished datasets can't be linked. And if that's the case, it doesn't seem like a bug that we're not able to link an unpublished dataset that is missing required fields.

@stevenwinship, could you write about why a dataset doesn't need to be published to be linked? Are you seeing something in the code that indicates that we should be able to create links to unpublished datasets?

@jggautier
Collaborator Author

jggautier commented Nov 5, 2024

Hey again. I thought I'd mention here that eventually I plan to propose that we improve that error message and what's in the API Guides.

That error message says something about harvested datasets. I've used that endpoint to help users create links of datasets that have been harvested, and I tested it again today in Harvard Dataverse to make sure it's still possible.

So at the least, that last part of the error message about harvested datasets should be removed.

But we also need to know whether or not we intended for the endpoint to let users create links to unpublished datasets. If it should be possible to create links of unpublished datasets, I think we'd just want to remove the message.

@jggautier
Collaborator Author

jggautier commented Jan 6, 2025

Thinking about this some more, I think that it's better that the API endpoint for linking datasets doesn't let us link unpublished datasets, because the UI doesn't let us create links to unpublished datasets. It seems very likely that this was by design. And I suspect that @stevenwinship wrote that "it does not need to be published to be linked" because the requirement for the dataset to be published isn't documented anywhere, which I think is what tripped me up, too.

Since the one remaining dataset authored by Dean Karlan that I couldn't create a link to is unpublished, and I think creating links to unpublished datasets shouldn't be possible anyway, I'm going to close this GitHub issue. We've investigated all errors when trying to create dataset links for all of Dean Karlan's datasets.

I opened a GitHub issue at IQSS/dataverse#11131 to suggest that the error message returned by the endpoint be edited so that it doesn't mention harvesting.

@github-project-automation github-project-automation bot moved this from SPRINT READY to Done 🧹 in IQSS Dataverse Project Jan 6, 2025
Projects
Status: Done 🧹
Development

No branches or pull requests

4 participants