BAIs get incorrectly added to literature records from author.xml #460

Open
michamos opened this issue Mar 19, 2024 · 4 comments
Labels
cold box (When we are waiting for 3rd party or is not possible at the moment), project: next, type: bug (Something isn't working)

Comments

@michamos
Collaborator

The code extracting identifiers from author.xml files blindly trusts all identifiers and tries to add them to the author (causing a validation error later down the line if the ID is unknown). This is fine for identifiers such as ORCID, but not for INSPIRE BAIs, which have been removed from literature records and are now supposed to be generated dynamically from the linked author record during serialization: https://github.com/inspirehep/inspirehep/blob/4d514d4a046819ef984defc0435c413f3d90ce10/backend/inspirehep/records/marshmallow/literature/common/author.py#L62-L78.

The consequence is that we have hardcoded BAIs in literature records, which get out of sync with the linked author's BAI if that BAI changes. Example: https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=a%20Michele%20Selvaggi%20and%20a%20M.Selvaggi.1. These should all have BAI Michele.Selvaggi.1 instead of M.Selvaggi.1 but don't because of the hardcoding.

We should fix the bug and run the script in https://github.com/inspirehep/curation-scripts/blob/master/scripts/remove-bai-from-lit-authors/script.py again to fix existing records.
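For illustration, the cleanup step could look roughly like the sketch below. It is not the linked curation script; it assumes that a literature record is a plain dict whose `authors` entries carry an `ids` list of `{"schema", "value"}` pairs, and that BAIs use the schema label `"INSPIRE BAI"`. The function name `strip_bais` is hypothetical.

```python
from copy import deepcopy

# Assumption: hardcoded BAIs are stored under this schema label.
BAI_SCHEMA = "INSPIRE BAI"


def strip_bais(record):
    """Return a copy of a literature record with BAI ids removed from all authors.

    Hypothetical helper for illustration only; other identifiers are kept.
    """
    cleaned = deepcopy(record)
    for author in cleaned.get("authors", []):
        remaining = [
            id_ for id_ in author.get("ids", []) if id_.get("schema") != BAI_SCHEMA
        ]
        if remaining:
            author["ids"] = remaining
        else:
            # Drop the key entirely rather than leaving an empty list behind.
            author.pop("ids", None)
    return cleaned
```

Working on a copy keeps the original record available for comparison, so a record only needs to be saved when the cleaned version differs from it.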

@michamos added the project: next and type: bug labels on Mar 19, 2024
@drjova
Contributor

drjova commented Mar 25, 2024

@michamos I still don't understand what we should do with this. Do we have to keep the INSPIRE BAIs instead of removing them?

@michamos
Collaborator Author

No, we should ignore the INSPIRE BAIs coming from author.xml
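As a rough sketch of what "ignore" could mean at extraction time, assuming the parser has already reduced each author's identifiers to `(schema, value)` pairs and that BAIs carry the schema label `"INSPIRE BAI"` (the function name and input shape are hypothetical, not the actual extraction code):

```python
# Assumption: schema labels that must never be copied onto literature authors.
IGNORED_SCHEMAS = {"INSPIRE BAI"}


def keep_trusted_ids(raw_ids):
    """Turn (schema, value) pairs into id dicts, skipping ignored schemas.

    Hypothetical helper: `raw_ids` stands in for whatever the author.xml
    parser yields for a single author.
    """
    return [
        {"schema": schema, "value": value}
        for schema, value in raw_ids
        if schema not in IGNORED_SCHEMAS
    ]
```

Using a set of ignored schemas rather than a single comparison makes it easy to exclude other dynamically generated identifiers later.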

@drjova
Contributor

drjova commented Mar 25, 2024

Thanks, could you please provide an author.xml with BAIs?

@michamos
Collaborator Author

Hmmm, I can't find any examples. So I don't understand where the hardcoded BAIs are coming from. I assumed author.xml files but that doesn't seem to be the case. Let's put this on hold, I'll run the cleanup script again and we'll see if the issue happens again.

@drjova added the cold box label on Mar 25, 2024