removing of MC from TB in GBIF? #95

Open
myrmoteras opened this issue Jun 9, 2023 · 5 comments

@myrmoteras
Contributor

myrmoteras commented Jun 9, 2023

@gsautter in this match, I get this response for the material citation key in GBIF:
image

Is this an isolated issue, or does it occur whenever we change an MC in a treatment?

Also in this match

@gsautter

gsautter commented Jun 9, 2023

Well, it happens if the occurrenceID in the DwCA changes. That ID is composed of the treatment UUID and the materials citation UUID, and both of these UUIDs are bound to the positions of their start and end words, so changing annotation boundaries changes the respective UUID. That is the downside of well-defined UUIDs; the advantage is reproducibility, and thus duplicate prevention, which was a problem in the past and to some degree still is with our XML-only documents.
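To illustrate why a boundary change alters the key, here is a minimal sketch of position-derived identifiers. It is not the actual generator: the namespace, the name string layout, the `.mc.` separator, and the function names `annotation_uuid` and `occurrence_id` are all assumptions made for illustration. It only shows the general property described above: same positions give the same ID, shifted positions give a different one.

```python
import uuid

# Hypothetical namespace; the real scheme is not described in this thread.
NAMESPACE = uuid.UUID(int=0)


def annotation_uuid(doc_id: str, ann_type: str, start_word: int, end_word: int) -> str:
    """Derive a deterministic ID from the annotation type and its word boundaries."""
    name = f"{doc_id}/{ann_type}/{start_word}-{end_word}"
    return uuid.uuid5(NAMESPACE, name).hex.upper()


def occurrence_id(treatment_uuid: str, mc_uuid: str) -> str:
    """Combine both IDs into one key (the separator is an assumption)."""
    return f"{treatment_uuid}.mc.{mc_uuid}"


# Same boundaries: the key is reproducible across exports.
treatment = annotation_uuid("doc-1", "treatment", 100, 900)
mc_before = annotation_uuid("doc-1", "materialsCitation", 250, 290)
mc_again = annotation_uuid("doc-1", "materialsCitation", 250, 290)

# Boundary moved by one word: the materials citation ID changes, and so does the key.
mc_shifted = annotation_uuid("doc-1", "materialsCitation", 250, 291)

key_before = occurrence_id(treatment, mc_before)
key_again = occurrence_id(treatment, mc_again)
key_shifted = occurrence_id(treatment, mc_shifted)
```

Setting an attribute on the annotation does not enter the name string in this sketch, which mirrors the point that attribute-only updates leave the ID untouched.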

However, looking at the version history of the underlying treatment, it doesn't look like any annotation boundaries changed at all, since external link write-back doesn't do that, only ever setting attributes:
image

It looks more like the occurrences got filtered from the DwCA for QC reasons: https://tb.plazi.org/GgServer/pdsStats/stats?outputFields=doc.docUuid+doc.name+doc.doi+doc.uploadUser+doc.uploadDateTime+doc.updateUser+doc.updateDateTime+docTransits.detailId+docTransits.detailLabel+docTransits.source+docTransits.dest+docTransits.result+docTransits.probCount&groupingFields=doc.docUuid+doc.name+doc.doi+doc.uploadUser+doc.uploadDateTime+doc.updateUser+doc.updateDateTime+docTransits.detailId+docTransits.detailLabel+docTransits.source+docTransits.dest+docTransits.result+docTransits.probCount&FP-doc.docUuid=FF9EFFD42F71FFCD8259E753C4769279&FP-docTransits.dest=%22DwCA%25%22&format=HTML
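For anyone who wants to run the same check for another document, the query string above can be assembled programmatically. This is a sketch that only reproduces the parameters visible in that URL; the helper name `stats_url` is hypothetical, and reading `"DwCA%"` as a wildcard pattern on the transit destination is an assumption.

```python
from urllib.parse import urlencode

BASE = "https://tb.plazi.org/GgServer/pdsStats/stats"

# Field list copied from the URL above; used for both output and grouping.
FIELDS = [
    "doc.docUuid", "doc.name", "doc.doi",
    "doc.uploadUser", "doc.uploadDateTime",
    "doc.updateUser", "doc.updateDateTime",
    "docTransits.detailId", "docTransits.detailLabel",
    "docTransits.source", "docTransits.dest",
    "docTransits.result", "docTransits.probCount",
]


def stats_url(doc_uuid: str, dest_pattern: str = '"DwCA%"') -> str:
    """Build the stats URL for one document, filtered by transit destination."""
    params = [
        ("outputFields", " ".join(FIELDS)),    # spaces are encoded as '+'
        ("groupingFields", " ".join(FIELDS)),
        ("FP-doc.docUuid", doc_uuid),
        ("FP-docTransits.dest", dest_pattern),  # quotes and '%' get percent-encoded
        ("format", "HTML"),
    ]
    return BASE + "?" + urlencode(params)


url = stats_url("FF9EFFD42F71FFCD8259E753C4769279")
```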

A look at https://tb.plazi.org/GgServer/xml/03A787AC2F64FFD882D1E3DDC0989A71 also confirms that the materials citation in question is there and is fine. It only got filtered from the DwCA, which will be reverted as soon as the materials citation issues listed in the error protocol are dealt with (fixed or marked as false positives). There are only 4 of them, so this shouldn't take long.
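The same kind of check can be scripted once the treatment XML has been downloaded. The sketch below assumes the citations appear as elements named `materialsCitation`; that tag name, the `id` attribute, and the tiny sample document are illustrative assumptions, not taken from the actual export.

```python
import xml.etree.ElementTree as ET


def materials_citations(xml_text: str, tag: str = "materialsCitation") -> list:
    """Return all elements with the given tag, at any depth in the document."""
    root = ET.fromstring(xml_text)
    return list(root.iter(tag))


# Made-up sample standing in for a downloaded treatment XML.
SAMPLE = """
<document>
  <treatment>
    <paragraph>
      <materialsCitation id="A">first specimen record</materialsCitation>
    </paragraph>
    <treatmentCitation>
      <materialsCitation id="B">second specimen record</materialsCitation>
    </treatmentCitation>
  </treatment>
</document>
"""

found = materials_citations(SAMPLE)
found_ids = [el.get("id") for el in found]
```

If the citation is present in the XML but missing from GBIF, the filtering happened at export time rather than in the treatment itself.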

@myrmoteras
Contributor Author

@flsimoes can you freeze x the issues and let me know.
I then go back to the respective matcit and see whether I can link them. I already decided they don't match...

@gsautter

gsautter commented Jun 9, 2023

Turns out the underlying article is a Phytotaxa whose treatments have somewhat chaotic structure, and the materials citations in question were ones marked in treatment citations, not in regular "materials examined" sections ... need to properly QC those as well.

@flsimoes

flsimoes commented Jun 9, 2023

> @flsimoes can you freeze x the issues and let me know. I then go back to the respective matcit and see whether I can link them. I already decided they don't match...

Will work on it.
I'm guessing you mean "fix the issues"

@gsautter

gsautter commented Jun 9, 2023

> Turns out the underlying article is a Phytotaxa whose treatments have somewhat chaotic structure, and the materials citations in question were ones marked in treatment citations, not in regular "materials examined" sections ... need to properly QC those as well.

IMF fixed, the occurrences are visible in GBIF again, with their original keys.
