Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External Controlled Vocab: add missing db constraints, add cvoc transaction and try/catch #9984

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Oct 6, 2023

What this PR does / why we need it: Per the issue, it adds missing db constraints and improves the code for registering new external vocab values by isolating the update in a small transaction and stopping any attempt to add a duplicate value from killing the overall dataset edit/other transaction being done.

Which issue(s) this PR closes:

Closes #9983

Special notes for your reviewer: As of now, we don't know why the constraints are not there. There is some dv-tech channel slack discussion about it but no conclusions.
If the db has duplicate values now, the current flyway script would cause install failure. However, it appears very unlikely that either table would have duplicates now. The code around both tables checks for existing values before saving a new one, so only race conditions can create duplicates. Problems with the external vocab table immediately start causing edit failures, so in addition to a race condition being unlikely (parallel changes and ext. vocab enabled), the problem would be immediately noticed. For the oaiset table, it seems even more unlikely that duplicates would exist as the code there also appears to check for an existing value with the same spec and there isn't any parallel case where one would expect multiple oaisets to be being created at once. It isn't clear to me that we can automate removing duplicates if they do exist, especially for oaiset where the descriptions and other fields may differ. My guess is that handling this with a release note instruction might be enough as problems should be extremely rare.

Suggestions on how to test this:

Regression: verify that OAI sets can be created/used and that external vocab values can be added to datasets and their metadata appears in the exported metadata. IN either case it appears to be hard to test the improvement from having the constraints as one would have to run multiple adds of an OAISet/multiple edits to datasets with the same ext. vocab value terms to be able to trigger the problem. (Prior to the PR, parallel changes would add duplicates to the database which you could see there and, at least for ext. vocab, you'd see failures related to multi-value exceptions when trying to retrieve ext. vocab values. After the PR, you shouldn't ever have duplicate values. For OAISets you might get errors when trying to run parallel ops. For the external vocab case, you'd see a fine message in the log about an attempt to write a duplicate value but the overall edit/metadata change would succeed.)

One other thing to test - have a dataset that has uses a type with controlled vocabulary values and then change the definition of the type to not use CVV. Make sure that this does not adversely affect the functionality (display, indexing, etc) of the existing dataset. (adding this because if the change to line 287 of DatasetFieldType.java.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: included

Additional documentation:

@coveralls
Copy link

coveralls commented Oct 6, 2023

Coverage Status

coverage: 20.267% (-0.008%) from 20.275%
when pulling f3fae4b on GlobalDataverseCommunityConsortium:IQSS/9983-missing_unique_constraints
into 4b5cbe8 on IQSS:develop.

janvanmansum added a commit to janvanmansum/dataverse that referenced this pull request Oct 10, 2023
…rolledVocabulary instead of looking up complete list of terms to see if it is empty
@qqmyers qqmyers changed the title add missing db constriants, add cvoc transaction and try/catch add missing db constraints, add cvoc transaction and try/catch Oct 11, 2023
@qqmyers qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Oct 11, 2023
janvanmansum added a commit to DANS-KNAW/dataverse that referenced this pull request Oct 12, 2023
* Applied IQSS#9984

* Merge IQSS#9955

---------

Co-authored-by: Jan van Mansum <[email protected]>
Co-authored-by: Jan van Mansum <[email protected]>
@qqmyers qqmyers added the GDCC: DANS related to GDCC work for DANS label Nov 6, 2023
@qqmyers qqmyers changed the title add missing db constraints, add cvoc transaction and try/catch External Controlled Vocab: add missing db constraints, add cvoc transaction and try/catch Jan 3, 2024
@PaulBoon
Copy link
Contributor

@qqmyers Can this be merged into the next release or is there something blocking it, besides the review?

@cmbz
Copy link

cmbz commented Jan 30, 2024

2024/01/30

Copy link
Contributor

@scolapasta scolapasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good, had a few basic questions.




TODO: Add note about reloading metadata blocks after upgrade.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a todo that we want before we qa / merge this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is just a note that the main release notes should include the reload block instructions for these blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you guys think we should clarify that whoever makes the release notes has to do this? I am not sure it is clear enough with the TODO: probably we could just mention it on Slack but we run the risk of someone missing the message 🤷🏼 @scolapasta @qqmyers

@@ -2,13 +2,13 @@
astrophysics Astronomy and Astrophysics Metadata
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these relate to the todo? What is the reason for changing these values? (I probably just missed it, now I'm just curious)

Copy link
Member Author

@qqmyers qqmyers Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the reason the blocks would need to be reloaded. I think these are inconsistencies - the block says they are controlled vocab fields but there are no vocab values for them in the block. (Which is noticeable once you start looking at the definition rather than for the existence of controlled vocab values to decide what input type to display.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I just wonder how why they were ever true in the first place?

@@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) {
}

public boolean isControlledVocabulary() {
return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty();
return allowControlledVocabulary;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change because we think the definition is enough (and more important than actual values). Not sure, but could we ever get a case where a field changes FROM allow controlled vocab to disallowing, but we still have old datasets with CVV?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvanmansum may know more here but my understanding is that this is primarily being used to decide if you display the controlled vocab list for metadata input or not. So with the change if you set the field to not be cvv, it will now show as a text input, whereas before, if you ever had cvv values in the db, the field shows as cvv going forward. I don't think it affects things if there are specific field values in datasets that have to display, its just about the type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers that makes sense. My concern would be if there are other places that use it for either display or indexing tasks and how this would affect a dataset that had controlled vocabulary values before the type was changed (rare, but could happen. I'll just add a note in the testing section to see if that can be looked at and make sure things look fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at code outside the PR- there is an isAllowControlledVocabulary() method above that returns allowControlledVocabulary (so a duplicate now). Perhaps the xhtml or other places where we need this change could just switch to use that method.

FWIW: It does seem to be an odd case to have former CVVs - we're potentially relying on the metadatablock loader to keep old values/ people to keep the old values in the tsv when the field no longer uses them, etc. (I can hardly wait until they make the field CVV again and want new values and not have the old ones show up as choices ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree it's an edge case. It's possible any time we would switch from cvv to not, we'd write a db script to convert the values anyway?

@scolapasta scolapasta assigned scolapasta and qqmyers and unassigned qqmyers and scolapasta Feb 5, 2024
@@ -2,13 +2,13 @@
astrophysics Astronomy and Astrophysics Metadata
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I just wonder how why they were ever true in the first place?

@@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) {
}

public boolean isControlledVocabulary() {
return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty();
return allowControlledVocabulary;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers that makes sense. My concern would be if there are other places that use it for either display or indexing tasks and how this would affect a dataset that had controlled vocabulary values before the type was changed (rare, but could happen. I'll just add a note in the testing section to see if that can be looked at and make sure things look fine.

@jp-tosca jp-tosca self-assigned this Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GDCC: DANS related to GDCC work for DANS Size: 10 A percentage of a sprint. 7 hours.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing constraints allow duplicate values for external CVoc and oaiset
8 participants