-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
External Controlled Vocab: add missing db constraints, add cvoc transaction and try/catch #9984
External Controlled Vocab: add missing db constraints, add cvoc transaction and try/catch #9984
Conversation
…rolledVocabulary instead of looking up complete list of terms to see if it is empty
…straints Amendment
* Applied IQSS#9984 * Merge IQSS#9955 --------- Co-authored-by: Jan van Mansum <[email protected]> Co-authored-by: Jan van Mansum <[email protected]>
Amend metadata block TSVs
….com/GlobalDataverseCommunityConsortium/dataverse.git into IQSS/9983-missing_unique_constraints
@qqmyers Can this be merged into the next release or is there something blocking it, besides the review? |
2024/01/30
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good, had a few basic questions.
|
||
|
||
|
||
TODO: Add note about reloading metadata blocks after upgrade. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a todo that we want before we qa / merge this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is just a note that the main release notes should include the reload block instructions for these blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you guys think we should clarify that whoever makes the release notes has to do this? I am not sure it is clear enough with the TODO: probably we could just mention it on Slack but we run the risk of someone missing the message 🤷🏼 @scolapasta @qqmyers
@@ -2,13 +2,13 @@ | |||
astrophysics Astronomy and Astrophysics Metadata | |||
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id | |||
astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics | |||
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these relate to the todo? What is the reason for changing these values? (I probably just missed it, now I'm just curious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the reason the blocks would need to be reloaded. I think these are inconsistencies - the block says they are controlled vocab fields but there are no vocab values for them in the block. (Which is noticeable once you start looking at the definition rather than for the existence of controlled vocab values to decide what input type to display.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense - I just wonder how why they were ever true in the first place?
@@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) { | |||
} | |||
|
|||
public boolean isControlledVocabulary() { | |||
return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty(); | |||
return allowControlledVocabulary; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change because we think the definition is enough (and more important than actual values). Not sure, but could we ever get a case where a field changes FROM allow controlled vocab to disallowing, but we still have old datasets with CVV?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvanmansum may know more here but my understanding is that this is primarily being used to decide if you display the controlled vocab list for metadata input or not. So with the change if you set the field to not be cvv, it will now show as a text input, whereas before, if you ever had cvv values in the db, the field shows as cvv going forward. I don't think it affects things if there are specific field values in datasets that have to display, its just about the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qqmyers that makes sense. My concern would be if there are other places that use it for either display or indexing tasks and how this would affect a dataset that had controlled vocabulary values before the type was changed (rare, but could happen. I'll just add a note in the testing section to see if that can be looked at and make sure things look fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at code outside the PR- there is an isAllowControlledVocabulary() method above that returns allowControlledVocabulary (so a duplicate now). Perhaps the xhtml or other places where we need this change could just switch to use that method.
FWIW: It does seem to be an odd case to have former CVVs - we're potentially relying on the metadatablock loader to keep old values/ people to keep the old values in the tsv when the field no longer uses them, etc. (I can hardly wait until they make the field CVV again and want new values and not have the old ones show up as choices ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree it's an edge case. It's possible any time we would switch from cvv to not, we'd write a db script to convert the values anyway?
@@ -2,13 +2,13 @@ | |||
astrophysics Astronomy and Astrophysics Metadata | |||
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id | |||
astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics | |||
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense - I just wonder how why they were ever true in the first place?
@@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) { | |||
} | |||
|
|||
public boolean isControlledVocabulary() { | |||
return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty(); | |||
return allowControlledVocabulary; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qqmyers that makes sense. My concern would be if there are other places that use it for either display or indexing tasks and how this would affect a dataset that had controlled vocabulary values before the type was changed (rare, but could happen. I'll just add a note in the testing section to see if that can be looked at and make sure things look fine.
What this PR does / why we need it: Per the issue, it adds missing db constraints and improves the code for registering new external vocab values by isolating the update in a small transaction and stopping any attempt to add a duplicate value from killing the overall dataset edit/other transaction being done.
Which issue(s) this PR closes:
Closes #9983
Special notes for your reviewer: As of now, we don't know why the constraints are not there. There is some dv-tech channel slack discussion about it but no conclusions.
If the db has duplicate values now, the current flyway script would cause install failure. However, it appears very unlikely that either table would have duplicates now. The code around both tables checks for existing values before saving a new one, so only race conditions can create duplicates. Problems with the external vocab table immediately start causing edit failures, so in addition to a race condition being unlikely (parallel changes and ext. vocab enabled), the problem would be immediately noticed. For the oaiset table, it seems even more unlikely that duplicates would exist as the code there also appears to check for an existing value with the same spec and there isn't any parallel case where one would expect multiple oaisets to be being created at once. It isn't clear to me that we can automate removing duplicates if they do exist, especially for oaiset where the descriptions and other fields may differ. My guess is that handling this with a release note instruction might be enough as problems should be extremely rare.
Suggestions on how to test this:
Regression: verify that OAI sets can be created/used and that external vocab values can be added to datasets and their metadata appears in the exported metadata. IN either case it appears to be hard to test the improvement from having the constraints as one would have to run multiple adds of an OAISet/multiple edits to datasets with the same ext. vocab value terms to be able to trigger the problem. (Prior to the PR, parallel changes would add duplicates to the database which you could see there and, at least for ext. vocab, you'd see failures related to multi-value exceptions when trying to retrieve ext. vocab values. After the PR, you shouldn't ever have duplicate values. For OAISets you might get errors when trying to run parallel ops. For the external vocab case, you'd see a fine message in the log about an attempt to write a duplicate value but the overall edit/metadata change would succeed.)
One other thing to test - have a dataset that has uses a type with controlled vocabulary values and then change the definition of the type to not use CVV. Make sure that this does not adversely affect the functionality (display, indexing, etc) of the existing dataset. (adding this because if the change to line 287 of DatasetFieldType.java.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: included
Additional documentation: