Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External Controlled Vocab: add missing db constraints, add cvoc transaction and try/catch #9984

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions doc/release-notes/9983-unique-constraints.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
This release adds two missing database constraints that will assure that the externalvocabularyvalue table only has one entry for each uri and that the oaiset table only has one set for each spec. (In the very unlikely case that your existing database has duplicate entries now, install would fail. This can be checked by running

SELECT uri, count(*) FROM externalvocabularyvaluet group by uri;

and

SELECT spec, count(*) FROM oaiset group by spec;

and then removing any duplicate rows (where count>1).




TODO: Whoever puts the release notes together should make sure there is the standard note about reloading metadata blocks for the citation, astrophysics, and biomedical blocks (plus any others from other PRs) after upgrading.
6 changes: 3 additions & 3 deletions scripts/api/data/metadatablocks/astrophysics.tsv
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@
astrophysics Astronomy and Astrophysics Metadata
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
astroType Type The nature or genre of the content of the files in the dataset. text 0 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these relate to the todo? What is the reason for changing these values? (I probably just missed it, now I'm just curious)

Copy link
Member Author

@qqmyers qqmyers Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the reason the blocks would need to be reloaded. I think these are inconsistencies - the block says they are controlled vocab fields but there are no vocab values for them in the block. (Which is noticeable once you start looking at the definition rather than for the existence of controlled vocab values to decide what input type to display.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I just wonder how why they were ever true in the first place?

astroInstrument Instrument The instrument used to collect the data. text 2 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
astroFacility Facility The observatory or facility where the data was obtained. text 1 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
astroInstrument Instrument The instrument used to collect the data. text 2 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
astroObject Object Astronomical Objects represented in the data (Given as SIMBAD recognizable names preferred). text 3 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
resolution.Spatial Spatial Resolution The spatial (angular) resolution that is typical of the observations, in decimal degrees. text 4 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
resolution.Spectral Spectral Resolution The spectral resolution that is typical of the observations, given as the ratio \u03bb/\u0394\u03bb. text 5 TRUE FALSE FALSE TRUE FALSE FALSE astrophysics
resolution.Temporal Time Resolution The temporal resolution that is typical of the observations, given in seconds. text 6 FALSE FALSE FALSE FALSE FALSE FALSE astrophysics
coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE TRUE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.Bandpass Bandpass Conventional bandpass name text 7 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.CentralWavelength Central Wavelength (m) The central wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 8 TRUE FALSE TRUE TRUE FALSE FALSE astrophysics
coverage.Spectral.Wavelength Wavelength Range The minimum and maximum wavelength of the spectral bandpass. Enter a floating-point number. none 9 FALSE FALSE TRUE FALSE FALSE FALSE astrophysics
coverage.Spectral.MinimumWavelength Minimum (m) The minimum wavelength of the spectral bandpass, in meters. Enter a floating-point number. float 10 TRUE FALSE FALSE TRUE FALSE FALSE coverage.Spectral.Wavelength astrophysics
Expand Down
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/biomedical.tsv
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
studyAssayOtherTechnologyType Other Technology Type If Other was selected in Technology Type, list any other technology types that were used in this Dataset. text 9 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
studyAssayPlatform Technology Platform The manufacturer and name of the technology platform used in the assay (e.g. Bruker AVANCE). text 10 TRUE TRUE TRUE TRUE FALSE FALSE biomedical
studyAssayOtherPlatform Other Technology Platform If Other was selected in Technology Platform, list any other technology platforms that were used in this Dataset. text 11 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE TRUE TRUE TRUE FALSE FALSE biomedical
studyAssayCellType Cell Type The name of the cell line from which the source or sample derives. text 12 TRUE FALSE TRUE TRUE FALSE FALSE biomedical
#controlledVocabulary DatasetField Value identifier displayOrder
studyDesignType Case Control EFO_0001427 0
studyDesignType Cross Sectional EFO_0001428 1
Expand Down
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/citation.tsv
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@
seriesName Name The name of the dataset series text 66 #VALUE TRUE FALSE FALSE TRUE FALSE FALSE series citation
seriesInformation Information Can include 1) a history of the series and 2) a summary of features that apply to the series textbox 67 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE series citation
software Software Information about the software used to generate the Dataset none 68 , FALSE FALSE TRUE FALSE FALSE FALSE citation https://www.w3.org/TR/prov-o/#wasGeneratedBy
softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE TRUE FALSE FALSE FALSE FALSE software citation
softwareName Name The name of software used to generate the Dataset text 69 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation
softwareVersion Version The version of the software used to generate the Dataset, e.g. 4.11 text 70 #NAME: #VALUE FALSE FALSE FALSE FALSE FALSE FALSE software citation
relatedMaterial Related Material Information, such as a persistent ID or citation, about the material related to the Dataset, such as appendices or sampling information available outside of the Dataset textbox 71 FALSE FALSE TRUE FALSE FALSE FALSE citation
relatedDatasets Related Dataset Information, such as a persistent ID or citation, about a related dataset, such as previous research on the Dataset's subject textbox 72 FALSE FALSE TRUE FALSE FALSE FALSE citation http://purl.org/dc/terms/relation
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@

import jakarta.ejb.EJB;
import jakarta.ejb.Stateless;
import jakarta.ejb.TransactionAttribute;
import jakarta.ejb.TransactionAttributeType;
import jakarta.inject.Named;
import jakarta.json.Json;
import jakarta.json.JsonArray;
Expand All @@ -34,6 +36,7 @@
import jakarta.persistence.NoResultException;
import jakarta.persistence.NonUniqueResultException;
import jakarta.persistence.PersistenceContext;
import jakarta.persistence.PersistenceException;
import jakarta.persistence.TypedQuery;

import org.apache.commons.codec.digest.DigestUtils;
Expand All @@ -46,7 +49,6 @@
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

import edu.harvard.iq.dataverse.settings.SettingsServiceBean;

/**
Expand Down Expand Up @@ -448,6 +450,7 @@ public JsonObject getExternalVocabularyValue(String termUri) {
* @param cvocEntry - the configuration for the DatasetFieldType associated with this term
* @param term - the term uri as a string
*/
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void registerExternalTerm(JsonObject cvocEntry, String term) {
String retrievalUri = cvocEntry.getString("retrieval-uri");
String prefix = cvocEntry.getString("prefix", null);
Expand Down Expand Up @@ -518,6 +521,8 @@ public void process(HttpResponse response, HttpContext context) throws HttpExcep
logger.fine("Wrote value for term: " + term);
} catch (JsonException je) {
logger.severe("Error retrieving: " + retrievalUri + " : " + je.getMessage());
} catch (PersistenceException e) {
logger.fine("Problem persisting: " + retrievalUri + " : " + e.getMessage());
}
} else {
logger.severe("Received response code : " + statusCode + " when retrieving " + retrievalUri
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -284,7 +284,7 @@ public void setDisplayOnCreate(boolean displayOnCreate) {
}

public boolean isControlledVocabulary() {
return controlledVocabularyValues != null && !controlledVocabularyValues.isEmpty();
return allowControlledVocabulary;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change because we think the definition is enough (and more important than actual values). Not sure, but could we ever get a case where a field changes FROM allow controlled vocab to disallowing, but we still have old datasets with CVV?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvanmansum may know more here but my understanding is that this is primarily being used to decide if you display the controlled vocab list for metadata input or not. So with the change if you set the field to not be cvv, it will now show as a text input, whereas before, if you ever had cvv values in the db, the field shows as cvv going forward. I don't think it affects things if there are specific field values in datasets that have to display, its just about the type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers that makes sense. My concern would be if there are other places that use it for either display or indexing tasks and how this would affect a dataset that had controlled vocabulary values before the type was changed (rare, but could happen. I'll just add a note in the testing section to see if that can be looked at and make sure things look fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at code outside the PR- there is an isAllowControlledVocabulary() method above that returns allowControlledVocabulary (so a duplicate now). Perhaps the xhtml or other places where we need this change could just switch to use that method.

FWIW: It does seem to be an odd case to have former CVVs - we're potentially relying on the metadatablock loader to keep old values/ people to keep the old values in the tsv when the field no longer uses them, etc. (I can hardly wait until they make the field CVV again and want new values and not have the old ones show up as choices ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree it's an edge case. It's possible any time we would switch from cvv to not, we'd write a db script to convert the values anyway?

}

/**
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
DO $$
BEGIN

BEGIN
ALTER TABLE externalvocabularyvalue ADD CONSTRAINT externalvocabularvalue_uri_key UNIQUE(uri);
EXCEPTION
WHEN duplicate_table THEN RAISE NOTICE 'Table unique constraint externalvocabularvalue_uri_key already exists';
END;

BEGIN
ALTER TABLE oaiset ADD CONSTRAINT oaiset_spec_key UNIQUE(spec);
EXCEPTION
WHEN duplicate_table THEN RAISE NOTICE 'Table unique constraint oaiset_spec_key already exists';
END;

END $$;
Loading