Skip to content

Commit

Permalink
Merge pull request IQSS#10142 from IQSS/9547-validate-geospatial-meta…
Browse files Browse the repository at this point in the history
…data

Add validation for entered coordinate values in Geospatial metadata
  • Loading branch information
stevenwinship authored Nov 30, 2023
2 parents 960751f + 2d6efc5 commit 5b3a408
Show file tree
Hide file tree
Showing 8 changed files with 173 additions and 55 deletions.
9 changes: 9 additions & 0 deletions doc/release-notes/9547-validation-for-geospatial-metadata.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
Validation has been added for the Geographic Bounding Box values in the Geospatial metadata block. This will prevent improperly defined bounding boxes from being created via the edit page or metadata imports. (issue 9547). This also fixes the issue where existing datasets with invalid geoboxes were quietly failing to get reindexed.

For the "upgrade" steps section:

Update Geospatial Metadata Block

- `wget https://github.com/IQSS/dataverse/releases/download/v6.1/geospatial.tsv`
- `curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file @geospatial.tsv`

8 changes: 4 additions & 4 deletions scripts/api/data/metadatablocks/geospatial.tsv
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,10 @@
otherGeographicCoverage Other Other information on the geographic coverage of the data. text 4 #VALUE, FALSE FALSE FALSE TRUE FALSE FALSE geographicCoverage geospatial
geographicUnit Geographic Unit Lowest level of geographic aggregation covered by the Dataset, e.g., village, county, region. text 5 TRUE FALSE TRUE TRUE FALSE FALSE geospatial
geographicBoundingBox Geographic Bounding Box The fundamental geometric description for any Dataset that models geography is the geographic bounding box. It describes the minimum box, defined by west and east longitudes and north and south latitudes, which includes the largest geographic extent of the Dataset's geographic coverage. This element is used in the first pass of a coordinate-based search. Inclusion of this element in the codebook is recommended, but is required if the bound polygon box is included. none 6 FALSE FALSE TRUE FALSE FALSE FALSE geospatial
westLongitude West Longitude Westernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -180,0 <= West Bounding Longitude Value <= 180,0. text 7 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
eastLongitude East Longitude Easternmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -180,0 <= East Bounding Longitude Value <= 180,0. text 8 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
northLongitude North Latitude Northernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= North Bounding Latitude Value <= 90,0. text 9 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
southLongitude South Latitude Southernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= South Bounding Latitude Value <= 90,0. text 10 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
westLongitude Westernmost (Left) Longitude Westernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -180,0 <= West Bounding Longitude Value <= 180,0. text 7 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
eastLongitude Easternmost (Right) Longitude Easternmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -180,0 <= East Bounding Longitude Value <= 180,0. text 8 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
northLongitude Northernmost (Top) Latitude Northernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= North Bounding Latitude Value <= 90,0. text 9 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
southLongitude Southernmost (Bottom) Latitude Southernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= South Bounding Latitude Value <= 90,0. text 10 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
#controlledVocabulary DatasetField Value identifier displayOrder
country Afghanistan 0
country Albania 1
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,7 @@
import edu.harvard.iq.dataverse.DatasetFieldType.FieldType;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.*;
import java.util.logging.Logger;
import java.util.regex.Pattern;
import jakarta.validation.ConstraintValidator;
Expand All @@ -34,7 +32,6 @@ public void initialize(ValidateDatasetFieldType constraintAnnotation) {
}

public boolean isValid(DatasetFieldValue value, ConstraintValidatorContext context) {

context.disableDefaultConstraintViolation(); // we do this so we can have different messages depending on the different issue

boolean lengthOnly = false;
Expand All @@ -55,6 +52,38 @@ public boolean isValid(DatasetFieldValue value, ConstraintValidatorContext conte
return true;
}

// verify no junk in individual fields and values are within range
if (dsfType.getName() != null && (dsfType.getName().equals(DatasetFieldConstant.northLatitude) || dsfType.getName().equals(DatasetFieldConstant.southLatitude) ||
dsfType.getName().equals(DatasetFieldConstant.westLongitude) || dsfType.getName().equals(DatasetFieldConstant.eastLongitude))) {
try {
verifyBoundingBoxCoordinatesWithinRange(dsfType.getName(), value.getValue());
} catch (IllegalArgumentException iae) {
try {
context.buildConstraintViolationWithTemplate(dsfType.getDisplayName() + " " + BundleUtil.getStringFromBundle("dataset.metadata.invalidEntry")).addConstraintViolation();
} catch (NullPointerException e) {
}
return false;
}
}

// validate fields that are siblings and depend on each others values
if (value.getDatasetField().getParentDatasetFieldCompoundValue() != null &&
value.getDatasetField().getParentDatasetFieldCompoundValue().getParentDatasetField().getValidationMessage() == null) {
Optional<String> failureMessage = validateChildConstraints(value.getDatasetField());
if (failureMessage.isPresent()) {
try {
context.buildConstraintViolationWithTemplate(dsfType.getParentDatasetFieldType().getDisplayName() + " " +
BundleUtil.getStringFromBundle(failureMessage.get()) ).addConstraintViolation();

// save the failure message in the parent so we don't keep validating the children
value.getDatasetField().getParentDatasetFieldCompoundValue().getParentDatasetField().setValidationMessage(failureMessage.get());

} catch (NullPointerException npe) {
}
return false;
}
}

if (fieldType.equals(FieldType.TEXT) && !lengthOnly && value.getDatasetField().getDatasetFieldType().getValidationFormat() != null) {
boolean valid = value.getValue().matches(value.getDatasetField().getDatasetFieldType().getValidationFormat());
if (!valid) {
Expand Down Expand Up @@ -216,4 +245,60 @@ public boolean isValidAuthorIdentifier(String userInput, Pattern pattern) {
return pattern.matcher(userInput).matches();
}

// Validate child fields against each other and return failure message or Optional.empty() if success
public Optional<String> validateChildConstraints(DatasetField dsf) {
final String fieldName = dsf.getDatasetFieldType().getName() != null ? dsf.getDatasetFieldType().getName() : "";
Optional<String> returnFailureMessage = Optional.empty();

// Validate Child Constraint for Geospatial Bounding Box
// validate the four points of the box to insure proper layout
if (fieldName.equals(DatasetFieldConstant.northLatitude) || fieldName.equals(DatasetFieldConstant.westLongitude)
|| fieldName.equals(DatasetFieldConstant.eastLongitude) || fieldName.equals(DatasetFieldConstant.southLatitude)) {
final String failureMessage = "dataset.metadata.invalidGeospatialCoordinates";

try {
final Map<String, String> coords = new HashMap<>();
dsf.getParentDatasetFieldCompoundValue().getChildDatasetFields().forEach(f -> {
coords.put(f.getDatasetFieldType().getName(), f.getValue());
});
if (!validateBoundingBox(coords.get(DatasetFieldConstant.westLongitude),
coords.get(DatasetFieldConstant.eastLongitude),
coords.get(DatasetFieldConstant.northLatitude),
coords.get(DatasetFieldConstant.southLatitude))) {
returnFailureMessage = Optional.of(failureMessage);
}
} catch (IllegalArgumentException e) { // IllegalArgumentException NumberFormatException
returnFailureMessage = Optional.of(failureMessage);
}
}

return returnFailureMessage;
}

public static boolean validateBoundingBox(final String westLon, final String eastLon, final String northLat, final String southLat) {
boolean returnVal = false;

try {
Float west = verifyBoundingBoxCoordinatesWithinRange(DatasetFieldConstant.westLongitude, westLon);
Float east = verifyBoundingBoxCoordinatesWithinRange(DatasetFieldConstant.eastLongitude, eastLon);
Float north = verifyBoundingBoxCoordinatesWithinRange(DatasetFieldConstant.northLatitude, northLat);
Float south = verifyBoundingBoxCoordinatesWithinRange(DatasetFieldConstant.southLatitude, southLat);
returnVal = west <= east && south <= north;
} catch (IllegalArgumentException e) {
returnVal = false;
}

return returnVal;
}

private static Float verifyBoundingBoxCoordinatesWithinRange(final String name, final String value) throws IllegalArgumentException {
int max = name.equals(DatasetFieldConstant.westLongitude) || name.equals(DatasetFieldConstant.eastLongitude) ? 180 : 90;
int min = max * -1;

final Float returnVal = value != null ? Float.parseFloat(value) : Float.NaN;
if (returnVal.isNaN() || returnVal < min || returnVal > max) {
throw new IllegalArgumentException(String.format("Value (%s) not in range (%s-%s)", returnVal.isNaN() ? "missing" : returnVal, min, max));
}
return returnVal;
}
}
Original file line number Diff line number Diff line change
@@ -1,27 +1,6 @@
package edu.harvard.iq.dataverse.search;

import edu.harvard.iq.dataverse.ControlledVocabularyValue;
import edu.harvard.iq.dataverse.DataFile;
import edu.harvard.iq.dataverse.DataFileServiceBean;
import edu.harvard.iq.dataverse.DataFileTag;
import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetField;
import edu.harvard.iq.dataverse.DatasetFieldCompoundValue;
import edu.harvard.iq.dataverse.DatasetFieldConstant;
import edu.harvard.iq.dataverse.DatasetFieldServiceBean;
import edu.harvard.iq.dataverse.DatasetFieldType;
import edu.harvard.iq.dataverse.DatasetLinkingServiceBean;
import edu.harvard.iq.dataverse.DatasetServiceBean;
import edu.harvard.iq.dataverse.DatasetVersion;
import edu.harvard.iq.dataverse.Dataverse;
import edu.harvard.iq.dataverse.DataverseLinkingServiceBean;
import edu.harvard.iq.dataverse.DataverseServiceBean;
import edu.harvard.iq.dataverse.DvObject;
import edu.harvard.iq.dataverse.DvObjectServiceBean;
import edu.harvard.iq.dataverse.Embargo;
import edu.harvard.iq.dataverse.FileMetadata;
import edu.harvard.iq.dataverse.GlobalId;
import edu.harvard.iq.dataverse.PermissionServiceBean;
import edu.harvard.iq.dataverse.*;
import edu.harvard.iq.dataverse.authorization.AuthenticationServiceBean;
import edu.harvard.iq.dataverse.authorization.providers.builtin.BuiltinUserServiceBean;
import edu.harvard.iq.dataverse.batch.util.LoggingUtil;
Expand All @@ -35,6 +14,7 @@
import edu.harvard.iq.dataverse.harvest.client.HarvestingClient;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.StringUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;
Expand Down Expand Up @@ -1073,13 +1053,17 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long
if(maxNorthLat==null || Float.parseFloat(maxNorthLat) < Float.parseFloat(northLat)) {
maxNorthLat=northLat;
}
//W, E, N, S
solrInputDocument.addField(SearchFields.GEOLOCATION, "ENVELOPE(" + westLon + "," + eastLon + "," + northLat + "," + southLat + ")");

if (DatasetFieldValueValidator.validateBoundingBox(westLon, eastLon, northLat, southLat)) {
//W, E, N, S
solrInputDocument.addField(SearchFields.GEOLOCATION, "ENVELOPE(" + westLon + "," + eastLon + "," + northLat + "," + southLat + ")");
}
}
}
//Only one bbox per dataset
//W, E, N, S
if ((minWestLon != null || maxEastLon != null) && (maxNorthLat != null || minSouthLat != null)) {
if (DatasetFieldValueValidator.validateBoundingBox(minWestLon, maxEastLon, maxNorthLat, minSouthLat) &&
(minWestLon != null || maxEastLon != null) && (maxNorthLat != null || minSouthLat != null)) {
solrInputDocument.addField(SearchFields.BOUNDING_BOX, "ENVELOPE(" + minWestLon + "," + maxEastLon + "," + maxNorthLat + "," + minSouthLat + ")");
}

Expand Down
1 change: 1 addition & 0 deletions src/main/java/propertyFiles/Bundle.properties
Original file line number Diff line number Diff line change
Expand Up @@ -1626,6 +1626,7 @@ dataset.metadata.alternativePersistentId.tip=A previously used persistent identi
dataset.metadata.invalidEntry=is not a valid entry.
dataset.metadata.invalidDate=is not a valid date. "yyyy" is a supported format.
dataset.metadata.invalidNumber=is not a valid number.
dataset.metadata.invalidGeospatialCoordinates=has invalid coordinates. East must be greater than West and North must be greater than South. Missing values are NOT allowed.
dataset.metadata.invalidInteger=is not a valid integer.
dataset.metadata.invalidURL=is not a valid URL.
dataset.metadata.invalidEmail=is not a valid email address.
Expand Down
18 changes: 9 additions & 9 deletions src/main/java/propertyFiles/geospatial.properties
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,10 @@ datasetfieldtype.city.title=City
datasetfieldtype.otherGeographicCoverage.title=Other
datasetfieldtype.geographicUnit.title=Geographic Unit
datasetfieldtype.geographicBoundingBox.title=Geographic Bounding Box
datasetfieldtype.westLongitude.title=West Longitude
datasetfieldtype.eastLongitude.title=East Longitude
datasetfieldtype.northLongitude.title=North Latitude
datasetfieldtype.southLongitude.title=South Latitude
datasetfieldtype.westLongitude.title=Westernmost (Left) Longitude
datasetfieldtype.eastLongitude.title=Easternmost (Right) Longitude
datasetfieldtype.northLongitude.title=Northernmost (Top) Latitude
datasetfieldtype.southLongitude.title=Southernmost (Bottom) Latitude
datasetfieldtype.geographicCoverage.description=Information on the geographic coverage of the data. Includes the total geographic scope of the data.
datasetfieldtype.country.description=The country or nation that the Dataset is about.
datasetfieldtype.state.description=The state or province that the Dataset is about. Use GeoNames for correct spelling and avoid abbreviations.
Expand Down Expand Up @@ -89,10 +89,10 @@ controlledvocabulary.country.cook_islands=Cook Islands
controlledvocabulary.country.costa_rica=Costa Rica
controlledvocabulary.country.croatia=Croatia
controlledvocabulary.country.cuba=Cuba
controlledvocabulary.country.curacao=Curaçao
controlledvocabulary.country.curacao=Cura\u00e7ao
controlledvocabulary.country.cyprus=Cyprus
controlledvocabulary.country.czech_republic=Czech Republic
controlledvocabulary.country.cote_d'ivoire=Côte d'Ivoire
controlledvocabulary.country.cote_d'ivoire=C\u00f4te d'Ivoire
controlledvocabulary.country.denmark=Denmark
controlledvocabulary.country.djibouti=Djibouti
controlledvocabulary.country.dominica=Dominica
Expand Down Expand Up @@ -216,8 +216,8 @@ controlledvocabulary.country.qatar=Qatar
controlledvocabulary.country.romania=Romania
controlledvocabulary.country.russian_federation=Russian Federation
controlledvocabulary.country.rwanda=Rwanda
controlledvocabulary.country.reunion=Réunion
controlledvocabulary.country.saint_barthelemy=Saint Barthélemy
controlledvocabulary.country.reunion=R\u00e9union
controlledvocabulary.country.saint_barthelemy=Saint Barth\u00e9lemy
controlledvocabulary.country.saint_helena,_ascension_and_tristan_da_cunha=Saint Helena, Ascension and Tristan da Cunha
controlledvocabulary.country.saint_kitts_and_nevis=Saint Kitts and Nevis
controlledvocabulary.country.saint_lucia=Saint Lucia
Expand Down Expand Up @@ -282,4 +282,4 @@ controlledvocabulary.country.western_sahara=Western Sahara
controlledvocabulary.country.yemen=Yemen
controlledvocabulary.country.zambia=Zambia
controlledvocabulary.country.zimbabwe=Zimbabwe
controlledvocabulary.country.aland_islands=Åland Islands
controlledvocabulary.country.aland_islands=\u00c5land Islands
Original file line number Diff line number Diff line change
Expand Up @@ -213,4 +213,19 @@ public void testInvalidEmail() {
assertTrue(c.getMessage().contains("email"));
});
}
@Test
public void testBoundingBoxValidity() {
// valid tests
assertTrue(DatasetFieldValueValidator.validateBoundingBox("-180", "180", "90", "-90"));
assertTrue(DatasetFieldValueValidator.validateBoundingBox("0", "0", "0", "0"));

// invalid tests
assertTrue(!DatasetFieldValueValidator.validateBoundingBox("-180", null, "90", null));
assertTrue(!DatasetFieldValueValidator.validateBoundingBox(null, "180", null, "90"));
assertTrue(!DatasetFieldValueValidator.validateBoundingBox("-180", "180", "90", "junk"));
assertTrue(!DatasetFieldValueValidator.validateBoundingBox("45", "40", "90", "0"));
assertTrue(!DatasetFieldValueValidator.validateBoundingBox("360", "0", "90", "-90"));
assertTrue(!DatasetFieldValueValidator.validateBoundingBox("", "", "", ""));
assertTrue(!DatasetFieldValueValidator.validateBoundingBox(null, null, null, null));
}
}
Loading

0 comments on commit 5b3a408

Please sign in to comment.