
Harmonisation assumptions

Sam Leeflang edited this page Nov 20, 2024 · 3 revisions

A short overview of the assumptions made by the Translator.

  • Agents. The Translator treats | and & as separators between agents and creates a separate agent object for each one. When both a name and an ID are provided (recordedBy and recordedByID), they are combined into one agent. When multiple agents are present, order is assumed: the first name is coupled to the first ID, the second name to the second ID, and so on. If the number of names and the number of IDs differ, the Translator uses the field with the most items, ignores the other field and logs a warning.

  • IDs metadata. For the full list of how we map identifiers and add metadata to them, see: https://github.com/DiSSCo/dissco-core-translator/blob/main/src/main/java/eu/dissco/core/translator/terms/utils/IdentifierUtils.java

  • Meter fields in location. The meter fields in the location are mapped to a harmonised field of type double, but we ingest them as Strings. We therefore try to sanitize the String to increase the chance that it fits the harmonised field. For this we use the following regex: ((-\s?)?\d+([\\.,]\d+)?)\s*m\.?(eter)?(tr?.?)?(\sm.?)? It has been checked against this list of values:

9 m
8 m.
7.23 m
7,23 m.
123 m m.
123 m m 
10m
10mm (will not match, as this is millimeters)
-135 m
- 135 m
81 meter
25 mtr
81meter
25 mtr.
25 mt.
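The agent rules from the first bullet can be illustrated with a small Java sketch. It is not the Translator's code: the class AgentSplitter, the Agent record and the combine method are hypothetical names, trimming whitespace around each part is an assumption, and the warning is only logged when both fields are filled in (also an assumption). java.util.logging stands in for whatever logger the Translator actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch of the agent harmonisation rules, not the Translator's implementation.
public class AgentSplitter {

  private static final Logger LOGGER = Logger.getLogger(AgentSplitter.class.getName());

  // Hypothetical agent holder; name or id may be null.
  public record Agent(String name, String id) {}

  // Split a raw value on '|' or '&', trimming each part and dropping empty parts (assumption).
  static List<String> split(String raw) {
    if (raw == null || raw.isBlank()) {
      return List.of();
    }
    return Arrays.stream(raw.split("[|&]"))
        .map(String::trim)
        .filter(part -> !part.isEmpty())
        .toList();
  }

  // Combine recordedBy and recordedByID into agents, pairing them by position.
  public static List<Agent> combine(String recordedBy, String recordedByID) {
    List<String> names = split(recordedBy);
    List<String> ids = split(recordedByID);
    List<Agent> agents = new ArrayList<>();
    if (names.size() == ids.size()) {
      // Same number of items: the first name belongs to the first ID, and so on.
      for (int i = 0; i < names.size(); i++) {
        agents.add(new Agent(names.get(i), ids.get(i)));
      }
      return agents;
    }
    // Assumption: only warn when both fields were provided but their counts differ.
    if (!names.isEmpty() && !ids.isEmpty()) {
      LOGGER.warning("recordedBy and recordedByID have a different number of items; "
          + "keeping the field with the most items");
    }
    // Keep the field with the most items and ignore the other one.
    if (names.size() > ids.size()) {
      names.forEach(name -> agents.add(new Agent(name, null)));
    } else {
      ids.forEach(id -> agents.add(new Agent(null, id)));
    }
    return agents;
  }

  public static void main(String[] args) {
    System.out.println(combine("Doe, J. | Smith, A.", "id-1 & id-2"));
    System.out.println(combine("Doe, J. | Smith, A.", "id-1"));
  }
}
```

In the second call there are two names but only one ID, so under these assumptions two name-only agents are created and the single ID is dropped.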
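The meter regex can be tried out with a short Java sketch. Several parts are assumptions rather than the Translator's documented behaviour: the regex is written as a Java string literal in which "[\\.,]" is read as the character class [.,]; the whole string must match (Matcher.matches()), which is inferred from the note that "10mm" does not match, since find() would accept its "10m" prefix; and the number is taken from the first capture group, with a decimal comma turned into a dot and the space after a minus sign removed. MeterSanitizer and parseMeters are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of sanitizing a meter String into a double, not the Translator's code.
public class MeterSanitizer {

  // The regex from this page as a Java string literal.
  // Assumption: "[\\.,]" is meant as the character class [.,] (a dot or a comma).
  private static final Pattern METER_PATTERN =
      Pattern.compile("((-\\s?)?\\d+([\\.,]\\d+)?)\\s*m\\.?(eter)?(tr?.?)?(\\sm.?)?");

  // Returns the value in meters, or empty when the whole String is not a meter value.
  public static Optional<Double> parseMeters(String raw) {
    if (raw == null) {
      return Optional.empty();
    }
    Matcher matcher = METER_PATTERN.matcher(raw);
    // matches() requires the entire String to match, which rejects "10mm".
    if (!matcher.matches()) {
      return Optional.empty();
    }
    // Group 1 holds the signed number: normalise a decimal comma and remove
    // the optional space between the minus sign and the digits.
    String number = matcher.group(1).replace(',', '.').replaceAll("\\s", "");
    return Optional.of(Double.parseDouble(number));
  }

  public static void main(String[] args) {
    String[] samples = {"9 m", "7,23 m.", "123 m m ", "10mm", "- 135 m", "25 mtr."};
    for (String sample : samples) {
      System.out.println("\"" + sample + "\" -> " + parseMeters(sample));
    }
  }
}
```

Using matches() rather than find() is the design choice that makes the millimeter case fail; with find(), any String that merely starts with a number followed by "m" would be accepted.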
