The dataverse-mapper API can be used to map any JSON metadata to JSON metadata formatted for Dataverse. The metadata file will be formatted according to what's expected by the Dataverse Native API. If you have XML metadata you can use the DANS transformer service to map it to JSON.
This project uses:
- Python 3.9
- FastAPI
- Poetry
The default port in the example .env is 8080; change it to fit your needs.
cp dot_env_example .env
- `make build`
Returns the current version of the API
- metadata - EASY metadata example - The input metadata describing a dataset in JSON.
- template - EASY template example - A template with the value you expect to map from the input metadata.
- mapping - EASY mapping example - A dictionary with key value pairs. The key is the typeName of the field in the template. The value is the path to the value in the input metadata.
On success, the API call returns a JSON body formatted for ingestion into Dataverse. On failure, the call returns an exception that describes what went wrong.
The mapping file connects the data in the source metadata to the target template. It contains a dictionary of key/value pairs. Each key is the typeName of the field in the target template that should receive the data. Each value is a list of paths used to find that data in the source metadata:
{
"distributionDate": ["result.Dataontwerpversies.Versie.Dataontwerp.GeldigVanaf"],
"kindOfData": ["result.Dataontwerpversies.Versie.Dataontwerp.SoortData"],
"frequencyOfDataCollection": ["result.Dataontwerpversies.Versie.Dataontwerp.TypeVerslagperiode"],
"samplingProcedure": ["result.Dataontwerpversies.Versie.Dataontwerp.GebruikteMethodologie"],
"socialScienceNotesSubject": ["result.Dataontwerpversies.Versie.Dataontwerp.Procesverloop"]
}
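The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the mapper's actual implementation: the function names `resolve_path` and `apply_simple_mapping` are hypothetical, and the sketch assumes that paths are plain dot-separated keys and that lists met along the way are flattened.

```python
from typing import Any


def resolve_path(source: Any, path: str) -> list:
    """Collect every value reachable by following a dot-separated path.

    Hypothetical helper: the real mapper may resolve paths differently.
    """
    current = [source]
    for key in path.split("."):
        next_level = []
        for node in current:
            # Treat a list as a set of candidates so paths can pass through it.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    # Flatten a final list value so callers always get a flat list.
    values = []
    for node in current:
        values.extend(node if isinstance(node, list) else [node])
    return values


def apply_simple_mapping(source: dict, mapping: dict) -> dict:
    """Return {typeName: [values, ...]} for every key in a flat mapping."""
    result = {}
    for type_name, paths in mapping.items():
        found = []
        for path in paths:
            found.extend(resolve_path(source, path))
        result[type_name] = found
    return result
```

A path that does not exist in the source yields an empty list in this sketch, so the corresponding template field would simply stay unfilled.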
Dataverse JSON contains fields with the typeClass compound. This means that the value of that field will contain a set of other fields. The mapper has multiple ways of handling these compounds. An example of a compound is the author field in the citation block:
{
"typeName": "author",
"multiple": true,
"typeClass": "compound",
"value": [
{
"authorName": {
"typeName": "authorName",
"multiple": false,
"typeClass": "primitive",
"value": "LastAuthor1, FirstAuthor1"
},
"authorAffiliation": {
"typeName": "authorAffiliation",
"multiple": false,
"typeClass": "primitive",
"value": "AuthorAffiliation1"
},
"authorIdentifierScheme": {
"typeName": "authorIdentifierScheme",
"multiple": false,
"typeClass": "controlledVocabulary",
"value": "ORCID"
},
"authorIdentifier": {
"typeName": "authorIdentifier",
"multiple": false,
"typeClass": "primitive",
"value": "AuthorIdentifier1"
}
}
]
}
The simple way the mapper maps a compound is to grab all values for its children from the source metadata. These are put into lists, and the mapper then spreads them out over multiple objects in the value of the compound field. If it retrieved values for several different children, it combines them based on index.
As an example: if we find 10 authorName and 5 authorIdentifier values in the source metadata, 10 objects are created in the list at `"value": []`. The names and author identifiers are placed together based on list index, so the final 5 objects will not include an authorIdentifier. This way of mapping was made to handle source metadata where the values meant for the children of the compound are in completely different parts of the metadata, without a solid way of knowing which child value should be combined with which other child values.
The way to add this mapping to the mapper dictionary is to add the typeNames of the children with their respective paths:
{
"authorAffiliation": [
"result.record.metadata.ddi:codeBook.ddi:stdyDscr.ddi:citation.ddi:rspStmt.ddi:AuthEnty.@affiliation"
],
"authorName": [
"result.record.metadata.ddi:codeBook.ddi:stdyDscr.ddi:citation.ddi:rspStmt.ddi:AuthEnty.#text"
]
}
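The index-based combination can be illustrated with a short Python sketch. It is not the mapper's own code: `combine_by_index` is a hypothetical name, and for brevity each child is stored as a plain value rather than wrapped in the full Dataverse field object (`typeName`, `multiple`, `typeClass`, `value`).

```python
from itertools import zip_longest


def combine_by_index(child_values: dict) -> list:
    """Pair up child values that share a list index.

    child_values maps a child typeName to the list of values found for it.
    One entry is created per index of the longest list; children whose list
    is shorter are left out of the later entries.
    """
    names = list(child_values)
    entries = []
    for row in zip_longest(*(child_values[name] for name in names)):
        # zip_longest pads exhausted lists with None; skip those children.
        entry = {name: value for name, value in zip(names, row) if value is not None}
        entries.append(entry)
    return entries
```

With three names and one identifier, this produces three entries, of which only the first carries an identifier.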
A different mapping is used for the child fields in a compound that can be found in the same object in the source metadata. Here the mapping file first requires the compound typeName and the path to the object in the source metadata:
{
"variable": {
"mapping": "result.Dataontwerpversies.Versie.Dataontwerp.Contextvariabelen.Contextvariabele[*]"
}
}
This allows the mapper to retrieve, in their entirety, the objects that can be mapped to the compound. The child fields and their mappings are put inside this `variable` object under the `children` key. These paths are then used to retrieve the child values from each source object.
The complete package looks like this:
{
"variable": {
"mapping": "result.Dataontwerpversies.Versie.Dataontwerp.Contextvariabelen.Contextvariabele[*]",
"children": {
"variableName": ["VerkorteSchrijfwijzeNaamVariabele"],
"variableLabel": ["LabelVanDeVariabele"],
"conceptVariableDefinition": ["Variabele.Definitie"],
"conceptVariableObjecttype": ["VariabeleObjecttypenaam"],
"conceptVariableValidFrom": ["Variabele.GeldigVanaf"],
"conceptVariableName": ["Variabele.UniekeNaam"],
"conceptVariableID": ["Variabele.Id"],
"conceptVariableVersion": ["Variabele.Versie"],
"conceptVariableVersionResponsibility": ["Variabele.Eigenaar"],
"variableProcessingInstruction": ["ToelichtingBijHetGebruik"],
"variableDataType": ["Datatype"],
"variableDefinition": ["ToelichtingBijDeDefinitie"],
"conceptVariableGroeppad": ["Variabele.Variabelengroeppad"],
"conceptVariableWaardestelselnaam": ["Variabele.Waardestelselnaam"],
"conceptVariableThema": ["Variabele.Themas.Thema"],
"variableVolgnummer": ["Volgnummer"],
"variableTrefwoord": ["Trefwoorden.Trefwoord"]
}
}
}
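To make the two-step lookup concrete, here is a rough Python sketch of how such a compound mapping could be evaluated. The names `get_path` and `map_compound` are hypothetical, and the path handling is an assumption: the sketch accepts the `[*]` suffix by stripping it and always spreads lists, which may differ from the path syntax the mapper really supports.

```python
from typing import Any


def get_path(node: Any, path: str) -> list:
    """Follow a dot-separated path and return all values found.

    A trailing "[*]" on a key is accepted and stripped; list values are
    always spread so that each element is visited on its own.
    """
    current = [node]
    for raw_key in path.split("."):
        key = raw_key[:-3] if raw_key.endswith("[*]") else raw_key
        next_level = []
        for item in current:
            if isinstance(item, dict) and key in item:
                value = item[key]
                next_level.extend(value if isinstance(value, list) else [value])
        current = next_level
    return current


def map_compound(source: dict, compound: dict) -> list:
    """Return one dict of child values per object found at compound['mapping']."""
    entries = []
    for obj in get_path(source, compound["mapping"]):
        entry = {}
        for child_name, paths in compound["children"].items():
            # Child paths are resolved relative to the source object.
            values = [v for path in paths for v in get_path(obj, path)]
            if values:
                entry[child_name] = values[0] if len(values) == 1 else values
        entries.append(entry)
    return entries
```

Because the child paths are resolved inside each source object, values that belong together stay together, which the index-based approach cannot guarantee.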