
Data Models

southeo edited this page Nov 20, 2024 · 9 revisions

Open Data Specimen (openDS)

Terms Page | GitHub | JSON Schemas

latest version: 0.4

DiSSCo provides all data according to the openDS specification. The Digital Specimen is the core data model, defining the abstraction that encapsulates "everything about" a physical specimen. openDS provides the concepts and "logic" to express a Digital Specimen, as well as media, organizations, and other object types.

The openDS specification relates to other important structures, standards and initiatives in the wider world, as well as to information science in different domains of scientific discourse. Positioning openDS in the landscape is one of the first and most important steps in development of the specification. Making sure that everyone likely to make use of the model agrees on this is essential to progress.

FAIR Digital Object Records (FDO Record)

GitHub | JSON Schemas

latest version: 1.0

DiSSCo uses Persistent Identifiers (PIDs) for its digital objects, including Digital Specimens. A PID consists of at least two pieces of information: the name (i.e. the identifier string) and the location of the object.

In addition to the location of the referenced object, FDO Records contain structured metadata that describes the attributes and characteristics of the resource associated with the PID. The FDO Record is similar to the PID Record concept proposed by the Research Data Alliance (RDA), but the term FDO Record is used to “highlight that there could be possible [implementations] of FDO without explicitly relying on the attributes stored in a PID record”.
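As a minimal sketch of how an object's location and its extra metadata sit side by side in a Handle record, the Python below builds a lookup URL for the public Handle.net proxy REST API (https://hdl.handle.net/api/handles/) and flattens the JSON response into type/value pairs. The handle "12345/abc-123", every record type other than URL, and all sample values are hypothetical. DiSSCo's actual FDO Record attribute names are defined in its JSON Schemas.

```python
import json
from urllib.parse import quote

# Public REST endpoint of the Handle.net proxy; the handle is appended to the path.
HANDLE_API = "https://hdl.handle.net/api/handles/"


def handle_api_url(handle: str) -> str:
    """Build the proxy REST URL for a handle of the form 'prefix/suffix'."""
    # Keep '/' unescaped so the prefix/suffix separator stays part of the path.
    return HANDLE_API + quote(handle, safe="/")


def parse_handle_record(response_text: str) -> dict:
    """Flatten a Handle REST API response into {type: value} pairs.

    The 'URL' entry is the object's location; any other entries are the
    structured metadata stored alongside it.
    """
    body = json.loads(response_text)
    # responseCode 1 signals success in the Handle REST API.
    if body.get("responseCode") != 1:
        raise ValueError(f"handle lookup failed with responseCode {body.get('responseCode')}")
    return {entry["type"]: entry["data"]["value"] for entry in body.get("values", [])}


if __name__ == "__main__":
    # Hypothetical response; a live client would fetch handle_api_url(...) instead.
    sample = json.dumps({
        "responseCode": 1,
        "handle": "12345/abc-123",
        "values": [
            {"index": 1, "type": "URL",
             "data": {"format": "string", "value": "https://example.org/specimen/abc-123"}},
            {"index": 2, "type": "digitalObjectType",
             "data": {"format": "string", "value": "DigitalSpecimen"}},
        ],
    })
    print(handle_api_url("12345/abc-123"))
    print(parse_handle_record(sample))
```

If several entries share a type, this dictionary keeps only the last one. A fuller client would also keep each entry's index.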

Different FDO Types will have different FDO Records. The FDO Record data model describes which attributes are required for each Type of FDO Record. Development of the FDO Record data model is ongoing.

Types of PIDs in DiSSCo

DiSSCo will use two kinds of PIDs: Handles and DOIs (Digital Object Identifiers). Both use the Handle System as a resolver, but DOIs adhere to an additional regulatory framework, overseen by the DOI Foundation, that ensures provenance. For simple objects, such as annotations, a Handle will be sufficient. For more complex objects, like media objects or Digital Extended Specimens, a DOI will be used.
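The Handle/DOI split can be sketched as a simple lookup. This sketch assumes, as the attribute levels below suggest, that media objects and Digital Extended Specimens receive DOIs and everything else receives a Handle. The object-type labels are hypothetical. The is_doi helper relies on the fact that every DOI is a handle whose prefix begins with "10.", which is why both kinds resolve through the Handle System.

```python
from enum import Enum


class PidKind(Enum):
    HANDLE = "Handle"
    DOI = "DOI"


# Hypothetical type labels for the complex objects that receive DOIs.
DOI_TYPES = frozenset({"MediaObject", "DigitalExtendedSpecimen"})


def pid_kind(object_type: str) -> PidKind:
    """Complex objects get a DOI; simple ones such as annotations get a Handle."""
    return PidKind.DOI if object_type in DOI_TYPES else PidKind.HANDLE


def is_doi(pid: str) -> bool:
    """Every DOI is a handle whose prefix starts with '10.'."""
    prefix = pid.split("/", 1)[0]
    return prefix.startswith("10.")


if __name__ == "__main__":
    print(pid_kind("Annotation"))   # PidKind.HANDLE
    print(pid_kind("MediaObject"))  # PidKind.DOI
    print(is_doi("10.1000/182"))    # True
```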

Attributes

The data model is cumulative, i.e. all records share a basic "kernel" of attributes; more complex Types will have additional, relevant attributes.

PID Kernel

All objects within DiSSCo will have these attributes. Objects with only these attributes are simple enough to require only Handles as PIDs (e.g. annotations).

  • Example attributes: issueDate, pidIssuer, digitalObjectType

DOI Model

This level includes all attributes in the PID Kernel, plus additional information that aligns with the DOI Foundation's data model.

  • Example attributes: referentDoiName, referent

Media Object

All attributes in the DOI model, plus additional attributes for photographs, videos, audio files, and other supplementary media.

  • Example attributes: mediaHash, mediaUrl

Digital Extended Specimen

All attributes in the DOI model, plus additional attributes for Digital Extended Specimens.

  • Example attributes: specimenHost, physicalIdentifier
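Because the model is cumulative, each level's required attributes can be expressed as a set union over the level beneath it. The sketch below uses only the example attributes listed above (the real schemas require more) and reports which attributes a record is missing for a given Type. The Type labels such as "DoiModel" and the sample record values are hypothetical.

```python
# Example attributes from the lists above; the real FDO Record schemas require more.
PID_KERNEL = frozenset({"issueDate", "pidIssuer", "digitalObjectType"})
DOI_MODEL = PID_KERNEL | {"referentDoiName", "referent"}
MEDIA_OBJECT = DOI_MODEL | {"mediaHash", "mediaUrl"}
DIGITAL_EXTENDED_SPECIMEN = DOI_MODEL | {"specimenHost", "physicalIdentifier"}

# Hypothetical Type labels mapped to their cumulative attribute sets.
REQUIRED = {
    "PidKernel": PID_KERNEL,
    "DoiModel": DOI_MODEL,
    "MediaObject": MEDIA_OBJECT,
    "DigitalExtendedSpecimen": DIGITAL_EXTENDED_SPECIMEN,
}


def missing_attributes(record_type: str, record: dict) -> set:
    """Return the attributes required for record_type that are absent from record."""
    return set(REQUIRED[record_type]).difference(record)


if __name__ == "__main__":
    # Hypothetical annotation record carrying only kernel attributes.
    annotation = {
        "issueDate": "2024-11-20",
        "pidIssuer": "example-issuer",
        "digitalObjectType": "Annotation",
    }
    print(missing_attributes("PidKernel", annotation))  # set()
    print(sorted(missing_attributes("MediaObject", annotation)))
    # ['mediaHash', 'mediaUrl', 'referent', 'referentDoiName']
```

Building each level from the one below means that an attribute added to the kernel automatically becomes required for every Type.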

Minimum Information about a Specimen (MIDS)

GitHub

MIDS is a standard currently being developed by TDWG. DiSSCo uses MIDS to assess "completeness" of a Digital Specimen.

From TDWG:

The term ‘digitization’ is understood diversely in the natural history collections community. It can mean, for example: creating database records (of various extents); making images of collections containers, specimens and/or their label(s); a level of data capture; and more recently, semantic enrichment of data, and notions of ‘born digital’/’digital by default’. From one digitization initiative to another, the outputs can vary widely because aims, practices and procedures vary across different collection types and institutions. Thus, when a curator, collections manager or scientist talks of something being digitized it is not apparent in an objective way what is meant. Nor is it apparent what ‘sufficient digitization’ means and when (if at all) digitization is complete.

A harmonizing framework captured as a TDWG standard can help clarify levels (depth) of digitization and the minimum information captured and published at each level. This would help to ensure that enough data are captured, curated and published against specific requirements so they are useful for the widest range of possible purposes; as well as making it easier to consistently measure the extent of digitization achieved over time and to set priorities for remaining work. Such a framework would also be beneficial for ‘born digital’ specimens where digital data is captured from the outset, beginning with the gathering event.
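One way to picture a level-based completeness check is sketched below. It assumes numbered levels in which a specimen reaches a level only if it also satisfies every lower level. The element names in MIDS_LEVELS are hypothetical placeholders, not the element list of the TDWG standard. Treating empty values as missing is likewise an assumption.

```python
# Hypothetical placeholder elements per level (list index = level number); the
# actual MIDS elements and level definitions come from the TDWG standard.
MIDS_LEVELS = [
    frozenset({"physicalSpecimenId", "institutionId"}),
    frozenset({"scientificName", "objectType"}),
    frozenset({"collectingDate", "locality", "collector"}),
    frozenset({"decimalLatitude", "decimalLongitude"}),
]


def mids_level(specimen: dict) -> int:
    """Return the highest level met, counting a level only if all lower levels are met.

    Returns -1 when not even level 0 is satisfied. None and "" count as missing.
    """
    present = {key for key, value in specimen.items() if value not in (None, "")}
    level = -1
    for number, required in enumerate(MIDS_LEVELS):
        if not required <= present:
            break  # a gap at this level caps the result, whatever higher levels hold
        level = number
    return level


if __name__ == "__main__":
    specimen = {
        "physicalSpecimenId": "X-0001",
        "institutionId": "example-institution",
        "scientificName": "Quercus robur",
        "objectType": "herbarium sheet",
        "decimalLatitude": 52.1,  # level 3 data present, but level 2 is incomplete
        "decimalLongitude": 4.3,
    }
    print(mids_level(specimen))  # 1
```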
