
Digital Specimen PID LifeCycle

Sharif Islam edited this page Jul 30, 2024 · 5 revisions

DiSSCo Digital Specimen PID LifeCycle

In our deliverable D7.1, part of the BiCIKL project focusing on developing a pan-European PID system for Digital Specimens, we proposed a concept for managing the lifecycle of Persistent Identifiers (PIDs). This concept is in line with best practices in PID management for scholarly communication and other research outputs.

PID lifecycles, like data lifecycles, are important for accommodating different objects and workflow processes. The lifecycle state is recorded as pidStatus in the PID record.

Our current (as of July 2024) PID status implementation:

| PID Status | Description | Notes |
| --- | --- | --- |
| Draft | The PID record is not published and is not findable in the Digital Specimen repository. | |
| Active | The PID is registered and indexed in the Digital Specimen repository. | |
| Failed | The PID creation process failed and the PID record was not published. | |
| Tombstone | The digital object has been tombstoned. | |

Happy flow:

New Digital Specimen creation:

  1. Specimen record is ingested. The specimen processing service determines whether the specimen is new to DiSSCo or an update to an existing specimen.
  2. PID service is called, either to create a new PID or to update the existing record.
  • If the PID string does not exist, it is created (this is part of the PID "minting" and "registration" process). An FDO Record is created at the same time, based on the data provided to the PID service.
  • If the specimen already exists, the FDO Record is updated according to the information provided to the PID service, as long as the PID status of the record to be updated is not TOMBSTONED. The issueNumber for the FDO Record is incremented. Only some fields of the FDO Record may be changed; others, such as pidRecordIssueDate, are generated automatically by the handle API and remain the same across record changes.
  3. The status ACTIVE is assigned to all new PIDs. This status does not change on an update.
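
The create-or-update decision above can be sketched roughly as follows. This is a minimal in-memory illustration, not the DiSSCo handle API: `InMemoryPidService`, `FdoRecord`, the `TEST` prefix, the PID string format, and the starting issue number of 1 are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class PidStatus(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    FAILED = "FAILED"
    TOMBSTONED = "TOMBSTONED"


@dataclass
class FdoRecord:
    pid: str
    attributes: Dict[str, str]
    pid_status: PidStatus = PidStatus.ACTIVE  # all new PIDs start as ACTIVE
    issue_number: int = 1                     # assumed starting value


class InMemoryPidService:
    """Hypothetical stand-in for the PID service."""

    def __init__(self, prefix: str = "TEST") -> None:
        self.prefix = prefix
        self.records: Dict[str, FdoRecord] = {}     # PID -> FDO record
        self.pid_by_specimen: Dict[str, str] = {}   # local specimen key -> PID

    def upsert(self, specimen_key: str, attributes: Dict[str, str]) -> FdoRecord:
        pid = self.pid_by_specimen.get(specimen_key)
        if pid is None:
            # New specimen: mint a PID string and create the FDO record with it.
            pid = f"{self.prefix}/{uuid.uuid4().hex[:12].upper()}"
            record = FdoRecord(pid=pid, attributes=dict(attributes))
            self.records[pid] = record
            self.pid_by_specimen[specimen_key] = pid
            return record

        record = self.records[pid]
        if record.pid_status is PidStatus.TOMBSTONED:
            # Tombstoned records are never updated.
            raise ValueError(f"{pid} is tombstoned and cannot be updated")
        # Existing specimen: update attributes and bump the issue number.
        # The status is left untouched on update.
        record.attributes.update(attributes)
        record.issue_number += 1
        return record
```

Calling `upsert` twice with the same specimen key returns the same PID with the issue number incremented, while the status stays ACTIVE.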

Rollback and failure

We record the failed status, but no PID record is published with that status, so as not to clutter the system.

The rollback flow for new specimens is as follows:

  1. A new specimen record is ingested. A PID is assigned to it.
  2. If ingestion fails downstream, the handle manager removes the PID record from the PID database.
  3. The ingestion service removes the specimen from ElasticSearch and the Specimen database, if records exist.

The handle API deletes the records it created before the records are published, so no failed PIDs are released at all.

For updated specimens, the rollback flow is as follows:

  1. The specimen processing service sends an update request to the handle API, which updates the FDO Record.
  2. If the update fails downstream, the processing service recreates the FDO record based on the original version of the specimen and sends a rollback request to the handle API.
  3. The handle API updates the record again based on the values sent by the processing service, and the version number is decremented to its original value.
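
The two rollback flows can be illustrated with a small sketch. It assumes a hypothetical in-memory `PidStore` and a `downstream` callable standing in for the rest of the ingestion pipeline; none of these names come from the DiSSCo services, and removal from ElasticSearch and the Specimen database is left out.

```python
from typing import Callable, Dict


class PidStore:
    """Hypothetical in-memory stand-in for the PID database."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def create(self, pid: str, attributes: Dict[str, str]) -> None:
        self.records[pid] = {"attributes": dict(attributes), "issue_number": 1}

    def update(self, pid: str, attributes: Dict[str, str]) -> None:
        record = self.records[pid]
        record["attributes"] = dict(attributes)
        record["issue_number"] += 1

    def rollback_update(self, pid: str, original: Dict[str, str]) -> None:
        # Restore the original values and undo the version increment.
        record = self.records[pid]
        record["attributes"] = dict(original)
        record["issue_number"] -= 1

    def delete(self, pid: str) -> None:
        self.records.pop(pid, None)


def ingest_new(store: PidStore, pid: str, attributes: Dict[str, str],
               downstream: Callable[[], None]) -> None:
    store.create(pid, attributes)
    try:
        downstream()
    except Exception:
        # New specimen: remove the PID record so no failed PID is released.
        store.delete(pid)
        raise


def ingest_update(store: PidStore, pid: str, attributes: Dict[str, str],
                  downstream: Callable[[], None]) -> None:
    original = dict(store.records[pid]["attributes"])
    store.update(pid, attributes)
    try:
        downstream()
    except Exception:
        # Updated specimen: rebuild the record from the original version.
        store.rollback_update(pid, original)
        raise
```

In both functions the exception is re-raised after the rollback, so the caller still sees the downstream failure.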

FDO Record

To work with the PID at any lifecycle stage, we create FDO Records.

An FDO Record is metadata inserted into the PID system (the Handle System in our case) that represents a specific digital object using a particular FDO Profile. An FDO Profile can be applied to multiple FDOs, whereas an FDO Record provides the key-value pairs of attributes and their values for one particular FDO.

We have different FDO Profile schemas available in the DiSSCo [Schema repository](https://schemas.dissco.tech/schemas/fdo-profile/).

The pidStatus element indicates the lifecycle status.

Minimum FDO Record

  • fdoProfile
  • digitalObjectType
  • digitalObjectName
  • pid
  • pidIssuer
  • pidIssuerName
  • issuedForAgent
  • issuedForAgentName
  • pidRecordIssueDate
  • pidRecordIssueNumber
  • structuralType
  • pidStatus
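
A simple presence check against this list might look like the sketch below. The attribute names are taken from the list above; the helper `missing_attributes` is hypothetical, and the check only looks at which keys are present. It does not validate values against the FDO Profile schemas.

```python
from typing import List, Mapping

# Attribute names of the minimum FDO Record, as listed above.
MINIMUM_FDO_ATTRIBUTES = (
    "fdoProfile",
    "digitalObjectType",
    "digitalObjectName",
    "pid",
    "pidIssuer",
    "pidIssuerName",
    "issuedForAgent",
    "issuedForAgentName",
    "pidRecordIssueDate",
    "pidRecordIssueNumber",
    "structuralType",
    "pidStatus",
)


def missing_attributes(record: Mapping[str, object]) -> List[str]:
    """Return the minimum attributes that are absent from the record."""
    return [name for name in MINIMUM_FDO_ATTRIBUTES if name not in record]
```

An empty list means every minimum attribute is present in the record.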

Active PIDs cannot be deleted. They are tombstoned instead.

Tombstone

We created a separate digital object [schema](https://schemas.dissco.tech/schemas/fdo-type/shared-model/latest/tombstone-metadata.json) for tombstoned objects. The metadata here will contain information on why, when and by whom a digital object was tombstoned.

We also have a [schema](https://schemas.dissco.tech/schemas/fdo-profile/tombstone/latest/tombstone-request.json) for requests sent to the DiSSCo PID API to tombstone resources.
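
As a rough illustration of tombstoning instead of deleting, the sketch below records why, when, and by whom an object was tombstoned. The field names (`reason`, `tombstoned_by`, `tombstoned_at`) are hypothetical and are not taken from the tombstone schemas linked above. The rule that only ACTIVE records can be tombstoned is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TombstoneMetadata:
    # Hypothetical field names: why, by whom, and when.
    reason: str
    tombstoned_by: str
    tombstoned_at: datetime


@dataclass
class PidRecord:
    pid: str
    pid_status: str = "ACTIVE"
    tombstone: Optional[TombstoneMetadata] = None


def tombstone(record: PidRecord, reason: str, agent: str,
              now: Optional[datetime] = None) -> PidRecord:
    """Mark an active record as tombstoned instead of deleting it."""
    if record.pid_status != "ACTIVE":
        # Assumption: only ACTIVE records can be tombstoned.
        raise ValueError(f"{record.pid} has status {record.pid_status}")
    record.pid_status = "TOMBSTONED"
    record.tombstone = TombstoneMetadata(
        reason=reason,
        tombstoned_by=agent,
        tombstoned_at=now or datetime.now(timezone.utc),
    )
    return record
```

The record itself is kept, so the PID can still resolve to information about what happened to the object.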

Other statuses

A PID string can be generated ahead of time to reserve it, but this has not been implemented yet.

Based on specific use cases we can also have: Retired, Obsolete, Merged, Split

Data Flow diagram:

```mermaid
---
title: DiSSCo Digital Specimen PID Lifecycle
---
flowchart
subgraph "Happy Flow"
direction LR
style A fill:#f9f,stroke:#333,stroke-width:4px
style B fill:#0ff,stroke:#132,stroke-width:4px
    A[/Specimen Record/] -->|Data Ingestion| B(De-dup Check)
    B --> C{PID exists?}
    C --> |No| D
    C --> |Yes| I{Check request type, if UPSERT}
    I --> |Yes| H{Update PID record}
    I --> |No| K{Reject request}
    H --> |set status| J{Active/Draft}
    D{Create PID String, Assign FDO Profile}
    D --> E{Assign Status}
    E --> Reserved
    E --> F[Active]
    E --> Draft
    F --> G{Data Curation}
    G --> Tombstoned
end
subgraph "Rollback/Failure"
A1[/Specimen Record/] -->|Data Ingestion| B1(De-dup Check)
B1 --> |External Failures/Errors| C2{Check Error Status}
C2 --> D2{Delete PID String}
end
```