Mongo DB Storage #89

Merged
merged 16 commits into from
Jul 22, 2024

@southeo southeo commented Jul 15, 2024

Data Storage

We've changed from a Postgres relational database to a MongoDB document store! 🎉

Each PID record is a single document, identified by its handle. Each field in the PID record carries some metadata, which is now represented as a JSON object:

  • index
  • type
  • data
    • for regular fields: format (always "string"), and value
    • for HS_ADMIN: format="admin", index=200, value=0.NA/{prefix}, permissions = 011111110011
  • ttl
  • timestamp

An example document is found below:

{
  "_id": "20.5000.1025/ABC",
  "normalisedSpecimenObjectId": "06211:BIRD-COLLECTION-1:RMS-123",
  "values": [
    {
      "index": 1,
      "type": "digitalObjectType",
      "data": {
        "format": "string",
        "value": "https://hdl.handle.net/21.T11148/532ce6796e2828dd2be6"
      },
      "ttl": 86400,
      "timestamp": "2024-06-28 04:32:04Z"
    },
    {
      "index": 203,
      "type": "normalisedPrimarySpecimenObjectId",
      "data": {
        "format": "string",
        "value": "06211:BIRD-COLLECTION-1:RMS-123"
      },
      "ttl": 86400,
      "timestamp": "2024-06-28 04:32:04Z"
    }
  ]
}
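The layout above can be mirrored in plain Java roughly as follows. This is a minimal sketch, not the classes used in this PR: the names `PidDocumentSketch`, `FieldData`, `FieldEntry`, `PidDocument` and `stringField` are hypothetical, and the default TTL of 86400 is simply copied from the example document. It covers regular fields only, not the HS_ADMIN entry.

```java
import java.util.List;

// Hypothetical names; mirrors the JSON layout of the example document.
final class PidDocumentSketch {

  // The "data" sub-document of a regular field: format and value.
  record FieldData(String format, String value) {}

  // One entry of the "values" array.
  record FieldEntry(int index, String type, FieldData data, int ttl, String timestamp) {}

  // The whole document: the handle is the _id, and the local identifier
  // is lifted to the top level so it can be indexed.
  record PidDocument(String id, String normalisedSpecimenObjectId, List<FieldEntry> values) {}

  // Regular fields always use format "string"; TTL taken from the example.
  static FieldEntry stringField(int index, String type, String value, String timestamp) {
    return new FieldEntry(index, type, new FieldData("string", value), 86400, timestamp);
  }

  // Builds a document for a specimen from its handle, local id and fields.
  static PidDocument specimenDocument(String handle, String localId, List<FieldEntry> values) {
    return new PidDocument(handle, localId, List.copyOf(values));
  }
}
```

Calling `stringField(203, "normalisedPrimarySpecimenObjectId", "06211:BIRD-COLLECTION-1:RMS-123", "2024-06-28 04:32:04Z")` would produce the second entry of the example document.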

Indexing Local Identifiers

Three kinds of objects have local identifiers we can use to check whether a handle has already been created for that object:

  • Annotations: annotationHash
  • Specimens: normalisedSpecimenId
  • Media: mediaUrl (called primaryMediaId in the handle API)

When we create a specimen, we first search on normalisedSpecimenId to see whether a handle has already been created for that object. By putting normalisedSpecimenId at the top level of the Mongo document, we can index the field and quickly find out whether the handle exists.

We don't do this check for annotations or media, but I've included the functionality in case we want to.
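The check-before-create flow for specimens can be sketched as below. This is an in-memory stand-in, not the repository code from this PR: a `HashMap` plays the role of the index on the top-level local identifier, and the names `LocalIdLookupSketch`, `findHandle` and `findOrCreate` are hypothetical. What the service does when a handle already exists is not stated here; the sketch assumes it simply returns the existing handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical in-memory model of the indexed lookup on the local identifier.
final class LocalIdLookupSketch {

  // Stand-in for the index: local identifier -> handle.
  private final Map<String, String> handleByLocalId = new HashMap<>();

  // Equivalent of querying the collection on the indexed top-level field.
  Optional<String> findHandle(String localId) {
    return Optional.ofNullable(handleByLocalId.get(localId));
  }

  // Search first; only mint a new handle when none exists yet.
  // Assumption: an existing handle is returned unchanged.
  String findOrCreate(String localId, Supplier<String> mintHandle) {
    Optional<String> existing = findHandle(localId);
    if (existing.isPresent()) {
      return existing.get();
    }
    String handle = mintHandle.get();
    handleByLocalId.put(localId, handle);
    return handle;
  }
}
```

Against a real collection, the map lookup would be replaced by a query on the indexed field, which is why keeping the identifier at the top level of the document matters.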

Changes to Updates

We no longer support partial updates. Now, the request must contain all fields required to build a new record:

  • We keep a few fields from the previous version: pidIssueDate, fdoType, license, PID, PidStatus. These fields keep their original timestamps
  • We increment pidIssueNumber
  • We replace all other fields, even if they are unchanged, updating the timestamp
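The three rules above can be sketched as a merge of the previous version with the incoming request. This is an illustration under assumptions, not the service code in this PR: `UpdateMergeSketch`, `Field` and `merge` are hypothetical names, the kept field names are spelled exactly as listed above, the issue number is assumed to be stored as a string (like every regular field), and the incremented pidIssueNumber is assumed to get the new timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the full-replacement update rules.
final class UpdateMergeSketch {

  record Field(String type, String value, String timestamp) {}

  // Fields carried over unchanged from the previous version.
  static final Set<String> KEPT =
      Set.of("pidIssueDate", "fdoType", "license", "PID", "PidStatus");

  static final String ISSUE_NUMBER = "pidIssueNumber";

  static List<Field> merge(List<Field> previous, List<Field> requested, String now) {
    List<Field> result = new ArrayList<>();
    for (Field old : previous) {
      if (KEPT.contains(old.type())) {
        // Kept fields retain their original value and timestamp.
        result.add(old);
      } else if (old.type().equals(ISSUE_NUMBER)) {
        // The issue number is incremented on every update.
        int next = Integer.parseInt(old.value()) + 1;
        result.add(new Field(ISSUE_NUMBER, String.valueOf(next), now));
      }
    }
    for (Field incoming : requested) {
      // Everything else is replaced, even if unchanged, with a new timestamp.
      if (!KEPT.contains(incoming.type()) && !incoming.type().equals(ISSUE_NUMBER)) {
        result.add(new Field(incoming.type(), incoming.value(), now));
      }
    }
    return result;
  }
}
```

The resulting list would then replace the `values` array of the document stored under the same `_id`, rather than being applied field by field.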

Changes to Tombstone

Tombstoned records are not truncated anymore. Now, they have all the fields they had before they were tombstoned, but with additional tombstone metadata.

JsonSchemaValidator

  • When we allowed partial updates, we had separate validation protocols for updates and creates because we didn't require the user to provide all the fields for an update. Now, all fields are required for both kinds of requests, so the validator got simplified
  • Pretty sure we can remove this class, honestly. We can avoid the validator hassle by just accepting a specific object instead of raw JSON. Trying to keep this PR within its original scope, though

Other

  • Cleaned up testsUtils
  • Refactored some names

southeo added 8 commits July 3, 2024 02:05
…lets-friggin-mongooo

# Conflicts:
#	src/main/java/eu/dissco/core/handlemanager/domain/fdo/FdoType.java
#	src/main/java/eu/dissco/core/handlemanager/domain/validation/JsonSchemaValidator.java
#	src/main/java/eu/dissco/core/handlemanager/service/DoiService.java
#	src/main/java/eu/dissco/core/handlemanager/service/FdoRecordService.java
#	src/main/java/eu/dissco/core/handlemanager/service/HandleService.java
#	src/main/java/eu/dissco/core/handlemanager/service/PidService.java
#	src/test/java/eu/dissco/core/handlemanager/controller/PidControllerTest.java
#	src/test/java/eu/dissco/core/handlemanager/domain/JsonSchemaValidatorTest.java
#	src/test/java/eu/dissco/core/handlemanager/service/DoiServiceTest.java
#	src/test/java/eu/dissco/core/handlemanager/service/FdoRecordServiceTest.java
#	src/test/java/eu/dissco/core/handlemanager/service/HandleServiceTest.java
#	src/test/java/eu/dissco/core/handlemanager/testUtils/TestUtils.java
@southeo southeo changed the title Feature/lets friggin mongooo Mongo DB Storage Jul 17, 2024
@southeo southeo marked this pull request as ready for review July 17, 2024 12:42
@southeo southeo requested a review from samleeflang July 18, 2024 07:28

@samleeflang samleeflang left a comment

How is the performance? Do you feel it is equal to the db?
Really like how easy the MongoDB repo looks

southeo commented Jul 19, 2024

Haven't tested at large scale, but small tests are performing well. While we can't use the psql batch insert function anymore, I don't think handle creation will be a bottleneck. We now insert only one document per specimen instead of up to 30 rows. Updates are a lot better too: before, we ran something like update (ATTRIBUTE, where HANDLE = xyz and INDEX = 111) for EACH field we wanted to update. Now we just replace the document on the _id. And having the local ids at the top level makes lookups really quick.
🥳

@southeo southeo requested a review from samleeflang July 19, 2024 09:57

@samleeflang samleeflang left a comment

Perfect

@southeo southeo merged commit a8376f3 into main Jul 22, 2024
2 checks passed
@southeo southeo deleted the feature/lets-friggin-mongooo branch July 22, 2024 14:20