geoffrey jost edited this page May 19, 2015 · 4 revisions

Inventory

< geoff.froh at densho.org > 2015-01-08

[T]he purpose of the Inventory is to provide the definitive, ground-truth registry for what Collection repos should exist (and which should not)...

The first part of the solution is the combination of partner repos and the "master" ddr repo that you've already deployed. This gives us a beachhead for establishing ground truth. Just as the models and controlled vocabs are now encapsulated in the ddr repo, we can place the Inventory or Collection registry in the partner repos. And, of course, a corresponding registry of partners in the overall ddr repo. This allows a future archivist to bootstrap up the complete picture of what should be found in the Repository.

While the Store documents perform useful functions (i.e., tracking the physical location of the annexes), they do not provide a registry of collections that should exist. There should be a single text document in each partner repo that contains a list of all of the collection repos that should exist. This ensures a simple single-point where collections are registered, and a clear, human-readable artifact that is not dependent on any specific technology (other than text!).
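As a minimal sketch of the "should exist / should not exist" check, the Python below parses a plain-text registry and compares it with the collection IDs actually found on volumes. The proposal does not specify the registry's filename or syntax. The format here is an assumption: one collection ID per line, with blank lines and `#` comments ignored. `parse_registry` and `reconcile` are hypothetical helper names.

```python
def parse_registry(text: str) -> set[str]:
    """Parse a plain-text registry into a set of collection IDs.

    Assumed (hypothetical) format: one collection ID per line; blank
    lines and lines starting with '#' are ignored.
    """
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def reconcile(registered: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare registered collection IDs with those found on volumes."""
    return {
        # registered but not found on any volume: lost, or never cloned
        "missing": registered - observed,
        # found on a volume but never registered: should not exist
        "unregistered": observed - registered,
    }


registry = """
# ddr-testing collection registry
ddr-testing-141
ddr-testing-228
"""
result = reconcile(parse_registry(registry), {"ddr-testing-141", "ddr-testing-999"})
print(sorted(result["missing"]))       # ['ddr-testing-228']
print(sorted(result["unregistered"]))  # ['ddr-testing-999']
```

Because the registry is a simple text file, it stays readable without this code. The script only automates the comparison.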

The Inventory is a distributed database consisting of the ddr repo and the Volume files contained within the ddr-PARTNER repos.

The master inventory is created by parsing these files. In this way they are analogous to *.journal files used by ledger-cli.
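The parsing step might look like the following minimal Python sketch. It assumes the partner repos are checked out side by side under one base directory, using the `ddr-PARTNER/volumes/{uuid}.json` layout described on this page. The sketch builds a map from each collection ID to the volumes that hold an instance of it. `build_inventory` and the shape of its output are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def build_inventory(base: Path) -> dict[str, list[dict]]:
    """Map each collection ID to the instances recorded in volume files.

    Assumes partner repos are checked out under `base` as
    ddr-PARTNER/volumes/{uuid}.json. The master "ddr" repo has no
    dash-suffix, so the glob skips it.
    """
    inventory = defaultdict(list)
    for path in sorted(base.glob("ddr-*/volumes/*.json")):
        volume = json.loads(path.read_text())
        for instance in volume.get("collections", []):
            inventory[instance["cid"]].append({
                "volume": volume["uuid"],
                "type": volume.get("type"),
                "location": volume.get("location"),
                "level": instance.get("level"),
            })
    return dict(inventory)
```

As with a ledger journal, the volume files are the source of truth. The inventory is a derived view that can be thrown away and rebuilt at any time.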

The Inventory tells us:

  • Information about the Repository as a whole
  • List of partners
  • List of collections
  • List of volumes in use
  • For each volume: type, location, list of partners, collections
  • For each collection: location of instances and their type

ddr repo

  • Present on every ddr volume.
  • Metadata about the big-R Repository (keyword, title, description, logo, etc.).
  • Model fields.
  • Controlled vocabularies (topics, etc.).
  • Pointers to ddr-PARTNER repos.

ddr-PARTNER repos

```
ddr-PARTNER/
├── organization.json
└── volumes
    ├── 297e76ea-bb22-40e9-9999-0c9be5332a39.json
    ├── 297e76ea-bb22-40e9-9999-0c9be5332a39.log
    ├── 408A51BE8A51B160.json
    └── 408A51BE8A51B160.log
```

  • Present on every ddr volume used by a partner.
  • Metadata about the partner (keyword, title, description, logo, etc).
  • Volumes in use by the partner.

Volume

  • *.json and *.log files in the volumes/ directory of each ddr-PARTNER repo.
  • Filename: {uuid}.json, {uuid}.log.
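Each volume file is named after the volume's UUID, so a cheap consistency check is possible. The minimal Python sketch below confirms that the filename stem matches the file's `"uuid"` field and that the companion `.log` file exists. `check_volume_files` is a hypothetical helper, not part of any existing tool.

```python
import json
from pathlib import Path


def check_volume_files(json_path: Path) -> list[str]:
    """Return naming problems for one {uuid}.json volume file.

    Hypothetical helper: checks that the filename stem matches the
    file's "uuid" field and that the matching {uuid}.log exists.
    """
    problems = []
    volume = json.loads(json_path.read_text())
    if volume.get("uuid") != json_path.stem:
        problems.append(f"{json_path.name}: uuid field is {volume.get('uuid')!r}")
    log_path = json_path.with_suffix(".log")
    if not log_path.exists():
        problems.append(f"{json_path.name}: missing {log_path.name}")
    return problems
```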

Each *.json file contains

  • Device info (device type, filesystem, label, UUID, size, purchase/create date, etc.).
  • A list of the partner's collections on the device.
  • Information about each collection clone/instance.

Each *.log file contains

  • Modifications to the Volume (collection created, cloned, removed) # modified?
  • A timestamp and user for each modification.

Note: A volume file may not list all the collections on the volume! If a volume contains collections from multiple partners, it will be necessary to read multiple volume files from different partners to see what is on a specific volume.
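Reading a single volume from several partners' perspectives could look like the following minimal Python sketch. As above, it assumes all partner repos are checked out under one base directory. It collects `ddr-*/volumes/{uuid}.json` from every partner and groups the collection IDs by partner. `collections_on_volume` is a hypothetical name.

```python
import json
from pathlib import Path


def collections_on_volume(base: Path, volume_uuid: str) -> dict[str, list[str]]:
    """List every collection on one volume, grouped by partner.

    A single volume file only covers one partner's collections, so this
    reads ddr-*/volumes/{volume_uuid}.json from every partner repo found
    under `base` (an assumed checkout layout).
    """
    found: dict[str, list[str]] = {}
    for path in sorted(base.glob(f"ddr-*/volumes/{volume_uuid}.json")):
        partner = path.parent.parent.name  # e.g. "ddr-testing"
        volume = json.loads(path.read_text())
        found[partner] = [c["cid"] for c in volume.get("collections", [])]
    return found
```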

Note: I don't think there is a way to recreate the history of a particular instance of a collection on a particular volume. We'll just have to start each *.log file with a notice listing the collections that were present on Volume X when the logfile was created.
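The opening notice could be generated as in the minimal Python sketch below. It follows the line format of the sample log on this page (`TIMESTAMP USER MESSAGE`), writing a "Created logfile." entry followed by one `exists` entry per collection already on the volume. The function names are hypothetical, and sorting the collection IDs is an arbitrary choice made here.

```python
from datetime import datetime, timedelta, timezone


def format_entry(when: datetime, user: str, message: str) -> str:
    """Format one log line as TIMESTAMP USER MESSAGE, mirroring the sample log."""
    return f"{when.strftime('%Y-%m-%dT%H:%M%z')} {user} {message}"


def start_logfile(when: datetime, user: str, existing: list[str]) -> list[str]:
    """Opening lines for a new {uuid}.log.

    Records that the log was created, then adds one 'exists' entry per
    collection already on the volume, since their earlier history
    cannot be recovered.
    """
    lines = [format_entry(when, user, "Created logfile.")]
    lines += [format_entry(when, user, f"exists {cid}") for cid in sorted(existing)]
    return lines


when = datetime(2015, 5, 19, 14, 7, tzinfo=timezone(timedelta(hours=-8)))
for line in start_logfile(when, "gjost", ["ddr-test-246", "ddr-test-123"]):
    print(line)
# 2015-05-19T14:07-0800 gjost Created logfile.
# 2015-05-19T14:07-0800 gjost exists ddr-test-123
# 2015-05-19T14:07-0800 gjost exists ddr-test-246
```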

Collection instance

Each collection record in a {uuid}.json file contains

  • collection ID (cid)
  • git-annex UUID (uuid)
  • level (meta, access, all, ???)
  • local annex keys, size
  • known annex keys, size
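In code, one collection record could be modeled as in the minimal Python sketch below, using the field names from the example volume files. The counts are stored as strings in those files, so the sketch converts them to integers. Sizes are kept as recorded (e.g. "128 MB"). `CollectionInstance` and `is_complete` are hypothetical. `is_complete` also assumes that "known annex keys" counts every key the collection references.

```python
from dataclasses import dataclass


@dataclass
class CollectionInstance:
    cid: str          # collection ID, e.g. "ddr-testing-141"
    uuid: str         # git-annex UUID of this clone
    level: str        # "meta", "access", "all", ...
    local_keys: int   # annex keys present in this clone
    local_size: str   # kept as recorded, e.g. "128 MB"
    known_keys: int   # annex keys git-annex knows about
    known_size: str

    @classmethod
    def from_record(cls, rec: dict) -> "CollectionInstance":
        """Build from one entry of a volume file's "collections" list."""
        return cls(
            cid=rec["cid"],
            uuid=rec["uuid"],
            level=rec["level"],
            local_keys=int(rec["local annex keys"]),
            local_size=rec["local annex size"],
            known_keys=int(rec["known annex keys"]),
            known_size=rec["known annex size"],
        )

    def is_complete(self) -> bool:
        """Hypothetical check: True if every known annex key is present locally."""
        return self.local_keys == self.known_keys
```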

### HDD volume `{UUID}.json`
```
{
    "repo": "ddr",
    "org": "testing",
    "type": "hdd",
    "fstype": "ext4",
    "uuid": "297e76ea-bb22-40e9-9999-0c9be5332a39",
    "size": "1072693248",
    "label": "ddrworkstation",
    "location": "Pasadena",
    "collections": [
        {
            "cid": "ddr-testing-141",
            "uuid": "dfb5f708-c901-11e4-b1e1-e3fff14a483d",
            "level": "all",
            "local annex keys": "12",
            "local annex size": "128 MB",
            "known annex keys": "12",
            "known annex size": "128 MB"
        }
    ]
}
```

### USB volume `{UUID}.json`
```
{
    "repo": "ddr",
    "org": "testing",
    "type": "usb",
    "fstype": "ntfs",
    "uuid": "408A51BE8A51B160",
    "size": "500096991232",
    "label": "WD5000BMV-2",
    "location": "Pasadena",
    "purchase_date": "2013-03-01",
    "collections": [
        {
            "cid": "ddr-testing-228",
            "uuid": "e93ab2f4-7a4d-11e3-b2cf-37a8fc974942",
            "level": "meta",
            "local annex keys": "0",
            "local annex size": "0 MB",
            "known annex keys": "12",
            "known annex size": "128 MB"
        }
    ]
}
```

### `SAMPLE-UUID.log`

```
2015-05-19T14:07-0800 gjost Created logfile.
2015-05-19T14:07-0800 gjost exists ddr-test-123
2015-05-19T14:07-0800 gjost exists ddr-test-246
2015-05-19T14:07-0800 gjost cloned ddr-test-247
2015-05-19T14:07-0800 gjost created ddr-test-248
2015-05-19T14:07-0800 gjost deleted ddr-test-248
```
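Replaying a log like this one in file order should reproduce the set of collections currently on the volume. The minimal Python sketch below assumes each entry is `TIMESTAMP USER ACTION CID`. It treats `exists`, `cloned`, and `created` as adding a collection, and `deleted` and `removed` as dropping one. Any other action, such as "Created logfile.", is ignored. `replay` is a hypothetical helper.

```python
def replay(lines: list[str]) -> set[str]:
    """Replay a volume log and return the collection IDs currently present.

    Each line is assumed to be TIMESTAMP USER ACTION CID, as in the sample.
    Lines with an unrecognized action (e.g. "Created logfile.") are ignored.
    """
    present: set[str] = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        _timestamp, _user, action, cid = parts
        if action in ("exists", "cloned", "created"):
            present.add(cid)
        elif action in ("deleted", "removed"):
            present.discard(cid)
    return present


log = """2015-05-19T14:07-0800 gjost Created logfile.
2015-05-19T14:07-0800 gjost exists ddr-test-123
2015-05-19T14:07-0800 gjost exists ddr-test-246
2015-05-19T14:07-0800 gjost cloned ddr-test-247
2015-05-19T14:07-0800 gjost created ddr-test-248
2015-05-19T14:07-0800 gjost deleted ddr-test-248"""
print(sorted(replay(log.splitlines())))
# ['ddr-test-123', 'ddr-test-246', 'ddr-test-247']
```

The result can then be cross-checked against the `collections` list in the matching `{UUID}.json` file.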
