From 48b1ab79631ea7d7992eeda7a13fda627b099202 Mon Sep 17 00:00:00 2001
From: Bess Sadler
Date: Fri, 7 Sep 2018 11:12:29 -0400
Subject: [PATCH 1/2] Guide for storing fedora binaries on filesystem

The long version of this guide is in a Google doc. We do not have the
bandwidth at this time to re-write the entire guide, but I am attempting
here to provide enough guidance that an institution wishing to undertake
a similar project can understand the issues and general scope of the
work, along with some examples of prior art.
---
 _data/sidebars/home_sidebar.yml | 4 ++
 pages/a-z.md | 2 +
 .../sufia7/external_binaries.md | 39 +++++++++++++++++++
 3 files changed, 45 insertions(+)
 create mode 100644 pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md

diff --git a/_data/sidebars/home_sidebar.yml b/_data/sidebars/home_sidebar.yml
index bffed04f..ca43f887 100644
--- a/_data/sidebars/home_sidebar.yml
+++ b/_data/sidebars/home_sidebar.yml
@@ -276,6 +276,10 @@ entries:
     url: /troubleshooting_riiif.html
     output: web
 
+  - title: External Binaries
+    url: /external_binaries.html
+    output: web
+
   - title: How Do I Disable Hyrax 1 User Notifications?
     url: /how-to-disable-notifications.html
     output: web
diff --git a/pages/a-z.md b/pages/a-z.md
index 096697f2..9450d706 100644
--- a/pages/a-z.md
+++ b/pages/a-z.md
@@ -209,6 +209,8 @@ sidebar: home_sidebar
Embargoes for Managers (2.1)
+ External Binaries
+
diff --git a/pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md b/pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md
new file mode 100644
index 00000000..8131fbf1
--- /dev/null
+++ b/pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md
@@ -0,0 +1,39 @@
+---
+title: "External Binaries"
+keywords: external
+categories: External Binaries
+permalink: external_binaries.html
+folder: samvera/how-to/external_binaries/sufia7/external_binaries.md
+sidebar: home_sidebar
+tags: [development_resources]
+a-z: ['External Binaries']
+version:
+  label: 'Sufia 7 (May apply to Hyrax but YMMV)'
+---
+
+# Storing binaries externally
+
+Some large repositories have found it desirable to store binary content in a
+filesystem instead of in Fedora 4. Reasons for this might include ease of migration,
+preservation strategy, performance, and horizontal scalability. Unfortunately,
+this is not an out-of-the-box feature, and will require some custom development.
+This guide was written during the migration of [ScholarSphere](https://scholarsphere.psu.edu/), a Sufia 7 repository
+at Penn State University Library. Code examples can be found at the [ScholarSphere GitHub repository](https://github.com/psu-stewardship/scholarsphere).
+
+**Goal:** In a Sufia 7 / Fedora 4 application, store binary content externally to Fedora. The application should continue to function as usual. This general pattern should be applicable to Hyrax applications as well, but has not been tested there.
+
+## Summary of this solution:
+ 1. Binary content is stored on a filesystem to which the self-deposit application can write, in this case `/opt/heracles/binaries`. This location is set via an environment variable, `REPOSITORY_FILESTORE`.
+ 1. Files are stored in a pairtree directory structure. For example, a FileSet with id `ht722h861h` would be stored on the filesystem at `/opt/heracles/binaries/ht/72/2h/86/ht722h861h/`.
+ 1. Within the pairtree, files are stored in BagIt format with a SHA-256 checksum.
+ 1. Binary content in the expected directory is available via a web server. The example above would be available at `https://dce-fedora.vmhost.psu.edu/binaries/ht/72/2h/86/ht722h861h/data/world.png`. The address and port of the web server are set via an environment variable, `REPOSITORY_FILESTORE_HOST`.
+ 1. External filestore functionality is controlled by an environment variable, `REPOSITORY_EXTERNAL_FILES`, and is only enabled if that variable is set to `true`.
+ 1. Fedora objects use [Fedora’s External Content feature](https://wiki.duraspace.org/display/FEDORA45/External+Content). When the Fedora object is created, it stores a URL in the `mime-type` field. When the object is retrieved, it delivers a 307 redirect to the file’s URL.
+ 1. A one-time data migration is required. In order to store all content (including previous versions) externally, leaving no binary content in Fedora, we loop through all objects, store the binary content of each file version in a local tempfile, delete each file version in Fedora, and re-create each file version with external content.
+ 1. SHA1 checksums (as calculated by Fedora) are recorded before migration, and after migration are compared to re-calculated checksums of the file as written to disk. Note that the Fedora checksum service will not work against externally managed files, so once you've converted to external binary storage you need another way of tracking fixity.
+ 1. We do not attempt to migrate objects that have already been migrated.
+ 1. We rescue any errors that happen in the migration process and add them to a log that can be re-processed.
+
+
+## How to do it
+A writeup of this work is available at [https://docs.google.com/document/d/13RXoWPvBfsaKsI-miFXjcbduVQ-SbKANN0yHGBVatB8](https://docs.google.com/document/d/13RXoWPvBfsaKsI-miFXjcbduVQ-SbKANN0yHGBVatB8)

From 55348c0e7f1e3f663ccf4bbaf0fa1ea4df748f59 Mon Sep 17 00:00:00 2001
From: Andrew Myers
Date: Tue, 18 Sep 2018 11:19:45 -0400
Subject: [PATCH 2/2] Adds additional version metadata

---
 .../external_binaries/sufia7/external_binaries.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md b/pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md
index 8131fbf1..d7af4baf 100644
--- a/pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md
+++ b/pages/samvera/developer_resources/external_binaries/sufia7/external_binaries.md
@@ -9,6 +9,9 @@ tags: [development_resources]
 a-z: ['External Binaries']
 version:
   label: 'Sufia 7 (May apply to Hyrax but YMMV)'
+  branch:
+    label: 'Sufia 7.4.1'
+    link: 'https://github.com/samvera/sufia/tree/v7.4.1'
 ---
 
 # Storing binaries externally
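
The pairtree layout described in the guide's summary (a FileSet with id `ht722h861h` stored under `/opt/heracles/binaries/ht/72/2h/86/ht722h861h/`) can be sketched roughly as follows. This is a minimal illustration in Python, not code from ScholarSphere: the function name `pairtree_path` and its `pairs` parameter are hypothetical, and the rule of four two-character segments followed by the full id is inferred from the single example in the guide.

```python
import os
from pathlib import PurePosixPath


def pairtree_path(file_set_id, root=None, pairs=4):
    """Return the storage directory for a FileSet id.

    Hypothetical helper: the layout (four two-character segments, then the
    full id) is inferred from the guide's one example, not from the
    ScholarSphere code.
    """
    if root is None:
        # REPOSITORY_FILESTORE is the variable named in the guide; the
        # fallback is the guide's example location.
        root = os.environ.get("REPOSITORY_FILESTORE", "/opt/heracles/binaries")
    if len(file_set_id) < 2 * pairs:
        raise ValueError("id is too short for the requested number of pairs")
    # Split the leading characters of the id into two-character segments.
    segments = [file_set_id[i:i + 2] for i in range(0, 2 * pairs, 2)]
    return PurePosixPath(root, *segments, file_set_id)


print(pairtree_path("ht722h861h", root="/opt/heracles/binaries"))
```

Passing `root` explicitly, as in the last line, keeps the result independent of the environment; leaving it out falls back to `REPOSITORY_FILESTORE`.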