Merge pull request samvera#339 from samvera/external_files_guide
Guide for storing fedora binaries on file system
afred authored Sep 18, 2018
2 parents 66edfbb + 55348c0 commit ec15230
Showing 3 changed files with 48 additions and 0 deletions.
4 changes: 4 additions & 0 deletions _data/sidebars/home_sidebar.yml
@@ -276,6 +276,10 @@ entries:
url: /troubleshooting_riiif.html
output: web

- title: External Binaries
url: /external_binaries.html
output: web

- title: How Do I Disable Hyrax 1 User Notifications?
url: /how-to-disable-notifications.html
output: web
2 changes: 2 additions & 0 deletions pages/a-z.md
@@ -209,6 +209,8 @@ sidebar: home_sidebar
<br>
<a class='atoz_term' href='/lease-embargoes-2.1.html'>Embargoes for Managers (2.1)</a>
<br>
<a class='atoz_term' href='/external_binaries.html'>External Binaries</a>
<br>
</div>
</div>
<div class='atoz_section'>
@@ -0,0 +1,42 @@
---
title: "External Binaries"
keywords: external
categories: External Binaries
permalink: external_binaries.html
folder: samvera/how-to/external_binaries/sufia7/external_binaries.md
sidebar: home_sidebar
tags: [development_resources]
a-z: ['External Binaries']
version:
label: 'Sufia 7 (May apply to Hyrax but YMMV)'
branch:
label: 'Sufia 7.4.1'
link: 'https://github.com/samvera/sufia/tree/v7.4.1'
---

# Storing binaries externally

Some large repositories have found it desirable to store binary content in a
filesystem instead of in Fedora 4. Reasons for this might include ease of migration,
preservation strategy, performance, and horizontal scalability. Unfortunately,
this is not an out-of-the-box feature, and will require some custom development.
This guide was written during the migration of [ScholarSphere](https://scholarsphere.psu.edu/), a Sufia 7 repository
at Penn State University Library. Code examples can be found in the [ScholarSphere GitHub repository](https://github.com/psu-stewardship/scholarsphere).

**Goal:** In a Sufia 7 / Fedora 4 application, store binary content externally to Fedora. The application should continue to function as usual. This general pattern should be applicable to Hyrax applications as well, but has not been tested there.

## Summary of this solution:
1. Binary content is stored on a filesystem to which the self-deposit application can write, in this case `/opt/heracles/binaries`. This location is set via an environment variable, `REPOSITORY_FILESTORE`.
1. Files are stored in a pairtree directory structure. For example, a FileSet with id `ht722h861h` would be stored on the filesystem at `/opt/heracles/binaries/ht/72/2h/86/ht722h861h/`.
1. Within the pairtree, files are stored in BagIt format with a SHA-256 checksum.
1. Binary content in the expected directory is available via a web server. For the example above, a file named `world.png` would be available at `https://dce-fedora.vmhost.psu.edu/binaries/ht/72/2h/86/ht722h861h/data/world.png`. The address and port of the web server are set via an environment variable, `REPOSITORY_FILESTORE_HOST`.
1. External filestore functionality is controlled by an environment variable, `REPOSITORY_EXTERNAL_FILES`, and is only enabled if that variable is set to `true`.
1. Fedora objects use [Fedora’s External Content feature](https://wiki.duraspace.org/display/FEDORA45/External+Content). When the Fedora object is created, it stores a URL in the `mime-type` field. When the object is retrieved, it delivers a 307 redirect to the file’s URL.
1. A one-time data migration is required. In order to store all content (including previous versions) externally, leaving no binary content in Fedora, we loop through all objects, store the binary content of each file version in a local tempfile, delete each file version in Fedora, and re-create each file version with external content.
1. SHA1 checksums (as calculated by Fedora) are recorded before migration. After migration, they are compared to checksums re-calculated from the file as written to disk. Note that the Fedora checksum service will not work against externally managed files, so once you've converted to external binary storage you need another way of tracking fixity.
1. We do not attempt to migrate objects that have already been migrated.
1. We rescue any errors that happen in the migration process and add them to a log that can be re-processed.
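
The path and URL layout described in the list above can be sketched in a few lines of Ruby. This is only an illustration, not the ScholarSphere code: the module name `ExternalFilePaths`, its method names, and the `PAIR_COUNT` constant are hypothetical, and it assumes that `REPOSITORY_FILESTORE_HOST` includes the URL scheme and that filenames need no URL escaping.

```ruby
# Hypothetical helper (not taken from ScholarSphere): maps a FileSet id to its
# pairtree directory and to the URL under which the web server exposes a file.
module ExternalFilePaths
  # The example path in this guide uses the first four two-character pairs.
  PAIR_COUNT = 4

  # Splits "ht722h861h" into ["ht", "72", "2h", "86"].
  def self.pairs(id)
    id.scan(/../).first(PAIR_COUNT)
  end

  # Directory holding the bag for one FileSet.
  def self.pairtree_dir(id, root: ENV.fetch("REPOSITORY_FILESTORE", "/opt/heracles/binaries"))
    File.join(root, *pairs(id), id)
  end

  # URL of one file inside the bag's data/ directory.
  # Assumption: the host value carries the scheme, e.g. "https://host:port".
  def self.file_url(id, filename, host: ENV.fetch("REPOSITORY_FILESTORE_HOST"))
    [host.chomp("/"), "binaries", *pairs(id), id, "data", filename].join("/")
  end
end
```

For example, `ExternalFilePaths.pairtree_dir("ht722h861h", root: "/opt/heracles/binaries")` yields the directory shown in the list above (without the trailing slash).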

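The feature toggle and the external-content reference could look roughly like the sketch below. Both method names are hypothetical. The `message/external-body` format reflects our understanding of how Fedora 4's External Content feature expects the URL to be supplied in the mime type; check it against the linked Fedora documentation for your version before relying on it.

```ruby
# Hypothetical helpers, shown only to illustrate the summary above.

# External storage is enabled only when the variable is exactly "true".
def external_files_enabled?(env = ENV)
  env["REPOSITORY_EXTERNAL_FILES"] == "true"
end

# Assumption: Fedora 4 external content is declared through a mime type of
# the form message/external-body; access-type=URL; URL="...".
def external_body_mime_type(url)
  %(message/external-body; access-type=URL; URL="#{url}")
end
```

Passing the environment in as an argument keeps the toggle easy to exercise in tests without touching the real `ENV`.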

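The fixity check and the skip-and-log behaviour of the migration can be sketched as follows. This is a minimal outline under stated assumptions, not the ScholarSphere migration itself: `checksum_matches?` and `migrate_all` are hypothetical names, the `urn:sha1:` prefix handling assumes Fedora reports digests in that form, and the actual work of moving one object is left to the block the caller supplies.

```ruby
require "digest/sha1"
require "tempfile"

# Hypothetical helper: true when the file on disk still has the SHA1 that was
# recorded before migration. Assumption: the recorded value may carry a
# "urn:sha1:" prefix, which is stripped before comparing.
def checksum_matches?(recorded_sha1, path)
  expected = recorded_sha1.sub(/\Aurn:sha1:/, "").downcase
  Digest::SHA1.file(path).hexdigest == expected
end

# Hypothetical driver: yields each id that still needs migrating, skips ids
# that `already_migrated` reports as done, and collects failures in a log
# that can be re-processed later instead of aborting the whole run.
def migrate_all(ids, already_migrated:, error_log: [])
  ids.each do |id|
    next if already_migrated.call(id)

    begin
      yield id
    rescue StandardError => e
      error_log << { id: id, error: "#{e.class}: #{e.message}" }
    end
  end
  error_log
end
```

Because a failure on one object is recorded rather than raised, the returned log doubles as the list of ids to retry on the next pass.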
## How to do it
A writeup of this work is available at [https://docs.google.com/document/d/13RXoWPvBfsaKsI-miFXjcbduVQ-SbKANN0yHGBVatB8](https://docs.google.com/document/d/13RXoWPvBfsaKsI-miFXjcbduVQ-SbKANN0yHGBVatB8)
