feature: Import preservation_copy in mountpoint-aware format. (#870)
This is a new feature and should increment the minor version number.
sourcefilter authored Mar 17, 2021
1 parent 9fa1448 commit 98b571e
Showing 6 changed files with 90 additions and 61 deletions.
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -937,4 +937,4 @@ RUBY VERSION
ruby 2.5.7p206

BUNDLED WITH
1.17.3
1.17.3
13 changes: 9 additions & 4 deletions app/assets/markdown/importer_guide.md
@@ -87,18 +87,23 @@

### File Name (required)

A _full file path_ to the file in the "Masters" NetApp volume. Currently this must be single-valued. If a `Work` has multiple files associated with it, then each file should be given its own line with the object type `Page` and a `Parent ARK` value that refers to the parent `Work`.
A _full file path_ to the file, beginning with the NetApp volume in the form `[volume].in.library.ucla.edu`. Currently this must be single-valued. If a `Work` has multiple files associated with it, then each file should be given its own line with the object type `Page` and a `Parent ARK` value that refers to the parent `Work`.

If the File Name starts with "Masters/", it will be used as is. Otherwise, it will be prepended with `Masters/dlmasters/`, in order to match the content of DLCS exports.
If the File Name does not start with `[volume].in.library.ucla.edu`, it is assumed to refer to `masters.in.library.ucla.edu`. If its first directory is `Masters/`, that directory is treated as the volume root, so `Masters/` is replaced with `masters.in.library.ucla.edu/`. Otherwise, `masters.in.library.ucla.edu/dlmasters/` is prepended, to match the paths in DLCS exports.

For all formats, any number of leading `/` characters will be ignored.

This field is a string. **This field is required**.

Examples:

- `masters.in.library.ucla.edu/dlmasters/postcards/masters/21198-zz00090nrm-1-master.tif`
- `Masters/dlmasters/postcards/masters/21198-zz00090ntn-1-master.tif`
<br> (Imported as `masters.in.library.ucla.edu/dlmasters/postcards/masters/21198-zz00090ntn-1-master.tif`)
- `postcards/masters/21198-zz00090nn2-1-master.tif`
<br> (Imported as `masters.in.library.ucla.edu/dlmasters/postcards/masters/21198-zz00090nn2-1-master.tif`)
- `Masters/DLTempSecure/ABC/xyz/file_123.tif`
<br> (Imported as `masters.in.library.ucla.edu/DLTempSecure/ABC/xyz/file_123.tif`)
- `//othermount.in.library.ucla.edu/ABC/xyz/file_123.tif`
<br> (Imported as `othermount.in.library.ucla.edu/ABC/xyz/file_123.tif`)
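The File Name rules above can be summarized in a small Ruby sketch. It follows the documented behaviour only, uses illustrative names (`normalize_file_name`, `DEFAULT_VOLUME`, `VOLUME_PREFIX`), and is not the importer's actual code; the volume pattern escapes its dots so that only a literal `.in.library.ucla.edu/` segment counts as a volume.

```ruby
# Illustrative names; this mirrors the documented File Name rules only.
DEFAULT_VOLUME = 'masters.in.library.ucla.edu'.freeze
# A leading "[volume].in.library.ucla.edu/" segment, with the dots escaped.
VOLUME_PREFIX = %r{\A[^/]+\.in\.library\.ucla\.edu/}

def normalize_file_name(raw)
  # Any number of leading "/" characters are ignored.
  path = raw.to_s.strip.sub(%r{\A/+}, '')
  return nil if path.empty?

  if path.match?(VOLUME_PREFIX)
    # Volume given explicitly: keep the path unchanged.
    path
  elsif path.start_with?('Masters/')
    # "Masters/" is treated as the root of the default volume.
    "#{DEFAULT_VOLUME}/#{path.sub(%r{\AMasters/}, '')}"
  else
    # DLCS export paths are relative to dlmasters/ on the default volume.
    "#{DEFAULT_VOLUME}/dlmasters/#{path}"
  end
end

[
  'Masters/DLTempSecure/ABC/xyz/file_123.tif',
  'postcards/masters/21198-zz00090nn2-1-master.tif',
  '//othermount.in.library.ucla.edu/ABC/xyz/file_123.tif'
].each { |name| puts normalize_file_name(name) }
```

Run as written, this prints `masters.in.library.ucla.edu/DLTempSecure/ABC/xyz/file_123.tif`, then `masters.in.library.ucla.edu/dlmasters/postcards/masters/21198-zz00090nn2-1-master.tif`, then `othermount.in.library.ucla.edu/ABC/xyz/file_123.tif`.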

### Title (required)

16 changes: 11 additions & 5 deletions app/importers/californica_mapper.rb
@@ -173,7 +173,7 @@ def visibility_mapping
end

def access_copy
map_field(:access_copy).first || preservation_copy
map_field(:access_copy).to_a.first
end

def ark
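After this change `access_copy` no longer falls back to `preservation_copy`, and the added `.to_a` lets the lookup tolerate a `nil` result. A minimal sketch of that behaviour, assuming (as the `.to_a` suggests, though `map_field` is not shown in this diff) that the mapped value is either an array or `nil`; `first_access_copy` is a hypothetical stand-in for the method.

```ruby
# Hypothetical stand-in for the new CalifornicaMapper#access_copy.
# `mapped` plays the role of map_field(:access_copy): an Array or nil (assumed).
def first_access_copy(mapped)
  # nil.to_a is [], so a missing value yields nil rather than NoMethodError,
  # and there is no fallback to the preservation copy.
  mapped.to_a.first
end

p first_access_copy(nil)             # prints nil
p first_access_copy([])              # prints nil
p first_access_copy(['abc/xyz.tif']) # prints "abc/xyz.tif"
```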
@@ -294,12 +294,18 @@ def ladnn?
end

def preservation_copy
path = map_field(:preservation_copy).first.to_s.strip.sub(/^\//, '')
return nil if path.empty?
if path.start_with?('Masters/')
path = map_field(:preservation_copy).first.to_s.strip.sub(/^\/+/, '')
if path.empty?
nil
elsif path.start_with?(/[^\/]+\.in\.library\.ucla\.edu\//)
# Standard format: must specify netapp volume
path
elsif path.start_with?('Masters/')
# Legacy standard format: everything starts with "Masters/"
'masters.in.library.ucla.edu/' + path.sub(/^\/?Masters\//, '')
else
'Masters/dlmasters/' + path
# paths coming from DLCSExport
'masters.in.library.ucla.edu/dlmasters/' + path
end
end

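The volume check in `preservation_copy` passes a regular expression to `String#start_with?`, which accepts a `Regexp` from Ruby 2.5 onward (the `ruby 2.5.7p206` pinned in `Gemfile.lock`). Because an unescaped `.` matches any character, a short comparison of an unescaped and an escaped pattern may be useful; the constant names and sample strings are illustrative.

```ruby
# Illustrative constants: LOOSE_VOLUME leaves the dots unescaped,
# STRICT_VOLUME escapes them so they match only a literal ".".
LOOSE_VOLUME  = /[^\/]+.in.library.ucla.edu\//
STRICT_VOLUME = %r{[^/]+\.in\.library\.ucla\.edu/}

good = 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif'
typo = 'masters-in-library-ucla-edu/dlmasters/abc/xyz.tif'

p good.start_with?(LOOSE_VOLUME)  # prints true
p good.start_with?(STRICT_VOLUME) # prints true
p typo.start_with?(LOOSE_VOLUME)  # prints true ("." matched "-")
p typo.start_with?(STRICT_VOLUME) # prints false
```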
8 changes: 6 additions & 2 deletions app/uploaders/csv_manifest_validator.rb
@@ -187,8 +187,12 @@ def validate_records

# Row has a File Name that doesn't exist
if @mapper.preservation_copy
full_path = File.join(file_uri_base_path, @mapper.preservation_copy)
this_row_warnings << "Rows contain a File Name that does not exist. Incorrect values may be imported." unless File.exist?(full_path)
if @mapper.preservation_copy.start_with?('masters.in.library.ucla.edu/')
full_path = File.join(file_uri_base_path, 'Masters', @mapper.preservation_copy.sub(/\Amasters\.in\.library\.ucla\.edu\//, ''))
this_row_warnings << "Rows contain a File Name that does not exist. Incorrect values may be imported." unless File.exist?(full_path)
else
this_row_warnings << "Unable to check that file exists. Only files in masters.in.library.ucla.edu/ can be checked at this time."
end
end

# Row has improperly formatted date values
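The validator can only check files on `masters.in.library.ucla.edu/`, which it maps onto a `Masters` directory under `file_uri_base_path`. A minimal sketch of that mapping as standalone functions, with hypothetical names `local_path_for` and `file_name_warning`: returning `nil` for other volumes makes the "cannot check" case explicit, so `File.exist?` is only ever called with a real path.

```ruby
# Hypothetical helpers mirroring the validator's File Name check.
MASTERS_PREFIX = %r{\Amasters\.in\.library\.ucla\.edu/}

# Local path to check, or nil when the volume is not available locally.
def local_path_for(base_path, preservation_copy)
  return nil unless preservation_copy.match?(MASTERS_PREFIX)
  File.join(base_path, 'Masters', preservation_copy.sub(MASTERS_PREFIX, ''))
end

# Warning text for one row, or nil when the file was found.
def file_name_warning(base_path, preservation_copy)
  full_path = local_path_for(base_path, preservation_copy)
  if full_path.nil?
    'Unable to check that file exists. Only files in masters.in.library.ucla.edu/ can be checked at this time.'
  elsif !File.exist?(full_path)
    'Rows contain a File Name that does not exist. Incorrect values may be imported.'
  end
end

puts local_path_for('/mnt/base', 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif')
# prints /mnt/base/Masters/dlmasters/abc/xyz.tif
```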
88 changes: 51 additions & 37 deletions spec/importers/californica_mapper_spec.rb
@@ -283,8 +283,8 @@
'access_copy' => '' }
end

it 'uses preservation_copy' do
expect(mapper.access_copy).to eq 'Masters/dlmasters/abc/xyz.tif'
it 'is nil' do
expect(mapper.access_copy).to be_nil
end
end
end
@@ -295,32 +295,6 @@
end
end

describe '#preservation_copy' do
context 'when the path starts with a \'/\'' do
let(:metadata) { { 'File Name' => '/Masters/dlmasters/abc/xyz.tif' } }

it 'gets removed' do
expect(mapper.preservation_copy).to eq 'Masters/dlmasters/abc/xyz.tif'
end
end

context 'when the path starts with \'Masters/\'' do
let(:metadata) { { 'File Name' => 'Masters/dlmasters/abc/xyz.tif' } }

it 'imports as is' do
expect(mapper.preservation_copy).to eq 'Masters/dlmasters/abc/xyz.tif'
end
end

context 'when the path doesn\'t start with \'Masters\'' do
let(:metadata) { { 'File Name' => 'abc/xyz.tif' } }

it 'prepends \'Masters/dlmasters\'' do
expect(mapper.preservation_copy).to eq 'Masters/dlmasters/abc/xyz.tif'
end
end
end

describe '#ark' do
it "maps the required ark field" do
expect(mapper.ark).to eq('ark:/21198/zz0002nq4w')
@@ -376,27 +350,67 @@
end

describe '#preservation_copy' do
context 'when the path starts with a \'/\'' do
let(:metadata) { { 'File Name' => '/Masters/dlmasters/abc/xyz.tif' } }
context 'when the path begins with [volume].in.library.ucla.edu/' do
let(:metadata) { { 'File Name' => 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif' } }

it 'gets removed' do
expect(mapper.preservation_copy).to eq 'Masters/dlmasters/abc/xyz.tif'
it 'imports it unchanged' do
expect(mapper.preservation_copy).to eq 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif'
end

context 'when the path starts with any number of initial `/` characters' do
let(:metadata) { { 'File Name' => '//masters.in.library.ucla.edu/dlmasters/abc/xyz.tif' } }

it 'ignores them' do
expect(mapper.preservation_copy).to eq 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif'
end
end
end

context 'when the path is not supplied' do
let(:metadata) { {} }

it 'returns nil' do
expect(mapper.preservation_copy).to be_nil
end
end

context 'when the path is blank' do
let(:metadata) { { 'File Name' => '' } }

it 'returns nil' do
expect(mapper.preservation_copy).to be_nil
end
end

context 'when the path starts with \'Masters/\'' do
let(:metadata) { { 'File Name' => 'Masters/dlmasters/abc/xyz.tif' } }

it 'imports as is' do
expect(mapper.preservation_copy).to eq 'Masters/dlmasters/abc/xyz.tif'
it 'is interpreted relative to masters.in.library.ucla.edu/' do
expect(mapper.preservation_copy).to eq 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif'
end

context 'when the path starts with any number of initial `/` characters' do
let(:metadata) { { 'File Name' => '/Masters/dlmasters/abc/xyz.tif' } }

it 'ignores them' do
expect(mapper.preservation_copy).to eq 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif'
end
end
end

context 'when the path doesn\'t start with \'Masters\'' do
context 'when the path doesn\'t start with \'[volume].in.library.ucla.edu/\' or \'Masters\'' do
let(:metadata) { { 'File Name' => 'abc/xyz.tif' } }

it 'prepends \'Masters/dlmasters\'' do
expect(mapper.preservation_copy).to eq 'Masters/dlmasters/abc/xyz.tif'
it 'is interpreted relative to masters.in.library.ucla.edu/dlmasters/ (DLCSExport CSVs)' do
expect(mapper.preservation_copy).to eq 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif'
end

context 'when the path starts with \'/\'' do
let(:metadata) { { 'File Name' => '/abc/xyz.tif' } }

it 'is interpreted the same way' do
expect(mapper.preservation_copy).to eq 'masters.in.library.ucla.edu/dlmasters/abc/xyz.tif'
end
end
end
end
24 changes: 12 additions & 12 deletions spec/spec_helper.rb
@@ -102,16 +102,16 @@
# as the one that triggered the failure.
Kernel.srand config.seed

# Silence normal program output (cf. https://stackoverflow.com/questions/15430551/suppress-console-output-during-rspec-tests#15432948)
original_stderr = $stderr
original_stdout = $stdout
config.before(:all) do
# Redirect stderr and stdout
$stderr = File.open(File::NULL, "w")
$stdout = File.open(File::NULL, "w")
end
config.after(:all) do
$stderr = original_stderr
$stdout = original_stdout
end
# # Silence normal program output (cf. https://stackoverflow.com/questions/15430551/suppress-console-output-during-rspec-tests#15432948)
# original_stderr = $stderr
# original_stdout = $stdout
# config.before(:all) do
# # Redirect stderr and stdout
# $stderr = File.open(File::NULL, "w")
# $stdout = File.open(File::NULL, "w")
# end
# config.after(:all) do
# $stderr = original_stderr
# $stdout = original_stdout
# end
end
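This commit comments out the `before(:all)`/`after(:all)` hooks that redirect `$stdout` and `$stderr` to `File::NULL`. If silencing is still wanted around particular noisy calls, a block-scoped variant restores the original streams even when the block raises. A minimal sketch; `silence_output` is a hypothetical helper, not part of RSpec.

```ruby
# Hypothetical helper: silence $stdout/$stderr only while the block runs.
def silence_output
  original_stdout = $stdout
  original_stderr = $stderr
  File.open(File::NULL, 'w') do |null|
    $stdout = null
    $stderr = null
    yield
  end
ensure
  # Runs on normal return and on exceptions, so the streams are always restored.
  $stdout = original_stdout
  $stderr = original_stderr
end

result = silence_output do
  puts 'not shown'
  warn 'not shown either'
  42
end
puts result # prints 42
```

Scoping the redirection to a block keeps failure output from unrelated examples visible, which is one reason to prefer it over redirecting for every example group.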
