T300 bulkrax8.0.0 (#555)
* Upgrade from Bulkrax 2.3.0 to 8.0.0, no configuration just yet

* Fixes uploads-with-files issue by pointing to bulkrax branch

* Work in Progress - tasks to ingest ProQuest ETD zips

* WIP - next need to create CSV from array of metadata hashes

* WIP - fixed problem creating header row

* Fixed embargo logic; fixed CSV structure

* Eliminated folder names from metadata csv FileSet entries; copy files to bulkrax zip staging directory, but will need to segregate files from each ETD into separate directories

* Adds 'bulkrax_identifier' metadata; fixes imports of works w/files, using prerelease of next bulkrax release.

* implemented parent work/child FileSet bulkrax_identifier, repaired embargo attributes

* refactor file paths for extracted zip; parse creator/contributors

* Repair attachment filenames with spaces (or else bulkrax will); fix author parsing

* Add degree, advisors, committee members

* Add gw_affiliation, date_created

* Simplify embargo date; add rights statement; clean up

* Fix truncated file; clarify configs, set default rights

* Update bulkrax hash, now contains db migration fix

* Code cleanup for PR

* Add scholarspace-ingest directory and volume mapping

* Add mapping for scholarspace-ingest directory

* Add CI directive to create ingest folder

* Upgrade Bulkrax to 8.1.0

* Allow admin user to visit /importers and /exporters even when there isn't yet an admin set for admin to deposit to

* Add fixture zips for bulkrax rspec testing

* Add sidekiq inline testing setting

* Set testing queue for inline sidekiq

* Modify ingest_bulkrax_prep when in test mode

* Add bulkrax importer tests

* Simplify bulkrax tests

* Populates degree and resource_type. License is still WIP, pending input from ScholComm

* Added resource_type field

---------
kerchner authored Sep 4, 2024
1 parent bb250ce commit 5dfd206
Showing 21 changed files with 637 additions and 42 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci-cache.yml
@@ -33,6 +33,7 @@ jobs:
mkdir /opt/scholarspace-minter
mkdir /opt/scholarspace/fedora-data
mkdir /opt/scholarspace/solr-data
mkdir /opt/scholarspace/scholarspace-ingest
cd /opt/scholarspace
# Checkout the repository code
- name: Check out repository code
1 change: 1 addition & 0 deletions Dockerfile
@@ -9,6 +9,7 @@ RUN mkdir -p /opt/scholarspace/scholarspace-hyrax \
&& mkdir -p /opt/scholarspace/scholarspace-tmp \
&& mkdir -p /opt/scholarspace/scholarspace-minter \
&& mkdir -p /opt/scholarspace/scholarspace-derivatives \
&& mkdir -p /opt/scholarspace/scholarspace-ingest \
&& chmod 775 -R /opt/scholarspace/scholarspace-derivatives

WORKDIR /opt/scholarspace/scholarspace-hyrax
3 changes: 1 addition & 2 deletions Gemfile
@@ -64,8 +64,7 @@ gem 'riiif', '~> 2.0'

gem 'cookies_eu'

#gem 'bulkrax', git: 'https://github.com/samvera-labs/bulkrax.git'
gem 'bulkrax', '2.3.0'
gem 'bulkrax', '8.1.0'

gem 'willow_sword', github: 'notch8/willow_sword'

17 changes: 11 additions & 6 deletions Gemfile.lock
@@ -104,7 +104,7 @@ GEM
babel-transpiler (0.7.0)
babel-source (>= 4.0, < 6)
execjs (~> 2.0)
bagit (0.4.5)
bagit (0.4.6)
docopt (~> 0.5.0)
validatable (~> 1.6)
base64 (0.2.0)
@@ -162,18 +162,21 @@ GEM
signet (~> 0.8)
typhoeus
builder (3.2.4)
bulkrax (2.3.0)
bagit (~> 0.4)
bulkrax (8.1.0)
bagit (~> 0.4.6)
coderay
denormalize_fields
iso8601 (~> 0.9.0)
kaminari
language_list (~> 1.2, >= 1.2.1)
libxml-ruby (~> 3.1.0)
libxml-ruby (~> 3.2.4)
loofah (>= 2.2.3)
marcel
oai (>= 0.4, < 2.x)
rack (>= 2.0.6)
rails (>= 5.1.6)
rdf (>= 2.0.2, < 4.0)
rubyzip
simple_form
byebug (11.1.3)
cancancan (1.17.0)
@@ -221,6 +224,8 @@ GEM
declarative-builder (0.1.0)
declarative-option (< 0.2.0)
declarative-option (0.1.0)
denormalize_fields (1.3.0)
activerecord (>= 4.1.14, < 8.0.0)
deprecation (1.1.0)
activesupport
devise (4.9.2)
@@ -577,7 +582,7 @@ GEM
multi_json
libv8-node (16.19.0.1-x86_64-darwin)
libv8-node (16.19.0.1-x86_64-linux)
libxml-ruby (3.1.0)
libxml-ruby (3.2.4)
link_header (0.0.8)
linkeddata (3.1.6)
equivalent-xml (~> 0.6)
@@ -1062,7 +1067,7 @@ DEPENDENCIES
blacklight_range_limit
bootsnap (>= 1.1.0)
bootstrap-sass (~> 3.0)
bulkrax (= 2.3.0)
bulkrax (= 8.1.0)
byebug
capybara (>= 2.15)
chosen-rails
11 changes: 1 addition & 10 deletions README.md
@@ -66,6 +66,7 @@ a separate user for the app, but it is not necessary. That user will need to ow
/opt/scholarspace/certs
/opt/scholarspace/scholarspace-tmp
/opt/scholarspace/scholarspace-minter
/opt/scholarspace/scholarspace-ingest
```
6. In `/opt/scholarspace/scholarspace-hyrax` run `cp example.env .env` to create the local environment file.
7. Edit `.env` to add the following values:
@@ -174,16 +175,6 @@ echo $CR_PAT | docker login ghcr.io -u [USERNAME] --password-stdin
## Setting up a new production instance
### (Optional) Install etd-loader
* Install the **etd-loader** application in `/opt/etd-loader` as per instructions at https://github.com/gwu-libraries/etd-loader
* When configuring `config.py`, ensure that it contains the following values:
```
ingest_path = "/opt/scholarspace/scholarspace-hyrax"
ingest_command = "rake RAILS_ENV=production gwss:ingest_etd"
```
### Migrating Production Database
In the app-server container (i.e. through `docker exec -it scholarspace-hyrax_app-server_1 /bin/sh`, followed by `su scholarspace`), run:
11 changes: 11 additions & 0 deletions app/models/ability.rb
@@ -42,4 +42,15 @@ def contentadmins_can_create_curation_concerns
can :index, Hydra::AccessControls::Embargo
can :index, Hydra::AccessControls::Lease
end

# Added for Bulkrax 5.0.0+
def can_import_works?
# can_create_any_work?
admin? or contentadmin_user?
end

def can_export_works?
# can_create_any_work?
admin? or contentadmin_user?
end
end
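The two predicates added to `Ability` let admins and content admins reach the Bulkrax importer and exporter pages on role membership alone, which matches the commit note about visiting /importers and /exporters before any admin set exists. A minimal sketch of that rule, assuming a hypothetical `StubAbility` that takes a role list in place of Hyrax's real user and role lookup (the role names are also hypothetical):

```ruby
# Hypothetical stand-in for the Hyrax Ability class: roles are passed in
# directly instead of being looked up from the signed-in user.
class StubAbility
  def initialize(roles)
    @roles = roles
  end

  def admin?
    @roles.include?('admin')
  end

  def contentadmin_user?
    @roles.include?('content-admin')
  end

  # Same rule as the diff: role membership alone grants import/export,
  # regardless of whether any admin set exists yet.
  def can_import_works?
    admin? || contentadmin_user?
  end

  def can_export_works?
    admin? || contentadmin_user?
  end
end

StubAbility.new(['admin']).can_import_works?          # => true
StubAbility.new(['content-admin']).can_export_works?  # => true
StubAbility.new([]).can_import_works?                 # => false
```

The diff writes the rule as `admin? or contentadmin_user?`; as the only expression in each method it returns the same value as `||`, though `||` is the more common Ruby style because `or` binds more loosely than assignment.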
4 changes: 4 additions & 0 deletions app/models/collection.rb
@@ -4,4 +4,8 @@ class Collection < ActiveFedora::Base
# You can replace these metadata if they're not suitable
include Hyrax::BasicMetadata
self.indexer = Hyrax::CollectionWithBasicMetadataIndexer

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end
end
6 changes: 6 additions & 0 deletions app/models/file_set.rb
@@ -1,4 +1,10 @@
# Generated by hyrax:models:install
class FileSet < ActiveFedora::Base
# include ::Hyrax::FileSetBehavior

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

include ::Hyrax::FileSetBehavior
end
4 changes: 4 additions & 0 deletions app/models/gw_etd.rb
@@ -28,5 +28,9 @@ class GwEtd < ActiveFedora::Base
index.as :stored_searchable, :facetable
end

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

include ::Hyrax::BasicMetadata
end
4 changes: 4 additions & 0 deletions app/models/gw_journal_issue.rb
@@ -24,6 +24,10 @@ class GwJournalIssue < ActiveFedora::Base
index.as :stored_searchable
end

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

# This must be included at the end, because it finalizes the metadata
# schema (by adding accepts_nested_attributes)
include ::Hyrax::BasicMetadata
6 changes: 5 additions & 1 deletion app/models/gw_work.rb
@@ -16,5 +16,9 @@ class GwWork < ActiveFedora::Base
index.as :stored_searchable
end

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

include ::Hyrax::BasicMetadata
end
end
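Every model above now has a single-valued `bulkrax_identifier` on the same predicate, and the commit message describes building the import CSV from an array of metadata hashes, with child FileSet rows pointing at their parent work through that identifier. A minimal sketch of that step using Ruby's standard `csv` library, assuming hypothetical column names (`bulkrax_identifier`, `model`, `parents`, `title`, `file`), a hypothetical identifier scheme, and rows targeting `GwEtd`; the real headers depend on this repository's Bulkrax field mappings, which are not part of the diff:

```ruby
require 'csv'

# Hypothetical input: one metadata hash per ETD, with its attached file names.
etds = [
  { id: 'etd_0001', title: 'A Dissertation', files: ['thesis.pdf', 'data.zip'] }
]

# Hypothetical header set; real column names come from the Bulkrax field mappings.
HEADERS = %w[bulkrax_identifier model parents title file].freeze

def etd_rows(etd)
  work = { 'bulkrax_identifier' => etd[:id], 'model' => 'GwEtd', 'title' => etd[:title] }
  # Each child FileSet row refers back to the parent work by its identifier.
  file_sets = etd[:files].each_with_index.map do |name, i|
    { 'bulkrax_identifier' => "#{etd[:id]}_file_#{i + 1}", 'model' => 'FileSet',
      'parents' => etd[:id], 'title' => name, 'file' => name }
  end
  [work, *file_sets]
end

csv = CSV.generate do |out|
  out << HEADERS
  # Keys a row lacks come back as nil from values_at and are written as empty cells.
  etds.flat_map { |e| etd_rows(e) }.each { |row| out << row.values_at(*HEADERS) }
end
puts csv
```

Emitting the work row before its FileSet rows keeps each parent identifier defined ahead of the rows that reference it, which makes the file easier to check by eye.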
146 changes: 146 additions & 0 deletions bin/importer
@@ -0,0 +1,146 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative '../config/environment'

require 'slop'

def main(opts = {})
  check_required_params

  update = opts[:importer_id].present?
  port = opts[:port].presence
  url = build_url(opts.delete(:importer_id), opts.delete(:url), port)

  headers = { 'Content-Type' => 'application/json' }
  headers['Authorization'] = "Token: #{opts.delete(:auth_token)}"
  params = build_params(opts)

  logger.info("POST to #{url} - PARAMS #{params}")

  conn = Faraday.new(
    url: url,
    headers: headers
  )

  response = if update
               conn.put do |request|
                 request.body = params.to_json
               end
             else
               conn.post do |request|
                 request.body = params.to_json
               end
             end

  puts "#{response.status} - #{response.body.truncate(200)}"
end

def check_required_params
  if opts[:importer_id].blank? && invalid?(opts)
    puts 'Missing required parameters'
    help
  end

  if opts[:auth_token].blank? # rubocop:disable Style/GuardClause
    puts 'Missing Authentication Token --auth_token'
    exit
  end
end

def invalid?(opts)
  required_params.each do |p|
    return true if opts[p.to_sym].blank?
  end
  return false
end

def required_params
  Bulkrax.api_definition['bulkrax']['importer'].map { |key, value| key if value['required'] == true }.compact
end

def build_params(opts = {})
  params = {}
  params[:commit] = opts.delete(:commit)
  parser_fields = {
    metadata_file_name: opts.delete(:metadata_file_name),
    metadata_format: opts.delete(:metadata_format),
    rights_statement: opts.delete(:rights_statement),
    override_rights_statement: opts.delete(:override_rights_statement),
    import_file_path: opts.delete(:import_file_path),
    metadata_prefix: opts.delete(:metadata_prefix),
    set: opts.delete(:set),
    collection_name: opts.delete(:collection_name)
  }.compact
  params[:importer] = opts.compact
  params[:importer][:user_id] = opts.delete(:user_id)
  params[:importer][:admin_set_id] = opts.delete(:admin_set_id)
  params[:importer][:parser_fields] = parser_fields || {}
  return params.compact
end

def build_url(importer_id, url, port = nil)
  if url.nil?
    protocol = Rails.application.config.force_ssl ? 'https://' : 'http://'
    host = Rails.application.config.action_mailer.default_url_options[:host]
    url = "#{protocol}#{host}"
    url = "#{url}:#{port}" if port
  end
  path = Bulkrax::Engine.routes.url_helpers.polymorphic_path(Bulkrax::Importer)
  url = File.join(url, path)
  url = File.join(url, importer_id) if importer_id
  return url
end

def logger
  Rails.logger
end

def version
  puts "Bulkrax #{Bulkrax::VERSION}"
  puts "Slop #{Slop::VERSION}"
end

# Format the help for the CLI
def help
  puts 'CREATE:'
  puts ' bin/importer --name "My Import" --parser_klass Bulkrax::CsvParser --commit "Create and Import" --import_file_path /data/tmp/import.csv --auth_token 12345'
  puts 'UPDATE:'
  puts ' bin/importer --importer_id 1 --commit "Update and Re-Import (update metadata only)" --import_file_path /data/tmp/import.csv --auth_token 12345'
  puts 'PARAMETERS:'
  Bulkrax.api_definition['bulkrax']['importer'].each_pair do |key, value|
    next if key == 'parser_fields'
    puts " --#{key}"
    value.each_pair do |k, v|
      next if k == 'contained_in'
      puts " #{k}: #{v}"
    end
  end
  puts ' --url'
  puts " Repository URL"
  exit
end

# Setup the options
options = Slop.parse do |o|
  o.on '--version', 'Print the version' do
    version
    exit
  end

  o.on '--help', 'Print help' do
    help
    exit
  end

  Bulkrax.api_definition['bulkrax']['importer'].each_pair do |key, value|
    if value['required'].blank?
      o.string "--#{key}", value['definition'], default: nil
    else
      o.string "--#{key}", value['definition']
    end
  end
  o.string '--url', 'Repository URL'
end

main(options.to_hash)
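As committed, `main` calls `check_required_params` with no arguments, yet that method reads `opts`. A method defined with `def` does not see the caller's local variables, so `opts` there is looked up as a local variable or method and raises `NameError` before any request is sent. A minimal sketch of the same two checks with the options passed in explicitly, assuming a hypothetical `required` list in place of `required_params` and a small `blank_value?` helper in place of ActiveSupport's `blank?`; it returns messages instead of printing and exiting so the logic is easy to exercise:

```ruby
# Stand-in for ActiveSupport's Object#blank? (nil, empty, or whitespace-only).
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Same checks as bin/importer's check_required_params, but opts is a parameter.
# `required` replaces required_params, which the real script reads from
# Bulkrax.api_definition.
def check_required_params(opts, required)
  errors = []
  # Creating a new importer (no --importer_id) needs every required field.
  if blank_value?(opts[:importer_id]) && required.any? { |p| blank_value?(opts[p.to_sym]) }
    errors << 'Missing required parameters'
  end
  # An auth token is needed for both create and update.
  errors << 'Missing Authentication Token --auth_token' if blank_value?(opts[:auth_token])
  errors
end

# Hypothetical required list and option hashes.
required = %w[name parser_klass commit]
p check_required_params({ auth_token: 't' }, required)
p check_required_params({ importer_id: '1', auth_token: 't' }, required)
```

In the script itself, the equivalent change would be giving `check_required_params` an `opts` parameter and calling `check_required_params(opts)` from `main`.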
2 changes: 1 addition & 1 deletion config/environments/test.rb
@@ -41,5 +41,5 @@
# config.action_view.raise_on_missing_translations = true
config.permanent_url_base = "https://scholarspace-etds.library.gwu.edu/"

config.active_job.queue_adapter = :test
config.active_job.queue_adapter = :sidekiq
end
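The test environment now hands Active Job to the Sidekiq adapter instead of the `:test` adapter. The spec-side setting mentioned in the commit message ("Add sidekiq inline testing setting") is not among the files shown here. One common way to get that behavior, offered as an assumption rather than this repository's actual code, is Sidekiq's testing harness in inline mode, which runs each pushed job immediately instead of sending it to Redis:

```ruby
# Hypothetical placement: spec/rails_helper.rb (or a support file it requires).
require 'sidekiq/testing'

# Run jobs synchronously at push time, so importer jobs enqueued through
# Active Job's :sidekiq adapter finish before the example's assertions run.
Sidekiq::Testing.inline!
```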