Used for blackbox testing and data-ingestion procedures.
Make sure that your email server is NOT running, because some of the endpoints used here send emails to the input email addresses. For example, the endpoint for creating new registration data automatically sends an email, which we do not want because we use this endpoint to import existing data.
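One way to confirm that nothing is accepting mail before you start is a plain TCP probe. The sketch below assumes the mail server, if running, would listen on localhost port 25 (the standard SMTP port); both values are assumptions to adjust for your setup, and the function name is hypothetical rather than part of dspace-python-api.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Assumption: the mail server, if running, listens on localhost:25.
    if is_port_open("localhost", 25):
        print("WARNING: something is listening on port 25; stop the mail server first.")
    else:
        print("Nothing is listening on localhost:25.")
```

A closed port only shows that no local server is listening; if the backend is configured to use a remote mail host, probe that host instead.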
- Install CLARIN-DSpace7.* (postgres, solr, dspace backend).
- Get sources:
2.1. Clone python-api: https://github.com/ufal/dspace-python-api (branch main)
2.2. Clone submodules:
git submodule update --init libs/dspace-rest-python/
- Get the database dump (old CLARIN-DSpace) and unzip it into the input/dump directory in the dspace-python-api project.
- Create CLARIN-DSpace5.* databases (dspace, utilities) from the dump. Run scripts/start.local.dspace.db.bat, or use scripts/init.dspacedb5.sh directly with your database.
- Go to dspace/bin in the dspace7 installation and run the command dspace database migrate force (force because of local types). NOTE: dspace database migrate force creates default database data that may not be in the database dump, so after migration some tables may contain more data than the dump. Data from the dump that already exists in the database is not migrated.
- Create an admin by running the command dspace create-administrator in dspace/bin.
- Create JSON files from the database tables. NOTE: You must do it for both databases, clarin-dspace and clarin-utilities (the JSON files are stored in the data folder).
- Go to dspace-python-api and run
pip install -r requirements.txt
(optional on ubuntu like systems) apt install libpq-dev
python tools/db_to_json.py --database=clarin-dspace [--port] [--host] [--user] [--password]
python tools/db_to_json.py --database=clarin-utilities [--port] [--host] [--user] [--password]
If you omit --user, the value from project_settings.py will be used.
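Because the export has to be run once per database, it can help to generate both invocations from one place. The following is a minimal sketch, not part of the project: the helper name is hypothetical, and it assumes the optional flags accept the same --flag=value form that --database uses in the commands above.

```python
import sys
from typing import List, Optional

# The two databases named in the step above.
DATABASES = ["clarin-dspace", "clarin-utilities"]


def build_export_commands(host: Optional[str] = None,
                          port: Optional[int] = None,
                          user: Optional[str] = None,
                          password: Optional[str] = None) -> List[List[str]]:
    """Build one tools/db_to_json.py invocation per database."""
    # Assumption: optional flags take the same --flag=value form as --database.
    optional = {"--host": host, "--port": port, "--user": user, "--password": password}
    extra = [f"{flag}={value}" for flag, value in optional.items() if value is not None]
    return [
        [sys.executable, "tools/db_to_json.py", f"--database={db}", *extra]
        for db in DATABASES
    ]


if __name__ == "__main__":
    for cmd in build_export_commands(host="localhost", port=5432):
        print(" ".join(cmd))
        # To execute instead of print, run from the dspace-python-api directory:
        #   import subprocess; subprocess.run(cmd, check=True)
```

Flags left as None are omitted, so the tool falls back to its own defaults, such as the user from project_settings.py.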
- Prepare the dspace-python-api project for migration:
- copy the files used during migration into the input/ directory:
> ls -R ./input
input:
data icon
input/data:
bitstream.json fileextension.json piwik_report.json
bitstreamformatregistry.json ...
input/dump:
clarin-dspace.sql clarin-utilities.sql
input/icon:
aca.png by.png gplv2.png mit.png ...
- update project_settings.py
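Before starting the import, the input/ layout shown in the listing above can be checked with a few lines of Python. This is a sketch under assumptions: the directory and dump file names are taken from that listing, the helper name is hypothetical, and since the list of JSON files is elided in the listing, the check only verifies that input/data holds at least one .json file.

```python
from pathlib import Path
from typing import List

# Sub-directories and dump files taken from the listing above.
EXPECTED_DIRS = ["data", "dump", "icon"]
EXPECTED_DUMPS = ["clarin-dspace.sql", "clarin-utilities.sql"]


def missing_inputs(input_dir: str) -> List[str]:
    """Return the expected paths that are absent under input_dir."""
    root = Path(input_dir)
    missing = [str(root / d) for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [
        str(root / "dump" / name) for name in EXPECTED_DUMPS
        if not (root / "dump" / name).is_file()
    ]
    # The full set of required JSON files is not known here,
    # so only check that at least one exists.
    data_dir = root / "data"
    if data_dir.is_dir() and not any(data_dir.glob("*.json")):
        missing.append(str(data_dir / "*.json"))
    return missing


if __name__ == "__main__":
    for path in missing_inputs("input"):
        print(f"missing: {path}")
```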
- Make sure your backend configuration (dspace.cfg) includes all handle prefixes from the generated handle JSON in the property handle.additional.prefixes, e.g.,
handle.additional.prefixes = 11858, 11234, 11372, 11346, 20.500.12801, 20.500.12800
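The prefix comparison can be automated once you have the handle strings in hand. The sketch below assumes handles have the usual prefix/suffix form and leaves it to you to extract them from the generated handle JSON, whose exact structure is not described here; both function names are hypothetical.

```python
from typing import Iterable, List, Set


def parse_configured_prefixes(cfg_line: str) -> Set[str]:
    """Parse a 'handle.additional.prefixes = a, b, c' line from dspace.cfg."""
    _, _, value = cfg_line.partition("=")
    return {p.strip() for p in value.split(",") if p.strip()}


def missing_prefixes(handles: Iterable[str], configured: Set[str]) -> List[str]:
    """Return prefixes used by handles ('prefix/suffix') but not configured."""
    used = {h.split("/", 1)[0] for h in handles if "/" in h}
    return sorted(used - configured)


if __name__ == "__main__":
    cfg = "handle.additional.prefixes = 11858, 11234, 11372, 11346, 20.500.12801, 20.500.12800"
    # Made-up handles for illustration only.
    handles = ["11858/00-EXAMPLE-1", "11234/1-EXAMPLE", "99999/1-EXAMPLE"]
    print(missing_prefixes(handles, parse_configured_prefixes(cfg)))  # prints ['99999']
```

Any prefix the function reports should be added to handle.additional.prefixes before running the import.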
- Copy assetstore from dspace5 to dspace7 (for bitstream import). The assetstore is in the folder where you have installed DSpace: dspace/assetstore.
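If you prefer to script the copy, the standard library is enough. This is a minimal sketch assuming both installations keep the assetstore directly under their installation folder, as described above; the function name and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def copy_assetstore(old_install: str, new_install: str) -> Path:
    """Copy <old_install>/assetstore into <new_install>/assetstore."""
    src = Path(old_install) / "assetstore"
    dst = Path(new_install) / "assetstore"
    if not src.is_dir():
        raise FileNotFoundError(f"assetstore not found: {src}")
    # dirs_exist_ok (Python 3.8+) merges into an existing target directory;
    # files with the same relative path are overwritten.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


if __name__ == "__main__":
    # Hypothetical installation paths; replace with your own.
    print(copy_assetstore("/opt/dspace5", "/opt/dspace7"))
```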
- Import data from the JSON files (python-api/input/) into the dspace database (CLARIN-DSpace7.*)
- NOTE: the database must be up to date (dspace database migrate force must be called in dspace/bin)
- NOTE: the dspace server must be running
- run the command
cd ./src && python repo_import.py
- The table attributes that record the last modification time of a dspace object (for example, the attribute last_modified in the table Item) hold the time when the object was migrated, not the value from the migrated database dump.
- If you don't have valid and complete data, not all data will be imported.
- Check whether the license link contains XXX. This is of course unsuitable for a production run!
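The XXX check in the last note can be done on the exported JSON before importing. The sketch below assumes each exported file holds a list of row objects; because the file and field that carry the license link are not named here, it scans every string value in a row. The function names and the example file path are hypothetical.

```python
import json
from typing import Any, Dict, List


def rows_with_placeholder(rows: List[Dict[str, Any]], marker: str = "XXX") -> List[Dict[str, Any]]:
    """Return the rows in which any string value contains marker."""
    return [
        row for row in rows
        if any(isinstance(value, str) and marker in value for value in row.values())
    ]


def check_file(path: str, marker: str = "XXX") -> List[Dict[str, Any]]:
    """Load a JSON file holding a list of row objects and scan it."""
    with open(path, encoding="utf-8") as fh:
        return rows_with_placeholder(json.load(fh), marker)


if __name__ == "__main__":
    # Hypothetical file name; point this at the JSON file with license data.
    for row in check_file("input/data/license_definition.json"):
        print("placeholder found:", row)
```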
Use the tools/repo_diff utility, see README.