Skip to content

Latest commit

 

History

History
113 lines (75 loc) · 3.94 KB

disaster-recovery.rst

File metadata and controls

113 lines (75 loc) · 3.94 KB

Disaster Recovery

Examples

The following examples are taken from Ian Ward's talk at the CKANConf Ottawa 2015 as published on his blog. They serve illustrative purpose; real-life DR and migration is documented in the next section. Exporting data (backup), restoring to same (disaster recovery) or remote (migration) instance:

ckanapi dump groups | ssh otherbox ckanapi load groups -p 3
ckanapi dump organizations | ssh otherbox ckanapi load organizations -p 3
ckanapi dump datasets | ssh otherbox ckanapi load datasets -p 3

Or pulling data from remote into local CKAN:

ckanapi dump datasets -r http://sourceckan | ckanapi load datasets -p 3

Track metadata using git:

ckanapi dump datasets > datasets.jsonl
git diff datasets.jsonl --stat
datasets.jsonl | 52 ++++++++++++++++++++++++++++++++++++----------------
1 file changed, 36 insertions(+), 16 deletions(-)

Summaries:

head -5 datasets.jsonl | jq .title
jq 'select(.organization.name!="nrcan-rncan")' -c datasets.jsonl | wc -l

Distributed loading:

split -n l/3 datasets.jsonl part
ckanapi load datasets -r http://web1 -a ... < partaa &
ckanapi load datasets -r http://web2 -a ... < partab &
ckanapi load datasets -r http://web3 -a ... < partac &

Create a database snapshot from inside a datacats environment:

(ckan)ubuntu@ip:/var/projects/ckan/SOURCE$ datacats paster db dump ../snapshot-DATE.sql

Notes: * SOURCE is the datacats CKAN environment to be backed up * DATE is the current datetime * datacats paster db will run in the /ckan source directory, so the prefix ../ will create the file in the current directory

Backup & Restore

This section describes the Disaster Recovery procedure assuming that a simple db dump / clean / load doesn't work.

Users

Ckanapi does not export users with password hashes. We'll have to migrate existin users manually.:

# exporting users, data only (-a), no owner (-O) from a non-datacats CKAN into file (-f) users.sql
sudo su
sudo -u postgres pg_dump -h localhost -p 5432 -a -O -t user -f users.sql CKAN_DBNAME

# chown to datacats user and appropriate group
sudo chown ubuntu:www-data users.sql

# delete lines "default", "logged_in", "visitor" from users.sql

# print passwords in datacats env:
cat ~/.datacats/ENV_NAME/sites/primary/passwords.ini

(datacats venv)me@machine:/path/to/ENV_NAME⟫ datacats shell

#datacats shell
(ckan)shell@ENV_NAME:~$ psql -h db -d ckan -U postgres -f users.sql
# paste password for user "postgres"

(ckan)shell@ENV_NAME:~$ exit

Groups, orgs, datasets

Run after migrating users:

ckanapi dump groups --all -p 3 -r http://source-ckan.org > groups.jsonl
ckanapi dump organizations --all -p 3 -r http://source-ckan.org > organizations.jsonl
ckanapi dump datasets --all -p 6 -r http://source-ckan.org > datasets.jsonl

ckanapi load groups -p 3 -I groups.jsonl -r http://target-ckan.org -a API_KEY
ckanapi load organizations -p 3 -I organizations.jsonl -r http://target-ckan.org -a API_KEY
ckanapi load datasets -p 3 -I datasets.jsonl -r http://target-ckan.org -a API_KEY

Files

Run any time:

sudo rsync -Pavvr DATACATS_DATADIR/SOURCE/sites/primary/files/ DATACATS_DATADIR/TARGET/sites/primary/files/
sudo chown -R ubuntu:www-data DATACATS_DATADIR/TARGET/sites/primary/files/
  • The DATACATS_DATADIR/ defaults to ~/.datacats, but in our installation lives at /var/projects/datacats_data/
  • SOURCE and TARGET are two datacats CKAN environments
  • datacats paster runs inside the datacats environment with prefix /project/ for the current work dir
  • http://target-ckan.org has an admin API_KEY

Archive

To archive all data, create a scheduled job to dump the db as well as the users, groups, orgs, datasets and rsync the dumps as well as the files (resource storage) to a secure place.