TL;DR: To get a development environment to run, follow these steps:
- Clone the repo and start the containers:
git clone [email protected]:os2datascanner/os2datascanner.git cd os2datascanner docker-compose up -d
You can now reach the following services on their respective ports:
- Administration module: http://localhost:8020
- Web interface for message queues: http://localhost:8030
- Report module: http://localhost:8040
(see Services for further information)
- Create logins for the django modules
Logins for the django modules (Administration and Report) must be created when the development environment is first started (and any time the data volume has been wiped). Having started the environment as described above, simply run
docker-compose exec admin-application python manage.py createsuperuser
and
docker-compose exec report-application python manage.py createsuperuser
You can pass username and email as arguments to the command by adding
--username <your username>
and/or--email <your email>
at the end of the snippets above, otherwise you will be prompted for them along with a password.Credentials for the message queue web interface can be found in here:
dev-environment/rabbitmq.env
- Start a scan:
Log into the administration module with the newly created superuser at http://localhost:8020
Go to
Administration
and add anOrganization
.Return to the main page, go to
Regler
(Rules) and add one.Go to
Scannerjob
and add a webscan using the organization and rule just created for a website - e.g.https://www.magenta.dk
NB! Please note that OS2datascanner has been built to scan an organization's own data sources, and to do so as efficiently as possible. Thus, OS2datascanner does not check for or adhere to e.g.
robots.txt
files, and may as a consequence overload a system or trigger automated safety measures; always ensure that the site administrator is okay with scanning the site!Start the scan by clicking the play button and confirming your choice.
- Follow the engine activity in RabbitMQ (optional):
- Log into the web interface for RabbitMQ - using the credentials mentioned above - at http://localhost:8030
- Queue activity is available on the
Queues
tab.
- See the results:
- Log into the report module with the newly created superuser at http://localhost:8040
- Go to the django admin site at http://localhost:8040/admin
- Create a new
Remediator
pointing to the superuser just created. - Return to the main page and check the results - refresh page for updates.
The repository contains a Dockerfile
for each of the OS2datascanner
modules:
- Administration:
docker/admin/Dockerfile
- Engine:
docker/engine/Dockerfile
- Report:
docker/report/Dockerfile
Using these is the recommended way to install OS2datascanner as a developer.
To run OS2datascanner in Docker, you need a running Docker daemon. See the official Docker documentation <https://docs.docker.com/install/> for installation instructions.
The containers for the Admin and Report modules require a connection to a
postgres database server. It is configured with the DATABASE_*
settings.
The database server must have a user and a database object. It can be created
with the help of the scripts in the /docker/postgres-initdb.d/
folder:
docker/postgres-initdb.d/20-create-admin-db-and-user.sh
docker/postgres-initdb.d/40-create-report-db-and-user.sh
The folder can easily be mounted into /docker-entrypoint-initdb.d/
in
the official postgres docker image, and
further contains a script to ensure that all relevant environment variables
have been passed to the container:
docker/postgres-initdb.d/10-test-for-valid-env-variables.sh
To run a fully functional OS2datascanner system, you will need to start a number of services. The recommended way to set up an appropriate development environment is to use Docker-compose.
Each Dockerfile
creates a dedicated user, and any services started are run
as the user created by the related Dockerfile
. All files generated by such
a service will be owned by the respective user. For each user, the UID
and
GID
are identical:
- Administration: 73020
- Engine: 73030
- Report: 73040
If you want to use another UID/GID
, you can specify it as the
--user=uid:gid
overwrite flag. for the
docker run
command or
in docker-compose.
If you change the UID/GID
, the /log
and /static
volumes may not
have the right permissions. It is recommended to only use
bind if you overwrite the
user and set the same user as owner of the directory you bind.
If some process inside the container needs to write files to locations other
than /static
or /log
, you need to mount a volume with the right
permissions. An example is ./manage.py makemigrations
trying to write to
code/src/os2datascanner/projects/<module>/<module>app/migrations/
for the
admin
or report
module.
If you bind /code
to your host system, make sure that the user with relevant
UID have write permissions to the /migrations/
folder.
This can be done with chmod o+w migrations
on your host where you grant all
users permission to write.
As a result of the user permissions in place, the user for the Administration
module does not have the privilege to write a .secret
file if one does not
exist. Rather than giving the user elevated permissions in production, one
should generate such a file by running the proper command once as root, and
then change the owner of the generated file to match the user running the
administration module.
In this way, the decryption functionality remains in place, while we still keep the user privileges to a minimum.
You can use docker-compose
to start the OS2datascanner system and its runtime
dependencies (PostgreSQL and RabbitMQ).
A docker-compose.yml
for development is included in the repository. It
specifies the settings to start and connect all required services.
The main services for OS2datascanner are:
admin-frontend
:Only needed in development.
Watches the frontend files and provides support for rebuilding the frontend easily during the development process.
admin-application
:Reachable on: http://localhost:8020
Runs the django application that provides the administration interface for defining and managing organisations, rules, scans etc.
engine_explorer
:Runs the explorer stage of the engine.
engine_processor
:Runs the processor stage of the engine.
engine_matcher
:Runs the matcher stage of the engine.
engine_tagger
:Runs the tagger stage of the engine.
engine_exporter
:Runs the exporter stage of the engine.
report-frontend
:Only needed in development.
Watches the frontend files and provides support for rebuilding the frontend easily during the development process.
report-application
:Reachable on: http://localhost:8040
Runs the django application that provides the interface for accessing and handling reported matches.
report-collector
:Runs the collector service that saves match results to the database of the report module.
These depend on some auxillary services:
db
:Runs a postgres database server based on the official postgres docker image.
queue
:Runs a RabbitMQ message queue server based on the official RabbitMQ docker image, including a plugin providing a web interface for monitoring (and managing) queues and users.
The web interface can be reached on: http://localhost:8030
The postgres database is initialized using the scripts included in
docker/postgres-initdb.d/
folder, which checks that the configuration is
valid, and adds postgres users for the modules that need them.
They do not populate the database with users for the django modules or any
other data.
As mentioned above, the system is not initialised with any default users for the django applications. Instead, these will need to be created by running
docker-compose {exec|run} {admin|report}-application python manage.py createsuperuser [--username <your username>] [--email <your email>]
where exec
is used when the development environment is already running, and
run
when it is not.
If you find yourself having to wipe the database often, you may find it helpful to write a small script to aid with this, e.g.:
# Go to correct directory
cd <path to repository root>
# create admin user:
echo "Creating superuser for admin module..."
docker-compose <command> admin-application python manage.py createsuperuser --username <your username> --email <your email>
# create report user:
echo "Creating superuser for report module..."
docker-compose <command> report-application python manage.py createsuperuser --username <your username> --email <your email>
NB! Make sure your script is not added to the repo: add the file (or a
separate folder it lives in) to the global list for git to ignore (usually
~/.config/git/ignore
, of which you may have to create the git
folder
and the ignore
file yourself).
Each module has its own test-suite. These are run automatically as part of the CI pipeline, which also produces a code coverage report for each test-suite.
During development, the test can be run using the relevant Docker image for each module. As some of the tests are integration tests that require auxiliary services - such as access to a database and/or message queue - we recommend using the development docker-compose set-up to run the tests, as this takes care of the required settings and bindings.
To run the test-suites using docker-compose:
docker-compose run admin-application python -m django test os2datascanner.projects.admin.tests
docker-compose run engine_explorer python -m unittest discover -s /code/src/os2datascanner/engine2/tests
docker-compose run report-application python -m django test os2datascanner.projects.report.tests
Please note that the engine tests can be run using any of the five pipeline services as the basis, but a specific one is provided above for easy reference.
To access a shell on any container based on the OS2datascanner module images, run
docker-compose {exec|run} <container name> bash
The documentation can be found at the OS2datascanner pages on Read the Docs
The coding standards below should be followed by all new and edited code for the project. Linting checks are applied, but currently allowed to fail; introducing a hard requirement would mean having to fill the version control history with commits only related to style, which is considered undesirable.
The OS2datascanner was programmed by Magenta ApS (https://magenta.dk) for OS2 - Offentligt digitaliseringsfællesskab, https://os2.eu.
Copyright (c) 2014-2020, OS2 - Offentligt digitaliseringsfællesskab.
The OS2datascanner is free software; you may use, study, modify and distribute it under the terms of version 2.0 of the Mozilla Public License. See the LICENSE file for details. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
All source code in this and the underlying directories is subject to the terms of the Mozilla Public License, v. 2.0.