felixfaassen/os2datascanner

 
 

Installation

TL;DR: To get a development environment to run, follow these steps:

  1. Clone the repo and start the containers:
    git clone [email protected]:os2datascanner/os2datascanner.git
    cd os2datascanner
    docker-compose up -d

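    You can now reach the following services on their respective ports:

    • Administration module: http://localhost:8020
    • RabbitMQ web interface: http://localhost:8030
    • Report module: http://localhost:8040

    (see Services for further information)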

  2. Create logins for the django modules

    Logins for the django modules (Administration and Report) must be created when the development environment is first started (and any time the data volume has been wiped). Having started the environment as described above, simply run

    docker-compose exec admin-application python manage.py createsuperuser

    and

    docker-compose exec report-application python manage.py createsuperuser

    You can pass the username and email as arguments by adding --username <your username> and/or --email <your email> at the end of the commands above; otherwise you will be prompted for them along with a password.

    Credentials for the message queue web interface can be found here:

    • dev-environment/rabbitmq.env
  3. Start a scan:
    1. Log into the administration module with the newly created superuser at http://localhost:8020

    2. Go to Administration and add an Organization.

    3. Return to the main page, go to Regler (Rules) and add one.

    4. Go to Scannerjob and add a webscan using the organization and rule just created for a website - e.g. https://www.magenta.dk

      NB! Please note that OS2datascanner has been built to scan an organization's own data sources, and to do so as efficiently as possible. Thus, OS2datascanner does not check for or adhere to e.g. robots.txt files, and may as a consequence overload a system or trigger automated safety measures; always ensure that the site administrator is okay with scanning the site!

    5. Start the scan by clicking the play button and confirming your choice.

  4. Follow the engine activity in RabbitMQ (optional):
    1. Log into the web interface for RabbitMQ - using the credentials mentioned above - at http://localhost:8030
    2. Queue activity is available on the Queues tab.
  5. See the results:
    1. Log into the report module with the newly created superuser at http://localhost:8040
    2. Go to the django admin site at http://localhost:8040/admin
    3. Create a new Remediator pointing to the superuser just created.
    4. Return to the main page and check the results - refresh page for updates.
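
Before logging in, it can help to confirm that the web interfaces used in the steps above are answering. The following is a minimal sketch, assuming `curl` is installed on the host. The function name `wait_for_url`, the retry count and the delay are illustrative choices, not part of OS2datascanner.

```shell
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper; the body runs in a subshell so no variables leak)
wait_for_url() (
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: silent, -o /dev/null: discard the body.
        # Any HTTP response, even an error page, counts as "up".
        if curl -s -o /dev/null "$url"; then
            echo "up: $url"
            exit 0
        fi
        i=$((i + 1))
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    echo "not reachable: $url" >&2
    exit 1
)

# Example, using the ports mentioned in the steps above:
# for url in http://localhost:8020 http://localhost:8030 http://localhost:8040; do
#     wait_for_url "$url" || break
# done
```

The check only shows that something is listening on the port; it does not verify that migrations have run or that a superuser exists.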

Docker

The repository contains a Dockerfile for each of the OS2datascanner modules:

  • Administration: docker/admin/Dockerfile
  • Engine: docker/engine/Dockerfile
  • Report: docker/report/Dockerfile

Using these is the recommended way to install OS2datascanner as a developer.

To run OS2datascanner in Docker, you need a running Docker daemon. See the official Docker documentation <https://docs.docker.com/install/> for installation instructions.

The containers for the Admin and Report modules require a connection to a postgres database server, which is configured with the DATABASE_* settings. The database server must have a user and a database for each of these modules. These can be created with the help of the scripts in the docker/postgres-initdb.d/ folder:

  • docker/postgres-initdb.d/20-create-admin-db-and-user.sh
  • docker/postgres-initdb.d/40-create-report-db-and-user.sh

The folder can easily be mounted into /docker-entrypoint-initdb.d/ in the official postgres docker image, and further contains a script to ensure that all relevant environment variables have been passed to the container:

  • docker/postgres-initdb.d/10-test-for-valid-env-variables.sh
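
As a rough illustration of the mount described above, the sketch below first checks that the three scripts are present and then shows, as a comment, how the folder might be mounted into the official image. The helper name `check_initdb_dir` and the file `postgres.env` are hypothetical; this section does not list the environment variables the scripts expect, so they are not spelled out here.

```shell
# Check that the three init scripts named above are present in DIR.
# Usage: check_initdb_dir [DIR]   (hypothetical helper)
check_initdb_dir() (
    dir=${1:-docker/postgres-initdb.d}
    missing=0
    for script in \
        10-test-for-valid-env-variables.sh \
        20-create-admin-db-and-user.sh \
        40-create-report-db-and-user.sh
    do
        if [ ! -f "$dir/$script" ]; then
            echo "missing: $dir/$script" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# Example mount, run from the repository root. `postgres.env` is an
# assumed file holding the variables the init scripts require.
# check_initdb_dir && docker run --rm \
#     --env-file postgres.env \
#     -v "$PWD/docker/postgres-initdb.d:/docker-entrypoint-initdb.d:ro" \
#     postgres
```

Note that the official image only runs the scripts in /docker-entrypoint-initdb.d/ when its data directory is empty, so they will not run again against an existing volume.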

To run a fully functional OS2datascanner system, you will need to start a number of services. The recommended way to set up an appropriate development environment is to use Docker-compose.

User permissions

Each Dockerfile creates a dedicated user, and any services started are run as the user created by the related Dockerfile. All files generated by such a service will be owned by the respective user. For each user, the UID and GID are identical:

  • Administration: 73020
  • Engine: 73030
  • Report: 73040

If you want to use another UID/GID, you can specify it with the --user=uid:gid override flag for the docker run command, or in docker-compose. If you change the UID/GID, the /log and /static volumes may not have the right permissions. If you override the user, it is recommended to use only bind mounts and to make that same user the owner of the directory you bind.

If some process inside the container needs to write files to locations other than /static or /log, you need to mount a volume with the right permissions. An example is ./manage.py makemigrations trying to write to code/src/os2datascanner/projects/<module>/<module>app/migrations/ for the admin or report module. If you bind /code to your host system, make sure that the user with the relevant UID has write permissions to the /migrations/ folder. This can be done by running chmod o+w migrations on your host, which grants all users write permission.
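
A narrower alternative to granting write access to all users is to hand the bound directory to the module's own UID. The sketch below maps module names to the UIDs listed above and prints the matching chown command. The short keys `admin`, `engine` and `report` and both function names are assumptions made for this example.

```shell
# Map a module name to the UID/GID its Dockerfile creates
# (values taken from the list above; the short keys are illustrative).
module_uid() {
    case "$1" in
        admin)  echo 73020 ;;
        engine) echo 73030 ;;
        report) echo 73040 ;;
        *) echo "unknown module: $1" >&2; return 1 ;;
    esac
}

# Print the chown command that hands a host directory to a module's user.
# Printing instead of running keeps the sketch free of side effects;
# run the printed command with root privileges once it looks right.
chown_command() (
    uid=$(module_uid "$1") || exit 1
    printf 'chown -R %s:%s %s\n' "$uid" "$uid" "$2"
)

# Example:
# chown_command admin ./log
```

The printed command does not quote the path, so paths containing spaces need quoting by hand before running it.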

Administration module: .secret file

As a result of the user permissions in place, the user for the Administration module does not have the privilege to write a .secret file if one does not exist. Rather than giving the user elevated permissions in production, one should generate such a file by running the proper command once as root, and then change the owner of the generated file to match the user running the administration module.

In this way, the decryption functionality remains in place, while we still keep the user privileges to a minimum.
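
Once the file has been generated and its owner changed, the result can be verified with a small check such as the one below. This is a sketch only: the helper name `owned_by` is hypothetical, the location of the .secret file depends on the deployment and is not given in this section, and `stat -c` assumes GNU coreutils.

```shell
# Succeed if FILE exists and is owned by the given numeric UID.
# Usage: owned_by FILE UID   (hypothetical helper)
owned_by() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    # `stat -c %u` is the GNU coreutils form; BSD/macOS stat uses `-f %u`.
    [ "$(stat -c %u "$1")" = "$2" ]
}

# Example, run on the host after generating the file as root
# (replace the placeholder path with the real location of .secret):
# chown 73020:73020 /path/to/.secret
# owned_by /path/to/.secret 73020 && echo "ownership OK"
```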

Docker-compose

You can use docker-compose to start the OS2datascanner system and its runtime dependencies (PostgreSQL and RabbitMQ).

A docker-compose.yml for development is included in the repository. It specifies the settings to start and connect all required services.

Services

The main services for OS2datascanner are:

  • admin-frontend:

    Only needed in development.

    Watches the frontend files and provides support for rebuilding the frontend easily during the development process.

  • admin-application:

    Reachable on: http://localhost:8020

    Runs the django application that provides the administration interface for defining and managing organisations, rules, scans etc.

  • engine_explorer:

    Runs the explorer stage of the engine.

  • engine_processor:

    Runs the processor stage of the engine.

  • engine_matcher:

    Runs the matcher stage of the engine.

  • engine_tagger:

    Runs the tagger stage of the engine.

  • engine_exporter:

    Runs the exporter stage of the engine.

  • report-frontend:

    Only needed in development.

    Watches the frontend files and provides support for rebuilding the frontend easily during the development process.

  • report-application:

    Reachable on: http://localhost:8040

    Runs the django application that provides the interface for accessing and handling reported matches.

  • report-collector:

    Runs the collector service that saves match results to the database of the report module.

These depend on some auxiliary services: the PostgreSQL database and the RabbitMQ message queue.

Postgres initialisation

The postgres database is initialised using the scripts included in the docker/postgres-initdb.d/ folder, which check that the configuration is valid and add postgres users for the modules that need them. They do not populate the database with users for the django modules or with any other data.

Django application users

As mentioned above, the system is not initialised with any default users for the django applications. Instead, these will need to be created by running

docker-compose {exec|run} {admin|report}-application python manage.py createsuperuser [--username <your username>] [--email <your email>]

where exec is used when the development environment is already running, and run when it is not.
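
The choice between exec and run can be automated. The sketch below keeps the decision in a small function that reads the names of running services from standard input, so it can be tried without Docker. The function name `compose_verb` is hypothetical, and the commented example assumes your docker-compose version supports `ps --services --filter status=running`.

```shell
# Decide between `exec` and `run` for a service.
# Reads the names of running services on stdin, one per line.
# Usage: <list of running services> | compose_verb SERVICE
compose_verb() {
    # -x: match the whole line, -F: fixed string, -q: no output.
    if grep -qxF -- "$1"; then
        echo exec
    else
        echo run
    fi
}

# Example (assumes `ps --services --filter` is available):
# verb=$(docker-compose ps --services --filter status=running \
#     | compose_verb admin-application)
# docker-compose "$verb" admin-application python manage.py createsuperuser
```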

If you find yourself having to wipe the database often, you may find it helpful to write a small script to aid with this, e.g.:

# Go to correct directory
cd <path to repository root>
# create admin user:
echo "Creating superuser for admin module..."
docker-compose <command> admin-application python manage.py createsuperuser --username <your username> --email <your email>
# create report user:
echo "Creating superuser for report module..."
docker-compose <command> report-application python manage.py createsuperuser --username <your username> --email <your email>

NB! Make sure your script is not added to the repo: add the file (or a separate folder it lives in) to git's global ignore list (usually ~/.config/git/ignore; you may have to create the git folder and the ignore file yourself).
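
Adding an entry to the global ignore list can be sketched as follows. The function name `add_global_ignore` and the script name in the example are hypothetical. The path follows git's default lookup, which uses $XDG_CONFIG_HOME/git/ignore and falls back to ~/.config/git/ignore; if core.excludesFile is set in your git configuration, git reads that file instead.

```shell
# Append a pattern to git's default global ignore file, creating the
# folder and the file if they do not exist yet.
# Usage: add_global_ignore PATTERN   (hypothetical helper)
add_global_ignore() (
    ignore_file="${XDG_CONFIG_HOME:-$HOME/.config}/git/ignore"
    mkdir -p "$(dirname "$ignore_file")" || exit 1
    touch "$ignore_file" || exit 1
    # Only append if the exact line is not already present.
    grep -qxF -- "$1" "$ignore_file" || printf '%s\n' "$1" >> "$ignore_file"
)

# Example (the script name is made up for illustration):
# add_global_ignore create-superusers.sh
```

Calling the function twice with the same pattern leaves a single entry in the file.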

Tests

Each module has its own test-suite. These are run automatically as part of the CI pipeline, which also produces a code coverage report for each test-suite.

During development, the tests can be run using the relevant Docker image for each module. As some of the tests are integration tests that require auxiliary services - such as access to a database and/or message queue - we recommend using the development docker-compose set-up to run the tests, as this takes care of the required settings and bindings.

To run the test-suites using docker-compose:

docker-compose run admin-application python -m django test os2datascanner.projects.admin.tests
docker-compose run engine_explorer python -m unittest discover -s /code/src/os2datascanner/engine2/tests
docker-compose run report-application python -m django test os2datascanner.projects.report.tests

Please note that the engine tests can be run using any of the five pipeline services as the basis, but a specific one is provided above for easy reference.
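
To run all three suites in one go and get a single exit status, the commands above can be wrapped as in the sketch below. The `COMPOSE` variable and the function name `run_all_suites` are assumptions added for this example; the variable lets you substitute another command to try the wrapper without Docker.

```shell
# Command used to talk to Compose; override it to try the wrapper
# without Docker (for example COMPOSE=true).
COMPOSE=${COMPOSE:-docker-compose}

# Run the three test suites listed above and report how many failed.
# Every suite is run even if an earlier one fails.
run_all_suites() (
    failed=0
    $COMPOSE run admin-application python -m django test os2datascanner.projects.admin.tests \
        || failed=$((failed + 1))
    $COMPOSE run engine_explorer python -m unittest discover -s /code/src/os2datascanner/engine2/tests \
        || failed=$((failed + 1))
    $COMPOSE run report-application python -m django test os2datascanner.projects.report.tests \
        || failed=$((failed + 1))
    echo "$failed suite(s) failed"
    [ "$failed" -eq 0 ]
)

# Example:
# run_all_suites || echo "at least one test-suite failed"
```

Continuing after a failure is a deliberate choice here, so that one run shows the state of every module.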

Shell access

To access a shell on any container based on the OS2datascanner module images, run

docker-compose {exec|run} <container name> bash

Documentation

The documentation can be found on the OS2datascanner pages on Read the Docs.

Code standards

The coding standards below should be followed by all new and edited code for the project. Linting checks are applied, but are currently allowed to fail; introducing a hard requirement would mean filling the version control history with commits related only to style, which is considered undesirable.

Licensing

The OS2datascanner was programmed by Magenta ApS (https://magenta.dk) for OS2 - Offentligt digitaliseringsfællesskab, https://os2.eu.

Copyright (c) 2014-2020, OS2 - Offentligt digitaliseringsfællesskab.

The OS2datascanner is free software; you may use, study, modify and distribute it under the terms of version 2.0 of the Mozilla Public License. See the LICENSE file for details. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

All source code in this and the underlying directories is subject to the terms of the Mozilla Public License, v. 2.0.
