
System tuning for Docker-based setup #2698

Closed
kocio-pl opened this issue Jul 23, 2017 · 11 comments · Fixed by #2721

Comments

@kocio-pl
Collaborator

We should allow easy system tuning for the Docker-based setup, since a big database extract can make a tester's life harder. When I disabled PostgreSQL tuning on my regular system, importing low-zoom Europe data took 125 min instead of 57 min, which is a substantial difference.

On https://switch2osm.org/loading-osm-data/ there are two sections to consider:

  • Getting ready to load
  • Tuning

The first one is about overcommit settings (I have them enabled and haven't tested how disabling them affects performance) and the second is about PostgreSQL memory settings. According to that page, setting maintenance_work_mem and work_mem is "probably enough on a development or testing machine".
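For the overcommit part, the host kernel setting is adjusted with sysctl; a sketch (the value shown is an assumption taken from typical osm2pgsql import guides, so check the switch2osm page before applying it):

```shell
# Host-side, not inside a container: relax the kernel overcommit policy
# for the import. The value 1 is assumed from common osm2pgsql guidance.
sudo sysctl -w vm.overcommit_memory=1
```

Note this only lasts until reboot; a persistent setting would go in /etc/sysctl.conf.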

The docker-compose.yml file could have a block like this (using conservative settings for 2 GB of RAM as the default):

  db:
    image: mdillon/postgis:9.6
    command: postgres -c work_mem=$PG_WORK_MEM -c maintenance_work_mem=$PG_MAINTENANCE_WORK_MEM
    environment:
      - PG_WORK_MEM=16MB
      - PG_MAINTENANCE_WORK_MEM=128MB

but it just doesn't work:

$ docker-compose up db
WARNING: The PG_WORK_MEM variable is not set. Defaulting to a blank string.
WARNING: The PG_MAINTENANCE_WORK_MEM variable is not set. Defaulting to a blank string.
Starting openstreetmapcarto_db_1
Attaching to openstreetmapcarto_db_1
db_1       | FATAL:  invalid value for parameter "work_mem": ""
openstreetmapcarto_db_1 exited with code 1

Even if it worked, we would have another place to change settings besides the .env file, where we store the osm2pgsql settings - and docker-compose.yml is tracked by git. Moreover, we create .env when starting the import container, which runs after the db container is already up, so PostgreSQL would still fail.
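For what it's worth, newer Compose file formats support shell-style defaults in variable substitution, which would at least avoid the blank-string failure when .env is absent; a sketch assuming the 2.1 file format (the original block above uses the v1 layout, so this is a restructuring):

```yaml
version: '2.1'
services:
  db:
    image: mdillon/postgis:9.6
    # ${VAR:-default} falls back to the default when the variable is unset:
    command: postgres -c work_mem=${PG_WORK_MEM:-16MB} -c maintenance_work_mem=${PG_MAINTENANCE_WORK_MEM:-128MB}
```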

What could we do about it?

@kocio-pl
Collaborator Author

kocio-pl commented Jul 28, 2017

Default settings:

$ docker exec -it openstreetmapcarto_db_1 psql -U postgres
psql (9.6.3)
Type "help" for help.

postgres=# SHOW work_mem;            
 work_mem 
----------
 4MB
(1 row)

postgres=# SHOW maintenance_work_mem;
 maintenance_work_mem 
----------------------
 64MB
(1 row)

@littlebtc
Contributor

littlebtc commented Jul 31, 2017

I am thinking about a good workaround and will make a pull request.

Any workaround will be a little bit dirty due to docker/compose#1377.

@kocio-pl
Collaborator Author

Maybe we should make a simple postgis-based container which ensures that the .env file is created and safely passes the values to Postgres?

@littlebtc
Contributor

I can make work_mem work with the following docker-compose.yml configuration, adding the variables to .env - it works like a charm:

  db:
    image: mdillon/postgis:9.6
    command:
      - --work_mem=$PG_WORK_MEM
      - --maintenance_work_mem=$PG_MAINTENANCE_WORK_MEM

But it requires the variables to be present in the .env file, i.e. it cannot be done with the environment config in docker-compose.yml.

IMHO, instead of creating .env automatically, we can provide a .env.sample and suggest that users copy and edit it into a real .env to bypass docker/compose#1377. Another advantage is that users can adjust the settings to their machine before running any scripts.
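A minimal sketch of what that sample could contain (variable names taken from the compose block earlier in this thread, values being its conservative 2 GB defaults):

```shell
# Hypothetical .env.sample - copy to .env and tune for your machine:
PG_WORK_MEM=16MB
PG_MAINTENANCE_WORK_MEM=128MB
```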

@kocio-pl
Collaborator Author

It's not too stable - removing the .env file would probably cause the container to fail. I'd also like the startup to be as quick and automated as possible. Editing variables should be possible, but not required.

@littlebtc
Contributor

Then making a postgis-based container seems to be the best way to go.

@littlebtc
Contributor

littlebtc commented Aug 1, 2017

I think the best way is to make a custom Dockerfile that runs ALTER SYSTEM in /docker-entrypoint-initdb.d. The docs say such changes only take effect after a PostgreSQL restart, but the entrypoint script restarts the server for us once during initialization.

And we can prompt the user to change the values either with psql and ALTER SYSTEM commands or by editing the Dockerfile.

A working Dockerfile looks like this:

FROM mdillon/postgis:9.6

RUN echo 'ALTER SYSTEM SET work_mem="16MB";' > /docker-entrypoint-initdb.d/carto-perf.sql
RUN echo 'ALTER SYSTEM SET maintenance_work_mem="128MB";' >> /docker-entrypoint-initdb.d/carto-perf.sql
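For what it's worth, ALTER SYSTEM writes its settings to postgresql.auto.conf in the data directory, so one way to confirm the container picked them up (the container name and the image's default data path are assumptions here):

```shell
docker exec openstreetmapcarto_db_1 \
  cat /var/lib/postgresql/data/postgresql.auto.conf
```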

@kocio-pl
Collaborator Author

kocio-pl commented Aug 1, 2017

It should test for the .env file and use those values instead if available. That probably means docker-startup.sh should be used here, modified so that the Postgres and osm2pgsql default variables are created.

@littlebtc
Contributor

littlebtc commented Aug 1, 2017

How about making another docker-pg-startup.sh?

FROM mdillon/postgis:9.6

ADD scripts/docker-pg-startup.sh /docker-entrypoint-initdb.d/

Then scripts/docker-pg-startup.sh would do something like this, honoring the env settings when running ALTER SYSTEM.
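A minimal sketch of what that script could look like, assuming it runs in the entrypoint's initdb phase where a local server is already up (variable names mirror the compose block earlier in this thread):

```shell
#!/bin/sh
# Hypothetical scripts/docker-pg-startup.sh: honor values from .env, with
# the conservative 2 GB defaults from this thread as fallbacks.
PG_WORK_MEM="${PG_WORK_MEM:-16MB}"
PG_MAINTENANCE_WORK_MEM="${PG_MAINTENANCE_WORK_MEM:-128MB}"

echo "work_mem=${PG_WORK_MEM}"
echo "maintenance_work_mem=${PG_MAINTENANCE_WORK_MEM}"

# During initdb scripts psql can reach the local server; guard it so the
# sketch can also be dry-run outside the container.
if command -v psql >/dev/null 2>&1; then
  psql -U postgres -c "ALTER SYSTEM SET work_mem = '${PG_WORK_MEM}';"
  psql -U postgres -c "ALTER SYSTEM SET maintenance_work_mem = '${PG_MAINTENANCE_WORK_MEM}';"
fi
```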

The drawback is that the environment variables will only be applied when the PostgreSQL database is initialized after being destroyed.

@littlebtc
Contributor

littlebtc commented Aug 1, 2017 via email

@kocio-pl
Collaborator Author

kocio-pl commented Aug 1, 2017

The Docker environment should be user-friendly: there should be no need to install or alter anything other than a simple configuration file - and only if you need decent performance. Otherwise it should work out of the box.
