feat(harvester): Configure automatic CRON job for CKAN Harvester #25
- Implement CRON job setup for automatic execution of the CKAN Harvester.

This update enables the CKAN Harvester to run automatically at scheduled intervals, ensuring continuous and efficient harvesting of FAIR datapoints without manual input. The documentation in gdi-userportal-docs has been enhanced to include instructions and findings related to the CRON Harvester setup.

Closes #25
Hans-Chrstian committed Feb 13, 2024
1 parent f19b435 commit 41fc91e
Showing 4 changed files with 49 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -39,6 +39,7 @@ CKAN and all the components are configured using environment variables that you
* Ensure you have enough computer resources, if you are using `colima`: `colima start --arch aarch64 --vm-type=vz --mount-type=virtiofs --vz-rosetta --cpu 4 --memory 10`
* Add `127.0.0.1 keycloak` to `/etc/hosts`.
* Copy `.env.example` to `.env`.
* After cloning CKAN-DOCKER, ensure the git submodules are pulled: `git submodule update --init`

## 3. Useful commands

9 changes: 9 additions & 0 deletions ckan/Dockerfile
@@ -1,5 +1,6 @@
# SPDX-FileCopyrightText: 2006-2023 Open Knowledge Foundation and contributors
# SPDX-FileContributor: PNED G.I.E.
# SPDX-FileContributor: Stichting Health-RI
#
# SPDX-License-Identifier: AGPL-3.0-only

@@ -25,9 +26,17 @@ RUN pip3 install -e git+https://github.com/DataShades/ckanext-oidc-pkce.git@v0.
RUN pip3 install -e git+https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint.git@user-portal#egg=ckanext-fairdatapoint && \
pip3 install -r ${APP_DIR}/src/ckanext-fairdatapoint/requirements.txt

# Install Supervisor and cron
RUN apk update \
&& apk add --no-cache supervisor cronie \
&& rm -rf /var/cache/apk/*

# Create log directories for CKAN harvester
RUN mkdir -p /var/log/ckan/std && chown -R ckan:ckan /var/log/ckan

# Copy custom initialization scripts
COPY docker-entrypoint.d/* /docker-entrypoint.d/
COPY config/ckan_harvesting.conf /etc/supervisord.d/ckan_harvesting.conf

# Apply any patches needed to CKAN core or any of the built extensions (not the
# runtime mounted ones)
11 changes: 9 additions & 2 deletions ckan/Dockerfile.dev
@@ -19,11 +19,18 @@ RUN pip3 install -e git+https://github.com/DataShades/ckanext-oidc-pkce.git@v0.
RUN pip3 install -e git+https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint.git@user-portal#egg=ckanext-fairdatapoint && \
pip3 install -r ${APP_DIR}/src/ckanext-fairdatapoint/requirements.txt

# Clone the extension(s) you are writing for your own project in the `src` folder
# to get them mounted in this image at runtime

# Install Supervisor and cron
RUN apk update \
&& apk add --no-cache supervisor cronie \
&& rm -rf /var/cache/apk/*

# Create log directories for CKAN harvester
RUN mkdir -p /var/log/ckan/std && chown -R ckan:ckan /var/log/ckan

# Copy custom initialization scripts
COPY docker-entrypoint.d/* /docker-entrypoint.d/
COPY config/ckan_harvesting.conf /etc/supervisord.d/ckan_harvesting.conf

# Apply any patches needed to CKAN core or any of the built extensions (not the
# runtime mounted ones)
30 changes: 30 additions & 0 deletions ckan/config/ckan_harvesting.conf
@@ -0,0 +1,30 @@
# SPDX-FileCopyrightText: Stichting Health-RI
#
# SPDX-License-Identifier: AGPL-3.0-only

[program:ckan_gather_consumer]

; Full path to the executable (should point into the virtual environment)
; and full path to the config file.

command=/usr/bin/ckan --config=/srv/app/ckan.ini harvester gather-consumer
numprocs=1
stdout_logfile=/var/log/ckan/std/gather_consumer.log
stderr_logfile=/var/log/ckan/std/gather_consumer.log
autostart=true
autorestart=true
startsecs=10

[program:ckan_fetch_consumer]

# ; Full path to the executable (should point into the virtual environment)
# ; and full path to the config file.

command=/usr/bin/ckan --config=/srv/app/ckan.ini harvester fetch-consumer

numprocs=1
stdout_logfile=/var/log/ckan/std/fetch_consumer.log
stderr_logfile=/var/log/ckan/std/fetch_consumer.log
autostart=true
autorestart=true
startsecs=10
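
A quick way to sanity-check the two `[program:...]` sections above before building the image is to parse them and confirm that each one sets the expected keys. The following is a minimal sketch, not part of the commit: it assumes Python's standard `configparser` reads the file the same way Supervisor does for these simple `key=value` lines, it inlines a comment-free copy of the config, and `REQUIRED_KEYS` and `check_programs` are hypothetical names chosen for illustration (Supervisor itself only requires `command`).

```python
import configparser

# Inline copy of the two program sections from ckan_harvesting.conf (comments trimmed).
SUPERVISOR_CONF = """
[program:ckan_gather_consumer]
command=/usr/bin/ckan --config=/srv/app/ckan.ini harvester gather-consumer
numprocs=1
stdout_logfile=/var/log/ckan/std/gather_consumer.log
stderr_logfile=/var/log/ckan/std/gather_consumer.log
autostart=true
autorestart=true
startsecs=10

[program:ckan_fetch_consumer]
command=/usr/bin/ckan --config=/srv/app/ckan.ini harvester fetch-consumer
numprocs=1
stdout_logfile=/var/log/ckan/std/fetch_consumer.log
stderr_logfile=/var/log/ckan/std/fetch_consumer.log
autostart=true
autorestart=true
startsecs=10
"""

# Hypothetical checklist for this project; Supervisor only mandates `command`.
REQUIRED_KEYS = ("command", "stdout_logfile", "stderr_logfile", "autostart", "autorestart")


def check_programs(conf_text):
    """Return {program_name: [missing keys]} for every [program:x] section."""
    # Disable interpolation so literal '%' characters in commands are not expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    report = {}
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        name = section.split(":", 1)[1]
        report[name] = [key for key in REQUIRED_KEYS if key not in parser[section]]
    return report


if __name__ == "__main__":
    for program, missing in check_programs(SUPERVISOR_CONF).items():
        print(program, "ok" if not missing else f"missing: {missing}")
```

An empty list for both consumers means every key on the checklist is present; it does not confirm that `/usr/bin/ckan` or the log directory exist inside the container.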
