
Mintmaker: create a cronjob for generating the OSV database #5379

Open · wants to merge 1 commit into base: main

Conversation

querti (Contributor) commented Jan 24, 2025

The database will live on a persistent volume that is mounted by the pods running renovate. On each run, the cronjob downloads the upstream OSV database and generates custom RPM and docker DB files from it. The job uses the same image as the MintMaker controller, so no modifications to the release process are necessary.

The cronjob is also duplicated as a one-off job because a schedule of "every X hours" does not trigger a run at cronjob creation time. This can cause issues when bootstrapping, since no DB would exist for several hours. The job is implemented as a hook, and the resource will self-delete after its run.
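A minimal sketch of the manifest shape this describes; the name, schedule, image, and claim name below are illustrative placeholders, not the actual values in this PR:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: osv-database-generator      # placeholder name
spec:
  schedule: "0 */6 * * *"           # "every X hours"; the real interval is still undecided
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: osv-generator
              image: mintmaker-controller   # same image as the MintMaker controller
              command:
                - /bin/sh
                - -c
                - /osv-generator -destination-dir /mnt/database
              volumeMounts:
                - name: osv-database
                  mountPath: /mnt/database
          volumes:
            - name: osv-database
              persistentVolumeClaim:
                claimName: osv-database   # the shared PVC also mounted by the renovate pods
```

The one-off bootstrap job described above would reuse the same pod template, minus the schedule.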


openshift-ci bot commented Jan 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Jan 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: querti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

querti (Contributor, Author) commented Jan 24, 2025

WARNING: this cronjob will not work until the corresponding mintmaker change has been merged (konflux-ci/mintmaker#163), a new image has been built, and the image reference has been updated in this repo.

querti (Contributor, Author) commented Jan 24, 2025

Related PRs: konflux-ci/mintmaker#163 and redhat-exd-rebuilds/renovate#8

TODO of what else is needed to make this functionality work:

  • Rename changes.csv to releases.csv once it is available in prod
  • Mount the persistent volume to the pipelineruns
  • Set env variables to disable downloading osv-offline and to change the osv-offline DB location in pipelineruns
  • Once the renovate PR is merged, add the changes to the branch used in the renovate image
  • Decide how old the advisories added to the database should be
  • Decide how often the DB should be regenerated
  • Set containerVulnerabilityAlerts in the default config to enable this functionality
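The env-variable item might look roughly like this in a pipelinerun container spec. The variable names below are placeholders only; the real settings are defined by the related renovate change, not by this sketch:

```yaml
# Illustrative only: actual variable names come from
# redhat-exd-rebuilds/renovate#8, not from this snippet.
env:
  - name: OSV_OFFLINE_DISABLE_DOWNLOAD   # placeholder: skip fetching the upstream DB
    value: "true"
  - name: OSV_OFFLINE_DB_PATH            # placeholder: point at the mounted PVC
    value: /mnt/database
```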

@@ -0,0 +1,12 @@
apiVersion: v1
Member commented:

The default storage class provides EBS volumes. The PVC is going to be bound to a single node, which means all the jobs that use it would need to be scheduled on that same node; this can cause a bottleneck.

Contributor (Author) replied:

Interesting. Checking the storage classes, it seems all of them use EBS volumes. Is there a way to overcome this limitation?

Member replied:

Not at this moment.

gbenhaim (Member) commented Jan 26, 2025

What is going to prevent the cronjob from updating the files in the PVC while other pods are using them?

querti (Contributor, Author) commented Jan 27, 2025

What is going to prevent the cronjob from updating the files in the PVC while other pods are using them?

I hadn't considered this possibility, but checking the docs of the package that will use these files (https://github.com/seald/nedb), it claims that "A copy of the whole database is kept in memory". I assume it loads the file into memory and performs queries on the in-memory copy, so this may not be much of an issue.

Regardless, is there a simple way to overcome this problem?
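One common mitigation on the generator side, independent of how nedb reads the files, is a write-then-rename: produce the new database as a temporary file on the same filesystem, then rename it over the live copy. A minimal shell sketch, with all paths illustrative:

```shell
set -eu

# Stand-in for the PVC mount path shared by the cronjob and the renovate pods.
DB_DIR=./mnt-database
mkdir -p "$DB_DIR"

# 1. The generator writes the new database to a temporary file
#    on the same filesystem as the live copy.
echo "fresh-osv-data" > "$DB_DIR/rpm.nedb.tmp"

# 2. rename(2) atomically replaces the live file. Pods that already
#    opened the old rpm.nedb keep reading their (now unlinked) copy;
#    new opens only ever see a complete file, never a partial write.
mv -f "$DB_DIR/rpm.nedb.tmp" "$DB_DIR/rpm.nedb"
```

This does not stop a long-running pod from holding a stale copy, but it does rule out readers observing a half-written file.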

gbenhaim (Member) commented:

I suggest building a container image with the database and then pulling it in the jobs that need it.
You can also upload the files as an OCI artifact and then pull them using a tool like oras (https://oras.land/).
This approach will overcome the two issues I mentioned above.
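For reference, the OCI-artifact route would look roughly like the following with the oras CLI; the registry, repository, and tag names are hypothetical:

```shell
# Publisher side (the cronjob): push the generated DB files as an OCI artifact.
oras push registry.example.com/mintmaker/osv-db:latest \
  rpm.nedb docker.nedb

# Consumer side (each job that needs the data): pull the files back down.
oras pull registry.example.com/mintmaker/osv-db:latest -o /mnt/database
```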

gbenhaim (Member) commented:

What is the size of that DB?

querti (Contributor, Author) commented Jan 27, 2025

what is the size of that db?

We haven't decided how much historical data to include, but I assume up to ~200 MB.

querti (Contributor, Author) commented Jan 27, 2025

I suggest building a container image with the database and then pulling it in the jobs that need it. You can also upload the files as an OCI artifact and then pull them using a tool like oras (https://oras.land/). This approach will overcome the two issues I mentioned above.

The issue is that we want to refresh the data every few hours. I'm not sure we'd want to rebuild our image periodically like that.

gbenhaim (Member) commented:


Why not? What is the issue with that?

querti (Contributor, Author) commented Jan 28, 2025

Why not? What is the issue with that?

At the moment, we only rebuild the image when releasing a new version of MintMaker, so we would need to rethink our release process; the simplest solution is preferable.
I'm not too familiar with the OCI-artifact approach, but it seems we would need to download the data. We run a lot of pods in parallel, so downloading this data that many times may cause congestion or slow down pod execution. It could be viable if there is some sort of caching mechanism.

Would you be OK with us pursuing the current solution, and only start exploring different approaches if the overwriting of data becomes an observable issue?

command:
- /bin/sh
- -c
- /osv-generator -destination-dir /mnt/database -docker-filename docker.nedb -rpm-filename rpm.nedb -days 1
Contributor commented:

We need to check how much data will be generated with days=1 (which can be done after Feb 10, when released.csv is available), but I would consider increasing this to 3 so we can still fix vulnerabilities in case of an outage.

Contributor (Author) replied:

This is just a placeholder value; I expect we will gather more than one day's worth of data. But for testing purposes, changes.csv has a lot of entries even for a single day, so it will generate quite sizable databases.

qixiang (Collaborator) commented Feb 5, 2025

/hold

Adding the do-not-merge label to prevent accidental merging, as this change is not yet ready.
