-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mintmaker: create a cronjob for generating the OSV database #5379
base: main
Are you sure you want to change the base?
Conversation
The database will be in a persistent volume which will be mounted by pods running renovate. The cronjob downloads the upstream OSV database and generates custom RPM and docker DB files on each run. The job uses the same image as MintMaker controller, so no modifications to the release process are necessary. The reason why the cronjob was duplicated as a job is that the schedule "every X hours" does not create a job upon cronjob creation. This can cause issues when bootstrapping, since no DB will exist for several hours. The job is implemented as a hook, and the resource will self-delete after its run.
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: querti The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WARNING: this cronjob will not work until the corresponding mintmaker change has been merged (konflux-ci/mintmaker#163) and a new image has been built and changed in this repo. |
Related PRs: konflux-ci/mintmaker#163 and redhat-exd-rebuilds/renovate#8 TODO of what else is needed to make this functionality work:
|
@@ -0,0 +1,12 @@ | |||
apiVersion: v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default storage class provides EBS volumes. The pvc is going to be bounded to a single node, it means that all the jobs that will use it would need to be scheduled on the same node, this can cause a bottleneck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Checking the storage classes, it seems all of them use EBS volumes. Is there a way to overcome this limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not at this moment.
what is going to prevent the from updating the files in the pvc while other pods are using it? |
I haven't considered this possibility, but checking the docs of the package which will use these files (https://github.com/seald/nedb), it claims "A copy of the whole database is kept in memory". I assume it will load the file to memory and perform the queries on the in-memory version. So maybe this won't be much of an issue. Regardless, is there a simple way to overcome this problem? |
I suggest to build a container image with the data base and then pull in the job that need it. |
what is the size of that db? |
We haven't decided how much historical data to include, but I assume up to ~200MB |
The issue is that we want to refresh the data every few hours. I'm not sure we'd want to rebuild our image periodically like that. |
why not? what is the issue with that? |
At the moment, we only rebuild the image when releasing a new version of Mintmaker. We would need to rethink our release process, so the simplest solution is preferable. Would you be OK with us pursuing the current solution, and only start exploring different approaches if the overwriting of data becomes an observable issue? |
command: | ||
- /bin/sh | ||
- -c | ||
- /osv-generator -destination-dir /mnt/database -docker-filename docker.nedb -rpm-filename rpm.nedb -days 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check how many data will be generated with days=1 (which can be done after Feb 10 when released.csv is available), but I would consider to increase this to 3 possibly, so we can still fix vulnerabilities in case of an outage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a placeholder value, I expect that we will gather more than just one day's worth of data. But for testing purposes, changes.csv has a lot of entries even for 1 day, so it will generate quite sizable databases.
/hold Adding the do-not-merge label to prevent accidental merging, as this change is not ready for merging. |
The database will be in a persistent volume which will be mounted by pods running renovate. The cronjob downloads the upstream OSV database and generates custom RPM and docker DB files on each run. The job uses the same image as MintMaker controller, so no modifications to the release process are necessary.
The reason why the cronjob was duplicated as a job is that the schedule "every X hours" does not create a job upon cronjob creation. This can cause issues when bootstrapping, since no DB will exist for several hours. The job is implemented as a hook, and the resource will self-delete after its run.