Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧹 batch sync assets #995

Merged
merged 3 commits into from
Dec 13, 2023
Merged

🧹 batch sync assets #995

merged 3 commits into from
Dec 13, 2023

Conversation

imilchev
Copy link
Member

Instead of calling SynchronizeAssets once for all discovered assets, we create batches. This makes sure that if we have discovered a huge amount of assets, the call would still succeed.

I intentionally put a low number for the batch size because with this implementation, batches are scanned in parallel, which also decreases the total scan time.

Downside of this approach is that the progress bar results ordering is no longer deterministic. However, since this is a progress bar, I don't think ordering should matter

Signed-off-by: Ivan Milchev <[email protected]>
@imilchev imilchev force-pushed the ivan/batch-sync-assets branch from 7be5579 to fe16ac1 Compare December 12, 2023 09:17
@imilchev imilchev linked an issue Dec 12, 2023 that may be closed by this pull request
Signed-off-by: Ivan Milchev <[email protected]>
@jaym
Copy link
Contributor

jaym commented Dec 12, 2023

cnquery/cnspec and its providers are in general not thread safe and doing this will trigger data races. Not sure how to get this information for the providers, but you can at least see the data races fairly quickly for cnspec:

go run -race ./apps/cnspec/cnspec.go scan k8s

@imilchev
Copy link
Member Author

it looks like those are related to the BufferedCollector and ranger somehow. I am not entirely sure how to read those. I will not execute anything in parallel here. Will make the whole PR synchronous but still working with batches so we can unblock the fix for the original issue

Signed-off-by: Ivan Milchev <[email protected]>
@imilchev imilchev marked this pull request as ready for review December 13, 2023 09:08
@imilchev
Copy link
Member Author

The scans are now synchronous, batch size is 100

Copy link
Contributor

@czunker czunker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks @imilchev

@czunker
Copy link
Contributor

czunker commented Dec 13, 2023

I gave this a try with a container registry scan. Works as expected.

@imilchev imilchev merged commit 306c1ae into main Dec 13, 2023
9 checks passed
@imilchev imilchev deleted the ivan/batch-sync-assets branch December 13, 2023 16:23
@github-actions github-actions bot locked and limited conversation to collaborators Dec 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Larger K8s assets timeout issues
3 participants