Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚡ discover assets in parallel #4973

Merged
merged 14 commits into from
Dec 16, 2024
Merged

⚡ discover assets in parallel #4973

merged 14 commits into from
Dec 16, 2024

Conversation

afiune
Copy link
Contributor

@afiune afiune commented Dec 11, 2024

Depends on #4970

This PR is using the new workerpool package to discover assets in parallel.

There were a few places that I detected race conditions that I have fixed, I
also added a race detector make command which is running as a CI job.

Testing

Using the github provider for testing, scanning an organization that has around 3k repositories.

Before (~15 Minutes)

TRC logger.FuncDur> func=explorer.discoverAssets took=890303.455042

After (~2 minutes)

TRC logger.FuncDur> func=explorer.discoverAssets took=124293.542166

Race Detector

You can now run make race/go to check for race conditions.

$ make race/go
go test -race go.mondoo.com/cnquery/v11/internal/workerpool
ok  	go.mondoo.com/cnquery/v11/internal/workerpool	16.417s
go test -race go.mondoo.com/cnquery/v11/explorer/scan
ok  	go.mondoo.com/cnquery/v11/explorer/scan	2.487s

This will help us submit as many requests as we want without knowing
about the workers.

Signed-off-by: Salim Afiune Maya <[email protected]>
Copy link
Contributor

github-actions bot commented Dec 11, 2024

Test Results

3 169 tests  ±0   3 168 ✅ ±0   1m 43s ⏱️ -1s
  373 suites ±0       1 💤 ±0 
   28 files   ±0       0 ❌ ±0 

Results for commit ceb2585. ± Comparison against base commit 949f0ce.

♻️ This comment has been updated with latest results.

@afiune afiune marked this pull request as draft December 11, 2024 23:01
Signed-off-by: Salim Afiune Maya <[email protected]>
@afiune afiune force-pushed the afiune/parallel_asset_discovery branch from 35b1eff to f76eda1 Compare December 12, 2024 02:07
Signed-off-by: Salim Afiune Maya <[email protected]>
@afiune afiune force-pushed the afiune/parallel_asset_discovery branch from f76eda1 to 8c98904 Compare December 12, 2024 07:28
@afiune afiune marked this pull request as ready for review December 12, 2024 14:22
@afiune afiune requested review from jaym and chris-rock December 12, 2024 15:14
@@ -198,6 +198,18 @@
"shell", "ssh", "[email protected]",
],
},
{
"name": "scan github org",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to keep this in here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think it is useful for folks testing, specially now that we have https://github.com/hit-training.

I can remove it.

@afiune
Copy link
Contributor Author

afiune commented Dec 12, 2024

You can now run make race/go

$ make race/go
go test -race go.mondoo.com/cnquery/v11/internal/workerpool
ok  	go.mondoo.com/cnquery/v11/internal/workerpool	16.417s
go test -race go.mondoo.com/cnquery/v11/explorer/scan
ok  	go.mondoo.com/cnquery/v11/explorer/scan	2.487s

I'll add a CI job for it.

Base automatically changed from afiune/workerpool+github to main December 12, 2024 17:28
@afiune afiune force-pushed the afiune/parallel_asset_discovery branch from 8364872 to e785c76 Compare December 12, 2024 17:32
We really don't need to do everything inside the workerpool, do we?

Signed-off-by: Salim Afiune Maya <[email protected]>
@afiune afiune force-pushed the afiune/parallel_asset_discovery branch from 81de208 to e3a58fd Compare December 12, 2024 18:41
@afiune afiune requested a review from imilchev December 12, 2024 19:11
@@ -92,7 +116,7 @@ func (p *Pool[R]) Close() {

// Wait waits until all tasks have been processed.
func (p *Pool[R]) Wait() {
ticker := time.NewTicker(100 * time.Millisecond)
ticker := time.NewTicker(10 * time.Millisecond)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixed the benchmark failure.

@@ -74,6 +74,7 @@ func NewGithubConnection(id uint32, asset *inventory.Asset) (*GithubConnection,
ctx := context.WithValue(context.Background(), github.SleepUntilPrimaryRateLimitResetWhenRateLimited, true)

// perform a quick call to verify the token's validity.
// @afiune do we need to validate the token for every connection? can this be a "once" operation?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed with #4980

r.Provider.Connection, r.Provider.ConnectionError = r.Provider.Instance.Plugin.Connect(req, &callbacks)
r.mu.Unlock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is the assignation here, not the actual gRPC call Connect(). I will change this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ updated

Signed-off-by: Salim Afiune Maya <[email protected]>
Signed-off-by: Salim Afiune Maya <[email protected]>
@afiune afiune force-pushed the afiune/parallel_asset_discovery branch from 17495df to 8badda7 Compare December 15, 2024 09:41
@afiune afiune merged commit fee2739 into main Dec 16, 2024
17 checks passed
@afiune afiune deleted the afiune/parallel_asset_discovery branch December 16, 2024 09:12
@github-actions github-actions bot locked and limited conversation to collaborators Dec 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants