Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#79: fix data race for blsagg AVS-44 #279

Merged
merged 1 commit into from
Jun 24, 2024

Conversation

chzyer
Copy link
Contributor

@chzyer chzyer commented Jun 21, 2024

Fixes #79.

What Changed?

==================
WARNING: DATA RACE
Write at 0x00c000189980 by goroutine 17:
  runtime.mapassign()
      /usr/local/go/src/runtime/map.go:579 +0x0
  github.com/Layr-Labs/eigensdk-go/services/bls_aggregation.(*BlsAggregatorService).singleTaskAggregatorGoroutineFunc()
      /Users/cheney/ata/eigensdk-go/services/bls_aggregation/blsagg.go:307 +0x2139
  github.com/Layr-Labs/eigensdk-go/services/bls_aggregation.(*BlsAggregatorService).InitializeNewTask.gowrap2()
      /Users/cheney/ata/eigensdk-go/services/bls_aggregation/blsagg.go:184 +0xf3

Previous read at 0x00c000189980 by goroutine 18:
  runtime.mapaccess2()
      /usr/local/go/src/runtime/map.go:457 +0x0
  github.com/Layr-Labs/eigensdk-go/services/bls_aggregation.checkIfStakeThresholdsMet()
      /Users/cheney/ata/eigensdk-go/services/bls_aggregation/blsagg.go:440 +0x116
  github.com/Layr-Labs/eigensdk-go/services/bls_aggregation.(*BlsAggregatorService).singleTaskAggregatorGoroutineFunc()
      /Users/cheney/ata/eigensdk-go/services/bls_aggregation/blsagg.go:314 +0x1464
  github.com/Layr-Labs/eigensdk-go/services/bls_aggregation.(*BlsAggregatorService).InitializeNewTask.gowrap2()
      /Users/cheney/ata/eigensdk-go/services/bls_aggregation/blsagg.go:184 +0xf3

This data race is caused by reading and writing the same map operatorsAvsStateDict[].StakePerQuorum, which is returned by avsRegistryService.GetOperatorsAvsStateAtBlock in multiple goroutines.

In the test case, the implementation of avsRegistryService is FakeAvsRegistryService, and it always returns the same map for GetOperatorsAvsStateAtBlock.

We can fix this by cloning a new map for StakePerQuorum.

In the production environment, we can’t ensure that avsRegistryService.GetOperatorsAvsStateAtBlock always returns a new map that is safe to modify since avsRegistryService is an interface. For example, we have implemented a cache wrapper for avsRegistryService.

In conclusion, I think it’s worth cloning the StakePerQuorum map for safety since the overhead is low.

Reviewer Checklist

  • Code is well-documented
  • Code adheres to Go naming conventions
  • Code deprecates any old functionality before removing it

@chzyer chzyer mentioned this pull request Jun 21, 2024
3 tasks
Copy link
Collaborator

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful, thank you so much.
Another instance where rust's borrow checker would have prevent this.
I banged my head around this for a while, and didn't think that the fake implementation could be the culprit. Thanks so much!
Just need to rebase and then we can merge

@samlaf samlaf changed the title #79: fix data race for blsagg #79: fix data race for blsagg AVS-44 Jun 23, 2024
@chzyer chzyer force-pushed the fix/blsagg_race_condition_79 branch from ba41537 to dc5444c Compare June 24, 2024 01:35
@chzyer
Copy link
Contributor Author

chzyer commented Jun 24, 2024

Thanks @samlaf, rebased!

@samlaf samlaf merged commit 68ca900 into Layr-Labs:dev Jun 24, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race
2 participants