Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: introduce mutex for state and lastCommitInfo to avoid race conditions #22692

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

beer-1
Copy link
Contributor

@beer-1 beer-1 commented Nov 29, 2024

Description

Closes: #22650


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

Please see Pull Request Reviewer section in the contributing guide for more information on how to review a pull request.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • New Features

    • Support for simulating nested messages added.
    • New Linux-only backend for the crypto/keyring module introduced.
    • Enhanced client/keys module to import hex keys via standard input.
  • Improvements

    • Upgraded RocksDB libraries to support version 9.
    • Simplified integration tests by removing double context usage.
  • Bug Fixes

    • Mutex lock added to prevent race conditions in the base application.
    • Simulation tests now skip when validators are dry, ensuring smoother execution.

Copy link
Contributor

coderabbitai bot commented Nov 29, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces several enhancements and bug fixes across multiple modules within the Cosmos SDK. Key changes include the addition of features for simulating nested messages, a new Linux-only backend for the crypto/keyring module, and improvements to integration tests. Concurrency control is enhanced through the introduction of mutex locks to prevent race conditions in state management. The changelog has been updated to reflect these changes, and new test cases have been added to ensure the stability of the application under concurrent conditions.

Changes

File Change Summary
CHANGELOG.md Updated with entries for new features, improvements, and bug fixes.
baseapp/abci.go Refactored state management methods to enhance encapsulation and dynamic state retrieval.
baseapp/abci_test.go Added TestABCI_Race_Commit_Query to test concurrency during block finalization and query context creation.
baseapp/baseapp.go Introduced stateMut mutex for thread-safe state management.
baseapp/test_helpers.go Updated state access methods to use dynamic state retrieval based on execution mode.
go.mod Added replace directive for cosmossdk.io/store to point to a local path.
store/rootmulti/store.go Added lastCommitInfoMut mutex for thread-safe access to commit information in the Store struct.

Assessment against linked issues

Objective Addressed Explanation
Address race condition between baseapp.Commit and baseapp.CreateQueryContext (#22650)

Possibly related PRs

Suggested labels

C:server/v2, C:server/v2 cometbft, backport/v0.52.x

Suggested reviewers

  • kocubinski
  • facundomedica
  • testinginprod
  • sontrinh16
  • tac0turtle

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@beer-1 beer-1 changed the title fix: introduce mutex for state and lastCommitInfo to avoid race fix: introduce mutex for state and lastCommitInfo to avoid race conditions Nov 29, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (13)
go.mod (1)

218-219: Consider documenting the temporary nature of this replace directive.

Since this replace directive is part of the race condition fix, consider moving it to the temporary replace section with a comment explaining its purpose.

 // Here are the short-lived replace from the Cosmos SDK
 // Replace here are pending PRs, or version to be tagged
-// replace (
-// 	<temporary replace>
-// )
+replace (
+    // Temporary: Using local store module for race condition fix (#22650)
+    cosmossdk.io/store => ./store
+)
CHANGELOG.md (4)

Line range hint 1-1: Add version table at the top of changelog

Consider adding a version table at the top that summarizes all v0.46.x releases with their dates and key changes for easier reference.


Line range hint 11-13: Fix inconsistent bullet point formatting

The bullet points use a mix of * and -. Standardize on one format throughout the document for consistency.


59-61: Add impact details for breaking changes

For breaking changes like the Dragonberry security fix, consider adding more details about potential impact on users and required actions.


Line range hint 1-1000: Fix typos and grammatical errors

Several typos and grammatical errors found throughout the document:

  • "typographical" misspelled as "typograhical"
  • Missing periods at end of some bullet points
  • Inconsistent capitalization in section headers
store/rootmulti/store.go (1)

63-63: Naming convention for mutex variables.

Consider renaming lastCommitInfoMut to lastCommitInfoMu to align with Go conventions, where mutex variables are typically suffixed with Mu.

baseapp/baseapp.go (2)

127-127: Consider renaming stateMut to stateMutex for clarity.

According to the Uber Go Style Guide, abbreviations in variable names should be avoided to enhance readability. Renaming stateMut to stateMutex makes the purpose of the mutex clearer.

Apply this diff to improve clarity:

-	stateMut             sync.RWMutex
+	stateMutex           sync.RWMutex

Line range hint 583-586: Potential deadlock due to inconsistent lock ordering between app.mu and app.stateMut.

In getContextForTx, app.mu is locked before calling app.getState(mode), which in turn locks app.stateMut. If elsewhere app.stateMut is locked before attempting to acquire app.mu, this could lead to a deadlock due to inconsistent lock ordering.

Recommend reviewing the locking order to ensure that app.mu and app.stateMut are always acquired in a consistent order throughout the codebase to prevent potential deadlocks.

baseapp/abci.go (3)

609-609: Handle possible errors from CacheContext

In the assignments:

ctx, _ = app.getState(execModeFinalize).Context().CacheContext()

Ignoring the second return value without handling could overlook potential issues. Although CacheContext() may not currently return an error, future changes could introduce errors.

Consider capturing both return values explicitly for clarity:

ctx, ms := app.getState(execModeFinalize).Context().CacheContext()
// use ms if needed

Also applies to: 689-689


747-747: Clarify the comment for better understanding

The comment at line 747 is unclear:

// only used to handle early cancellation, for anything related to state app.getState(execModeFinalize).Context()

Consider rephrasing for clarity:

// Use ctx only for early cancellation handling. For state-related operations, use app.getState(execModeFinalize).Context()

1194-1194: Review the use of CacheContext

In:

ctx, _ = app.getState(execModeFinalize).Context().CacheContext()

Ensure that caching the context here is intended and that any modifications do not affect the main context unintentionally.

baseapp/abci_test.go (2)

2783-2830: Add a function comment to describe the purpose of the test

To improve code readability and maintainability, consider adding a comment before the TestABCI_Race_Commit_Query function to explain its purpose and what it is testing.


2796-2797: Simplify atomic counter initialization

Since the zero value of atomic.Uint64 is zero, the explicit call to counter.Store(0) is unnecessary. You can simplify the code by removing this call.

Apply the following diff:

-counter := atomic.Uint64{}
-counter.Store(0)
+var counter atomic.Uint64
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6cfe2dc and 298d02e.

📒 Files selected for processing (7)
  • CHANGELOG.md (1 hunks)
  • baseapp/abci.go (21 hunks)
  • baseapp/abci_test.go (2 hunks)
  • baseapp/baseapp.go (7 hunks)
  • baseapp/test_helpers.go (1 hunks)
  • go.mod (1 hunks)
  • store/rootmulti/store.go (5 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
CHANGELOG.md (1)

Pattern **/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness"

baseapp/abci.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

baseapp/abci_test.go (2)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.


Pattern **/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"

baseapp/baseapp.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

baseapp/test_helpers.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

store/rootmulti/store.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

🔇 Additional comments (20)
baseapp/test_helpers.go (2)

47-47: LGTM: Thread-safe state access implemented

The change from direct state access to using getState aligns with the PR's objective of preventing race conditions. This modification ensures thread-safe access to the state during simulations.

Let's verify the thread safety implementation:

✅ Verification successful

Thread-safe state access confirmed via RLock/RUnlock

The verification confirms that getState method properly implements thread-safety using RLock/RUnlock from stateMut. The implementation correctly protects state access across different execution modes, including the execModeFinalize used in the test helper.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that getState method uses proper synchronization
# Expected: Find mutex usage in getState implementation

ast-grep --pattern 'func (app *BaseApp) getState($_) $_ {
  $$$
}'

Length of output: 831


53-58: Thread-safe state access implemented, verify performance impact

The changes correctly implement thread-safe state access using getState for both check and finalize modes. This aligns with fixing the race conditions identified in issue #22650.

Let's verify the usage patterns and potential performance impact:

✅ Verification successful

Based on the search results, I can now provide a final response since I have enough information about the usage patterns of NewContextLegacy:

Thread safety changes are safe with minimal performance impact

The changes to NewContextLegacy using getState are safe because:

  • The method is primarily used in test code and initialization paths, not in hot execution paths
  • Main usage patterns are in:
    • Test helpers and test suites
    • App initialization and genesis setup
    • Export functionality
    • Simulation framework

The added synchronization through getState won't cause performance issues since this method isn't called in performance-critical paths or high-frequency operations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for high-frequency calls to NewContextLegacy that might be affected by the added synchronization
# Expected: Identify critical paths that might need optimization

# Find direct calls to NewContextLegacy
rg "NewContextLegacy\(" -A 2

# Find potential high-frequency callers through NewContext
rg "NewContext\(" -A 2

Length of output: 40662

go.mod (1)

219-219: LGTM! Verify version consistency with other local modules.

The addition of the store module replace directive aligns with the existing pattern and is necessary for implementing the race condition fix. However, we should ensure version consistency.

✅ Verification successful

Let me verify the version consistency further by checking other local module replacements.


Version consistency verified across local module replacements

The replace directive for the store module follows the same pattern as other local module replacements (api, bank, staking, tx). All local modules use Go 1.23, and the version specified in the main go.mod (v1.1.1-0.20240418092142-896cdf1971bc) is consistent with the development nature of these changes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check version consistency between go.mod and local store module
# Verify the store module's go.mod version matches the version specified in the main go.mod

# Get the version from main go.mod
echo "Main module store version:"
grep "cosmossdk.io/store" go.mod | head -n 1

# Get the version from local store module
echo "Local store module version:"
if [ -f "store/go.mod" ]; then
    grep "module cosmossdk.io/store" store/go.mod
fi

Length of output: 329


Script:

#!/bin/bash
# Check all local module replacements and their versions
echo "Main go.mod versions:"
grep "cosmossdk.io/" go.mod | grep -v "replace" | sort

echo -e "\nReplace directives:"
grep "replace cosmossdk.io/" go.mod | sort

echo -e "\nLocal module versions:"
for dir in api bank staking tx store; do
    if [ -f "$dir/go.mod" ]; then
        echo -e "\n$dir/go.mod:"
        grep "^module" "$dir/go.mod"
        grep "^go 1." "$dir/go.mod"
    fi
done

Length of output: 1573

store/rootmulti/store.go (7)

63-63: Introduction of lastCommitInfoMut for concurrency control.

Adding lastCommitInfoMut sync.RWMutex to the Store struct appropriately introduces a mutex to prevent race conditions when accessing lastCommitInfo.


292-294: Proper locking around lastCommitInfo assignment in loadVersion().

Locking lastCommitInfoMut before updating rs.lastCommitInfo ensures thread-safe write operations.


440-446: Consistent use of LastCommitInfo() in LatestVersion().

Updating LatestVersion() to use rs.LastCommitInfo() ensures safe concurrent access to lastCommitInfo.


448-451: Thread-safe access to lastCommitInfo via LastCommitInfo() method.

The new LastCommitInfo() method correctly implements read locking to allow safe concurrent reads of lastCommitInfo.


456-474: Safe access to lastCommitInfo in LastCommitID().

Using rs.LastCommitInfo() in LastCommitID() prevents data races during read operations of lastCommitInfo.


518-521: Ensure minimal locking when updating lastCommitInfo.Timestamp.

Holding the lock only during the assignment to lastCommitInfo.Timestamp is acceptable. Confirm that no other operations within the lock could cause delays or potential deadlocks.


800-802: Thread-safe retrieval of lastCommitInfo in Query().

Using rs.LastCommitInfo() in the Query() method ensures safe concurrent access and prevents race conditions.

baseapp/abci.go (9)

72-72: Good use of encapsulation with app.getState

The introduction of app.getState(execModeFinalize) enhances state management by encapsulating state retrieval, improving code maintainability and readability.


78-78: Ensure req.ConsensusParams is validated

While storing consensus parameters, make sure that req.ConsensusParams is not nil and contains valid data to prevent potential runtime errors.


90-97: Consistent Context Update for States

Updating both checkState and finalizeState contexts ensures consistency across different execution modes. This is a good practice for maintaining state integrity.


110-110: Initialization of Infinite Gas Meter

Setting an infinite gas meter during InitChain is appropriate to allow unrestricted operations during genesis. Ensure that this does not inadvertently persist into states where gas limits should be enforced.


936-939: Ensure proper mutex usage when modifying shared state

Locking app.stateMut before modifying app.finalizeBlockState is correct. Ensure that all other accesses to shared state variables are similarly protected to prevent race conditions.


1024-1026: Mutex locking and unlocking

Proper use of app.stateMut.Lock() and app.stateMut.Unlock() ensures thread-safe operations when modifying shared states.


1047-1047: Ensure ms.Write() is safe and necessary

Calling ms.Write() writes pending changes to the parent store. Confirm that this call is necessary and that any potential errors are handled appropriately.


1409-1409: Verify consensus parameters retrieval

Retrieving consensus parameters using app.getState(execModeFinalize).Context() should be thread-safe and consistent with the application's state at this point.


978-979: ⚠️ Potential issue

Accessing finalizeState without synchronization

Accessing finalizeState here may lead to race conditions if not properly synchronized. Verify that app.stateMut is held during this access.

Run the following script to identify unsynchronized accesses to app.getState:

Also applies to: 983-983

baseapp/abci_test.go (1)

2796-2797: Verify Go version compatibility for atomic.Uint64

The atomic.Uint64 type is available from Go 1.19 onwards. Ensure that the project specifies Go 1.19 or newer in its configuration to avoid compatibility issues.

Run the following script to check the Go version specified in the go.mod file:

✅ Verification successful

Go version compatibility verified for atomic.Uint64

The project uses Go 1.23.2, which fully supports atomic.Uint64 (introduced in Go 1.19). No compatibility issues exist.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check that the project requires Go 1.19 or newer.

# Look for the 'go' directive in the go.mod file
grep "^go " go.mod

Length of output: 30

store/rootmulti/store.go Outdated Show resolved Hide resolved
baseapp/baseapp.go Show resolved Hide resolved
baseapp/baseapp.go Show resolved Hide resolved
baseapp/abci.go Show resolved Hide resolved
baseapp/abci.go Show resolved Hide resolved
baseapp/abci.go Show resolved Hide resolved
baseapp/abci_test.go Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
store/rootmulti/store.go (1)

514-522: Consider combining the mutex locks.

The code currently acquires and releases the mutex twice in quick succession. Consider combining these operations to reduce lock overhead:

-       rs.lastCommitInfoMut.Lock()
-       rs.lastCommitInfo = cInfo
-       rs.lastCommitInfoMut.Unlock()
-
-       rs.lastCommitInfoMut.Lock()
-       rs.lastCommitInfo.Timestamp = rs.commitHeader.Time
-       rs.lastCommitInfoMut.Unlock()
+       rs.lastCommitInfoMut.Lock()
+       rs.lastCommitInfo = cInfo
+       rs.lastCommitInfo.Timestamp = rs.commitHeader.Time
+       rs.lastCommitInfoMut.Unlock()
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 298d02e and aa5960a.

📒 Files selected for processing (1)
  • store/rootmulti/store.go (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
store/rootmulti/store.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

🔇 Additional comments (4)
store/rootmulti/store.go (4)

63-63: LGTM: Good choice of RWMutex for lastCommitInfo synchronization.

Using sync.RWMutex is appropriate here as it allows concurrent reads while ensuring exclusive writes, which aligns with the access patterns of lastCommitInfo.


292-294: LGTM: Appropriate lock scope in LoadVersion.

The mutex lock is correctly scoped to only protect the critical section (lastCommitInfo assignment), minimizing lock contention.


448-451: LGTM: Well-implemented thread-safe accessor.

The method correctly uses RLock/RUnlock for concurrent read access to lastCommitInfo. The defer statement ensures proper unlock even in case of panics.


440-446: LGTM: Consistent usage of thread-safe accessor.

All methods consistently use the thread-safe LastCommitInfo accessor, properly addressing the race condition between Commit and CreateQueryContext operations.

Also applies to: 456-474, 802-804

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: race condition between baseapp.Commit and baseapp.CreateQueryContext
1 participant