Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove go routines for RecheckTx #1553

Merged
merged 1 commit into from
Dec 12, 2024
Merged

Conversation

cmwaters
Copy link
Contributor

Closes: #1552

Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow totally did not realize v1 was doing this

@cmwaters cmwaters self-assigned this Dec 11, 2024
@cmwaters cmwaters marked this pull request as ready for review December 11, 2024 13:32
@cmwaters cmwaters requested a review from a team as a code owner December 11, 2024 13:32
@cmwaters cmwaters requested review from staheri14 and ninabarbakadze and removed request for a team December 11, 2024 13:32
@rootulp rootulp self-requested a review December 11, 2024 14:46
@@ -713,34 +708,23 @@ func (txmp *TxMempool) recheckTransactions() {

// Issue CheckTx calls for each remaining transaction, and when all the
// rechecks are complete signal watchers that transactions may be available.
go func() {
g, start := taskgroup.New(nil).Limit(2 * runtime.NumCPU())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it would be possible to avoid this issue without blocking by simply not using a taskgroup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be watertight it should be blocking over the period that state is updating else as this is still a go routine, transactions may be submitted before updating to the latest nonce with the checktxstate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the mempool locked while the goroutine is still running?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it's not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could happen in practice that some of these CheckTx transactions happen while the mempool is locked but it's not guaranteed

return nil
})
for _, wtx := range wtxs {
wtx := wtx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not necessary any more b/c this repo uses Go 1.23 and Go 1.22 included:

Previously, the variables declared by a “for” loop were created once and updated by each iteration. In Go 1.22, each iteration of the loop creates new variables, to avoid accidental sharing bugs. The transition support tooling described in the proposal continues to work in the same way it did in Go 1.21.

Ref: https://tip.golang.org/doc/go1.22

Suggested change
wtx := wtx

oops disregard, I see this was just how it was previously implemented

@cmwaters cmwaters merged commit d51978e into v0.34.x-celestia Dec 12, 2024
17 checks passed
@cmwaters cmwaters deleted the cal/rechecktx branch December 12, 2024 16:53
mergify bot pushed a commit to celestiaorg/celestia-app that referenced this pull request Dec 18, 2024
rootulp added a commit that referenced this pull request Jan 8, 2025
This is residual of
#1553

The problem is now even more subtle. Because the mempool mutexes weren't
over both CheckTx and the actual adding of the transaction to the
mempool we occasionally hit a situation as follows:

- CheckTx a tx with nonce 2
- Before saving it to the store, collect all transactions and recheck
tx. This excludes the last tx with nonce 2, thus the nonce in the state
machine is still 1
- Save the tx to the pool
- New tx comes in with nonce 3. The application is at 1 so rejects it
expecting the next to be 2.

This PR fixes this pattern, however this won't be watertight for the CAT
pool until we can order both on gas fee (priority) and nonce.

---------

Co-authored-by: Rootul Patel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sequence mismatch can needlessly occur when transactions are submitted during ReCheckTx phase
3 participants