Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: patch data race #869

Merged
merged 8 commits into from
Nov 17, 2022
Merged

fix: patch data race #869

merged 8 commits into from
Nov 17, 2022

Conversation

evan-forbes
Copy link
Member

@evan-forbes evan-forbes commented Oct 10, 2022

Short term fix for celestiaorg/celestia-node#1252
Long term fix tracked in #882

Description

tendermint v0.34.x makes frequent use of a global var (env) for the RPC requests. If running multiple nodes, this can cause a data race. This fix is a tad hacky, since we're just adding a mutex and not fixing the source of the non-determinism, but it makes the race detector happy(ier) while also allowing for tests to pass in celestia-node.

@evan-forbes
Copy link
Member Author

evan-forbes commented Oct 10, 2022

created the temporary v1.3.3-tm-v0.34.20-celestia-node-patch tag with this fix added onto v1.3.3

Copy link
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself LGTM. However, I added a question to understand more as I am not familiar with this part of the codebase 🐱

rpc/core/env.go Show resolved Hide resolved
rootulp
rootulp previously approved these changes Oct 11, 2022
Copy link
Member

@Wondertan Wondertan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫡

rach-id
rach-id previously approved these changes Oct 11, 2022
Copy link
Member

@Wondertan Wondertan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did another clear-mind review of this, and the reason why this fix does not fix anything is that all the reads are not protected by the mutex. That is, we only lock the mutex when we SetEnvironment, but not when we access it

@rootulp rootulp self-assigned this Oct 14, 2022
@rootulp
Copy link
Collaborator

rootulp commented Oct 14, 2022

Thanks for flagging this @Wondertan ! I'll tackle locking / unlocking the mutex whenever we get the environment. I expect this may involve introducing a helper like

func GetEnvironment() *Environment {
	mut.Lock()
	defer mut.Unlock()
	return env
}

@rootulp rootulp dismissed stale reviews from rach-id and themself via c569855 October 14, 2022 13:33
@rootulp
Copy link
Collaborator

rootulp commented Oct 14, 2022

@rootulp rootulp requested a review from Wondertan October 14, 2022 14:09
rpc/core/blocks.go Outdated Show resolved Hide resolved
@evan-forbes
Copy link
Member Author

are we still wanting to fix this @rootulp @Wondertan

@rootulp
Copy link
Collaborator

rootulp commented Nov 8, 2022

I'm not actively working on it. Defer to @Wondertan if we still want to fix this. I assumed it's still happening in celestia-node but I just tried go test ./... -race which didn't report any data races

@Wondertan
Copy link
Member

It might not happen instantly. It still happens on CI from time to time. The issue is still in the code, so we should fix this, IMO. Any long/short-term solutions work for me, and I think it would be better for you to decide which works better.

@rootulp
Copy link
Collaborator

rootulp commented Nov 14, 2022

Since this is still happening in celestia-node, I'll address #869 (comment) today as a short-term fix and create an issue to explore a long-term fix. Any objections @evan-forbes @Wondertan ?

@Wondertan
Copy link
Member

No, from my side

@rootulp
Copy link
Collaborator

rootulp commented Nov 14, 2022

Tagging @evan-forbes for review b/c I can't explicitly add you as a reviewer

Copy link
Member Author

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't approve cause I originally opened this PR, but this is me approving

🤞 hoping this helps the race detector

@rootulp rootulp merged commit f0b9b8d into v0.34.x-celestia Nov 17, 2022
@rootulp rootulp deleted the evan/patch-data-race branch November 17, 2022 00:02
Comment on lines -170 to +178
env.genChunks = append(env.genChunks, base64.StdEncoding.EncodeToString(data[i:end]))
mut.Lock()
defer mut.Unlock()
globalEnv.genChunks = append(globalEnv.genChunks, base64.StdEncoding.EncodeToString(data[i:end]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants