fix: patch data race #869
Conversation
The change itself LGTM. However, I added a question to understand more as I am not familiar with this part of the codebase 🐱
🫡
So I did another clear-mind review of this, and the reason this fix does not fix anything is that the reads are not protected by the mutex. That is, we only lock the mutex when we SetEnvironment, but not when we access it.
Thanks for flagging this @Wondertan! I'll tackle locking/unlocking the mutex whenever we get the environment. I expect this may involve introducing a helper like:

```go
func GetEnvironment() *Environment {
	mut.Lock()
	defer mut.Unlock()
	return env
}
```
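For a fuller picture of what that could look like, here is a minimal sketch in which the setter and the proposed getter share one mutex, so reads no longer race with writes. The names `Environment`, `globalEnv`, and `mut` are taken from the diff and comments in this thread; the field list and exact signatures in the repo may differ.

```go
package rpccore

import "sync"

// Environment stands in for the repo's RPC environment type; only the
// field mentioned in the diff is sketched here.
type Environment struct {
	genChunks []string
}

// One package-level mutex guards every access to the global environment.
var (
	mut       sync.Mutex
	globalEnv *Environment
)

// SetEnvironment replaces the global environment under the lock.
func SetEnvironment(env *Environment) {
	mut.Lock()
	defer mut.Unlock()
	globalEnv = env
}

// GetEnvironment returns the global environment under the same lock, so
// readers are serialized against SetEnvironment and other writers.
func GetEnvironment() *Environment {
	mut.Lock()
	defer mut.Unlock()
	return globalEnv
}
```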
are we still wanting to fix this @rootulp @Wondertan?
I'm not actively working on it. Defer to @Wondertan if we still want to fix this. I assumed it's still happening in celestia-node but I just tried
It might not happen instantly. It still happens on CI from time to time. The issue is still in the code, so we should fix this, IMO. Any long/short-term solutions work for me, and I think it would be better for you to decide which works better.
Since this is still happening in celestia-node, I'll address #869 (comment) today as a short-term fix and create an issue to explore a long-term fix. Any objections @evan-forbes @Wondertan?
No, from my side
Tagging @evan-forbes for review b/c I can't explicitly add you as a reviewer
I can't approve cause I originally opened this PR, but this is me approving
🤞 hoping this helps the race detector
```diff
- env.genChunks = append(env.genChunks, base64.StdEncoding.EncodeToString(data[i:end]))
+ mut.Lock()
+ defer mut.Unlock()
+ globalEnv.genChunks = append(globalEnv.genChunks, base64.StdEncoding.EncodeToString(data[i:end]))
```
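The pattern in the diff is to take the package mutex before touching `globalEnv` and let `defer` release it when the enclosing function returns. A hedged sketch of how the genesis-chunking write path could apply that guard is below; it assumes the same package as the earlier sketch (with `mut` and `globalEnv` defined there and `encoding/base64` imported), and the function name and chunk size are illustrative rather than the repo's exact code.

```go
// initGenesisChunks splits the genesis document into base64 chunks and
// appends them to the shared environment. The whole write happens under
// the package mutex so concurrent RPC readers cannot observe a partially
// built slice. (Illustrative sketch; the real function may differ.)
func initGenesisChunks(data []byte) {
	const chunkSize = 16 * 1024 * 1024 // assumed chunk size, for illustration only

	mut.Lock()
	defer mut.Unlock() // held until the function returns, covering the whole loop

	for i := 0; i < len(data); i += chunkSize {
		end := i + chunkSize
		if end > len(data) {
			end = len(data)
		}
		globalEnv.genChunks = append(globalEnv.genChunks,
			base64.StdEncoding.EncodeToString(data[i:end]))
	}
}
```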
oh no
Short-term fix for celestiaorg/celestia-node#1252
Long-term fix tracked in #882
Description
tendermint v0.34.x makes frequent use of a global var (`env`) for the RPC requests. If running multiple nodes, this can cause a data race. This fix is a tad hacky, since we're just adding a mutex and not fixing the source of the non-determinism, but it makes the race detector happy(ier) while also allowing for tests to pass in celestia-node.
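As a standalone illustration of the symptom (not code from this repo), the program below writes and reads a package-level variable from two goroutines. With the mutex calls removed, `go run -race` reports a data race on the shared slice; guarding both sides with the same mutex, as this PR does for the global environment, silences it.

```go
package main

import (
	"fmt"
	"sync"
)

// A package-level global, analogous to the RPC env shared by every node
// started in the same process.
var (
	mu     sync.Mutex
	shared []string
)

func main() {
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: one "node" populating the shared state.
	go func() {
		defer wg.Done()
		mu.Lock()
		defer mu.Unlock()
		shared = append(shared, "chunk-0")
	}()

	// Reader: another "node" consuming the same state. If the Lock/Unlock
	// calls are deleted from both goroutines, `go run -race` flags the
	// unsynchronized access to `shared`.
	go func() {
		defer wg.Done()
		mu.Lock()
		defer mu.Unlock()
		fmt.Println(len(shared))
	}()

	wg.Wait()
}
```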