-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Remove code to reinitialise random numbers on fork() #2409
Conversation
Signed-off-by: Gavin Halliday <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2409 +/- ##
==========================================
+ Coverage 87.06% 87.07% +0.02%
==========================================
Files 199 198 -1
Lines 6079 6077 -2
==========================================
- Hits 5292 5291 -1
+ Misses 787 786 -1
|
I am not sure if we should remove code to reseed the random number on fork(). The random number generator is used for generating trace_id/span_id. If the RNG state is duplicated across forked processes, it could affect the randomness of the sampling process, potentially leading to biased sampling results. Thinking out loud, should we have a thread local flag to indicate if a fork has occurred, and set that in fork handler. Something like this: class TlsRandomNumberGenerator {
....
public: static FastRandomNumberGenerator & engine() noexcept {
if (forked_) {
Seed();
forked_ = false;
}
return engine_;
}
private:
static thread_local bool forked_;
static void OnFork() noexcept {
forked_ = true;
}
} This will add an extra conditional check for each span creation, so that need to be considered too, specifically when most of the applications mayn't be doing any fork. |
Not directly related to this PR, but it is also possible to plugin a custom random-id generator if the existing one doesn't suffice -
|
Once a process has been forked it is legal to do very little until it calls exec() or similar to switch to a different process. I don't think it would be legal to call any of the open telemetry functions. See the documentation in https://man7.org/linux/man-pages/man3/pthread_atfork.3.htmlin the linked issue. I know this change was made a while back in response to an issue, but I could not work out from the report what the application was aiming to do in the forked() child process. I suspect the code in the application should have been changed rather than making this change here. |
The code I shared is only setting a flag in the atfork hook, and actual reseeding is deferred. We can change it to an atomic flag to be thread-safe. Do you think this is not safe/legal?
Do you mean changes done for this issue - opentracing-contrib/nginx-opentracing#52? |
That change would also avoid the problem - both the time taken and the code - but I cannot think of any reason why any processing is required for the fork. It also adds an impact as you say to each request.
I have looked at ingress-nginx, and can't find the code that calls fork - so I can't easily see the issue the fix was trying to address. @rnburn are you able to clarify what the issue was, and how this was causing problems with a fork? |
If
|
It will make generator to generate same random sequence between processes. |
That is true, but what is that process calling? My point is that the child process is highly restricted in what it can do at that point (see the linux references) - calling into the open telemetry libraries is almost certainly too much (calling any heap management functions can cause memory corruptions). If the child process needs an id, then it should be allocated before calling fork(). As soon as exec() is called the new child process will reinitialise the random number generators and the numbers will be different. |
I believe this PR isn't ready for merging due to the above mentioned concerns. Additionally, the updates suggested in this PR could be implemented through a custom random number generator. Therefore, closing this PR, but feel free to reopen it with the required modifications. |
Fixes #2408
Changes
Remove the AtFork hook in the random number generator which potentially causes memory corruptions (as well as being inefficient and unnecessary).
I realise the regression test case will need deleting, but have left that until the change is agreed in principle.
CHANGELOG.md
updated for non-trivial changes