Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected return values from pthread_cond_init_232, pthread_cond_destroy_232, pthread_cond_timedwait #464

Open
Rob--W opened this issue Mar 12, 2024 · 2 comments

Comments

@Rob--W
Copy link
Contributor

Rob--W commented Mar 12, 2024

Even after patching #130 (with #463), I am unable to launch Firefox 123.0 because it crashes.
After bisecting libfaketime, I found that the regression is caused by #422. This patch replaced a blocking call with one that does not block in an attempt to address #419. I think that this behavior should be opt-in (at compile time or at runtime), to minimize the risk of triggering unexpected failures at the next libfaketime release.

The Firefox crashes happen because it does not expect these functions to fail:

These functions should ideally not fail, or at the very least libfaketime should try harder to avoid failing too soon. Ordinary programs usually expect these to not fail, and that is also a sentiment echoed through the pthreads documentation:

pthread_cond_init, pthread_cond_signal, pthread_cond_broadcast, and pthread_cond_wait never return an error code.
-- https://man.archlinux.org/man/core/man-pages/pthread_cond_init.3.en

oddly, a different source specifies an error as an acceptable result:

The pthread_cond_init() function shall fail if:
EAGAIN The system lacked the necessary resources (other than memory) to initialize another condition variable.
-- https://www.man7.org/linux/man-pages/man3/pthread_cond_init.3p.html

pthread_cond_timedwait_common in libfaketime can return EAGAIN, but that error is not listed in the pthreads manual: https://www.man7.org/linux/man-pages/man3/pthread_cond_timedwait.3p.html

@wolfcw
Copy link
Owner

wolfcw commented Mar 19, 2024

I agree that ordinary programs do not expect any of these calls to fail - libfaketime itself bailed out the hard way before #422, so that's arguably at least some progress. However, EAGAIN apparently is not a good choice for timedwait_common() as it's not documented; on the other hand, none of the documented errors fit here either. Basically, "try it again" would be a quite suitable error message if it'd exist. :-)

Do you have any specific suggestions on making libfaketime "try harder" in these cases? #419 discussed re-trying in a loop like 1,000 times, but as usual with brute force approaches to non-determinism issues, this does not sound very convincing.

@Rob--W
Copy link
Contributor Author

Rob--W commented Mar 20, 2024

I would recommend blocking indefinitely like before, or with a long delay (optionally followed by printing an error message), followed by a long or indefinite wait. I don't expect a very long delay in practice, unless there is something such as sync IPVC or I/O blocking the process for some reason.

And then to support the use case that #422 was meant to address, add a compile-time flag, or a runtime environment variable to allow configuration of the behavior.

You expressed concerns about retrying for an arbitrary amount of time, but that exact concern would apply equally to programs that call these library functions. Because programs don't really have a good way to deal with this, I recommend #422 to be opt-in instead of opt-out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants