Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful interrupts #119

Merged
merged 13 commits into from
Jul 8, 2020
Merged

Graceful interrupts #119

merged 13 commits into from
Jul 8, 2020

Conversation

Calvin-L
Copy link
Collaborator

@Calvin-L Calvin-L commented Jul 2, 2020

Fixes #110.

This change is a bit more invasive than I was hoping for, but it definitely improves responsiveness. As an added benefit, Cozy performs a graceful stop on Ctrl+C instead of discarding its progress.

Calvin-L added 7 commits July 2, 2020 11:29
The "spawn" method is already the default on Windows and will be the
default on MacOS starting with Python 3.8.  Switching to this method
required a few tweaks since SafeQueue objects can't be pickled.

There is still a lingering problem: since signal handlers are not
inherited by the child, there is a gap between when the process starts
and when the signal handler is installed where a SIGINT can cause an
exception instead of a graceful stop.
There is a gap between when a job starts and when it installs the
signal handler for SIGINT.  It is actually quite common for Cozy to
stop a job during the gap; the it happens while running the unit tests,
for instance.

The gap is an unfortunate necessity; Python does not offer a way to
atomically start a new process with a given signal handler
pre-installed without using the now-discouraged "fork" multiprocessing
method.

The gap is closed awkwardly: when a job is requested to stop, it only
sends SIGINT if it has evidence that the child has already set itself
up for receiving that signal.  In the interim period it will only set
the stop flag.  The only problem with this solution is that any Z3
calls that happen before setting up the signal handler will not be
interrupted---but fortunately, today, there aren't any.
This has a few subtle advantages:
 - The print can be captured by the test harness, so it doesn't muddy
   the test output.
 - The print is serialized with other main-thread prints, so there
   can't be any weird print-interleavings.
The solver will interrupt itself when the stop callback returns true,
as the synthesis core procedures do today.  The locations where the
callback is checked were determined empirically; we could always add
more checks later.
cozy/jobs.py Outdated Show resolved Hide resolved
cozy/jobs.py Outdated Show resolved Hide resolved
@Calvin-L
Copy link
Collaborator Author

Calvin-L commented Jul 3, 2020

GitHub isn't showing the Travis build status for some reason, but something in this change isn't working there. I'm going to set up a Linux environment to see if I can reproduce the failure.

@Calvin-L
Copy link
Collaborator Author

Calvin-L commented Jul 3, 2020

I'm able to reproduce the failure, but it happens on master as well as this branch. There's a segfault happening somewhere; dmesg reports:

[ 4969.897438] cozy[5274]: segfault at 0 ip 00007f8f339a3830 sp 00007ffdd6047460 error 4 in libffi.so.6.0.2[7f8f3399f000+5000]
[ 4969.897447] Code: ff ff 48 89 c7 48 8d 04 c1 48 8b 4d 88 01 f7 4a 89 04 e9 89 7d 84 49 83 c5 01 4c 39 6d 90 74 69 66 2e 0f 1f 84 00 00 00 00 00 <4b> 8b 3c ec 31 d2 4c 89 f1 4c 89 fe 4c 89 45 98 e8 eb f5 ff ff 4c

My current guess is that the newest version of one of Cozy's dependencies is somehow broken. Hopefully adjusting requirements.txt is all it will take to fix this...

@Calvin-L
Copy link
Collaborator Author

Calvin-L commented Jul 3, 2020

The segfaults in Travis are unrelated to this change. I have opened #120 to track progress fixing the segfaults, and I'm putting this on hold until that's done.

@Calvin-L
Copy link
Collaborator Author

Calvin-L commented Jul 7, 2020

It turns out there were segfaults happening on both master and this branch, but for different reasons. The segfaults on master are still unexplained, but the segfaults on this branch were due to misuse of multiprocessing.Value. I have fixed this branch in 237d047. Merging this branch will fix the segfaults on master: those seem due to fork() and this change moves us away from fork().

Calvin-L added 4 commits July 7, 2020 15:06
Previously the cost model would not always respond to stop requests
promptly.
Z3 reports "unknown" instead of some more helpful result when it is
interrupted.  With this change, Cozy checks if it was interrupted
during the Z3 call BEFORE it checks Z3's response.  Cozy can then throw
a more appropriate StopException if the unknown result was due to an
interruption.
@Calvin-L
Copy link
Collaborator Author

Calvin-L commented Jul 7, 2020

(@izgzhen I will merge this tomorrow unless you have additional feedback.)

@izgzhen
Copy link
Collaborator

izgzhen commented Jul 8, 2020

It looks great to me!

@Calvin-L Calvin-L merged commit 8202181 into master Jul 8, 2020
@Calvin-L Calvin-L deleted the graceful-interrupts branch July 8, 2020 05:04
Calvin-L added a commit that referenced this pull request Aug 9, 2020
Cross-process unique names have been broken since #119.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hard timeout abort
2 participants