Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raising an error, in an ensure: block, in a background process, in a test, hangs the test framework #17813

Open
daniels220 opened this issue Feb 10, 2025 · 2 comments

Comments

@daniels220
Copy link
Contributor

daniels220 commented Feb 10, 2025

Bug description
Whew, that's a mouthful! In this one very specific set of circumstances—but they can easily arise by accident—we end up with a Debugger open even though we asked to just run the test (with errors leading to a failure).

To Reproduce
Steps to reproduce the behavior:

  1. Save the following method (it doesn't actually matter what test class it's on):
    TestCaseTest>>testBugExceptionInBackgroundEnsureHangs
     | backgroundProc |
     backgroundProc := [ [ ] ensure: [ self error: 'foo' ] ]
     	                  forkAt: Processor userBackgroundPriority.
     backgroundProc name: 'Background process'.
         "Wait for the error to actually occur in the background process, at which point the test framework will suspend it"
     (Delay forMilliseconds: 100) wait
    
  2. Run the test—not debug, so either run all tests for the class, or evaluate (TestCaseTest selector: #testBugExceptionInBackgroundEnsureHangs) run in a workspace.
  3. Observe that a debugger appears on "Error: foo". If this is part of running a large batch of tests, the run stops, waiting for the debugger to close—so in CI this would hang the build.
  4. It seems that when this debugger is closed, the background process does not always actually terminate, and also references to lots of processes remain around when they really shouldn't? This bears further investigation and is not the primary problem.

Expected behavior
No debugger, the test reports failure, and the background process is fully terminated.

Additional info
The sequence of events, as far as I can tell, goes something like:

  1. The background process raises the error. This goes initially unhandled, then the UnhandledException is caught by the handler installed in ProcessMonitorTestService>>handleNewProcess: using Process>>on:do:. This goes through ProcessMonitorTestService>>handleException: and ultimately double-dispatches its way to handleUnhandledException:, which suspends the background process and adds it to a list of failed processes.
  2. The test completes normally, including tearDown, and reaches ProcessMonitorTestService>>ensureNoBackgroundFailures (by way of TestExecutionEnvironment>>handleCompletedTest in #runTestCaseUnderWatchdog:).
  3. This signals a TestFailedByForkedProcess, since there is a background failure. This is first caught by the handler in runTestCaseUnderWatchdog:, which just makes note of it and passes, so that's not interesting.
  4. Then it is caught by the handler in TestResult>>runCase:, which makes note of the test having failed and (implicitly) ex returns, curtailing the whole test run.
  5. Unwind hits the ensure: in TestExecutionEnvironment>>runTestCase:, which ultimately sends ProcessMonitorTestService>>cleanUpAfterTest -> terminateRunningProcesses, thus sends terminate to the background process that is suspended on the error catch.
  6. But since the error was raised in an ensure: block, terminate tries to finish executing that ensure: block...and that ends up resuming in handleBackgroundException: and doing an ex pass, leading to a debugger, and blocking the main test process waiting for the termination to actually finish.

You can see the whole thing reproduced in a single workspace as follows—it is only the ensure: in the background process itself that matters, not the ones in the main process:

backgroundProc := [ [  ] ensure: [ self error: 'foo' ] ] forkAt:
	                  Processor userBackgroundPriority.
backgroundProc name: 'Background process'.
backgroundProc
	on: Error
	do: [ :ex | "Equivalent to ProcessMonitorTestService>>handleBackgroundException:"
		backgroundProc suspend.
		ex pass ].
(Delay forMilliseconds: 100) wait.
backgroundProc terminate

Expected development cost

It seems like the pass in handleBackgroundException: needs to be conditional, so the exception gets returned instead when terminating the process. And we have something that I think makes sense to use for this—shouldPassBackgroundFailures. Conditionalizing the pass does make the problem go away, so I'll toss out a PR for that.

Version information: Presumably any OS, any Pharo (tested in 11/Win and 13/macOS).

@daniels220
Copy link
Contributor Author

I should add, the example is of course highly contrived for the sake of simplicity. The real scenario I encountered was with application code that uses a background process whose body looks like, roughly, [[["we have items to process right now"] whileTrue: ["process one item, with error handling"]] ensure: ["we are done with a batch of items"]] repeat. When the code was originally written, the "done with a batch of items" code was very simple and couldn't throw an error, but it has gotten more complex and recently did throw an error, leading to the situation above. This pointed out that the error handling needed to be expanded to cover that, at which point the background process no longer throws an unhandled error and the problem goes away, but it should have gracefully failed the test rather than hanging a CI job.

@Ducasse
Copy link
Member

Ducasse commented Feb 22, 2025

Thanks for this super good bug report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants