Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests time out non-deterministically #909

Open
keyboardDrummer opened this issue Jul 17, 2024 · 6 comments
Open

Tests time out non-deterministically #909

keyboardDrummer opened this issue Jul 17, 2024 · 6 comments

Comments

@keyboardDrummer
Copy link
Collaborator

Different tests seems to fail non-deterministically:

@keyboardDrummer
Copy link
Collaborator Author

keyboardDrummer commented Jul 31, 2024

Problem started with this PR: #899 although that never got merged.

@keyboardDrummer
Copy link
Collaborator Author

keyboardDrummer commented Aug 1, 2024

The first merged PR that suffers from the problem is the following. Going back up to ten PRs before that does not show the timeout problem.
#901
Sat, 06 Jul 2024 15:38:38 GMT
Image: ubuntu-22.04
Version: 20240630.1.0

Before that was this PR:
#900
Fri, 05 Jul 2024 04:14:03
Image: ubuntu-22.04
Version: 20240630.1.0

Before that was:
#898
Wed, 03 Jul 2024 14:32:37
Image: ubuntu-22.04
Version: 20240624.1.0

However, there was also an unmerged PR that suffers from the problem:

https://github.com/boogie-org/boogie/actions/runs/9796660534/job/27051593251
Thu, 04 Jul 2024 15:17:49
Operating System
  Ubuntu
  22.04.4
  LTS
Runner Image
  Image: ubuntu-22.04
  Version: 20240630.1.0

Given that the unmerged PR is the first to run into timeouts, it seems that the issue is unrelated to any changes in Boogie. The PR before the unmerged problematic one is #898, but that itself has no issue, and neither does another PR after it, so that PR does not seem to be at fault.

The issue seems to come from the image version upgrade from 20240624.1.0 to 20240630.1.0, since all runs with the problem are in 20240630.1.0 and none are on 20240624.1.0.

@keyboardDrummer
Copy link
Collaborator Author

One thing that's a shame is that the timeout comes from lit, not from the Boogie process. It would be good to configure timeouts at the Boogie level that are shorter than the lit timeout, so that when the timeout occurs, the Boogie process can give some feedback as to where it occurred.

@keyboardDrummer
Copy link
Collaborator Author

keyboardDrummer commented Aug 2, 2024

Desperate move but maybe upgrading dotnet will help: #909

Update: nope, timeouts still occur there

@atomb
Copy link
Collaborator

atomb commented Aug 7, 2024

The issue seems to come from the image version upgrade from 20240624.1.0 to 20240630.1.0, since all runs with the problem are in 20240630.1.0 and none are on 20240624.1.0.

The diff between those Ubuntu image versions is here, but it unfortunately gives me no ideas as to what might be going wrong. I can't imagine how any of those changes would have impacted anything.

One interesting data point is that the normal runtime for the tests that are hanging is extremely short. Verification time for each is a few milliseconds, and the Z3 resource count is usually under 1000, and always under 3000.

My best hypothesis is that this is a deadlock. Maybe the changes to the underlying system are somehow enough to make the non-determinism of concurrency more likely to go wrong?

It could be worth seeing what happens if we try reverting concurrency-related PRs and seeing whether that makes any difference (not that we'd want to revert them permanently, because I think they all fix important things, but it might help us understand what's going on). The ones I see in the recent past include #841, #854, #888, and maybe #898?

@keyboardDrummer
Copy link
Collaborator Author

My best hypothesis is that this is a deadlock.

I agree. I increased the timeout by a huge amount and the timeouts still occurred.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants