Timeout when jobs are waiting #8
Comments
Hmm… the time should not "count" when a job is waiting to enter a stage. Timeouts are only on individual command executions, not on entire stages, so jobs can wait indefinitely until it's their turn. The weirdest thing is this command timing out:
It might be worth manually running that setup script a couple times to see if it waits for something to happen at any point. Not that this solves the mystery, but maybe there's a more direct way to obtain
From the command, it seems like the command was only given 60s before timing out?
Ah! I see now that all the processes have their own timeouts! I will try running the script multiple times! I will also try increasing the timeout for the specific processes that are timing out!!
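For anyone reading along, here is a minimal sketch of the behavior described above: a job waits for its stage with no deadline, and the timeout clock only starts once an individual command is launched. This is not the project's actual code. `stage_lock`, `run_in_stage`, and `demo` are hypothetical names, and a plain `threading.Lock` stands in for whatever the scheduler really uses to admit jobs into a stage.

```python
import subprocess
import sys
import threading

# Hypothetical stage lock: only one job may be inside the stage at a time.
stage_lock = threading.Lock()


def run_in_stage(cmd, timeout):
    """Wait for the stage without a deadline, then run cmd with its own timeout."""
    with stage_lock:  # blocks indefinitely; this wait is not counted against `timeout`
        # The clock starts only here, when the command is actually launched.
        return subprocess.run(cmd, timeout=timeout, capture_output=True, text=True)


def demo():
    """Run one quick command and one that exceeds its per-command timeout."""
    quick = run_in_stage([sys.executable, "-c", "print('done')"], timeout=30)
    try:
        run_in_stage([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return quick.stdout.strip(), timed_out
```

With this shape, a job stuck behind another job in the stage never trips the timeout; only a command that itself runs too long does.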
Adding on to this:
Oh no, that's bad! I really don't know why that would be happening… it really seems like the timeout should work every time. 😱 I don't exactly know what's going wrong and we'll have to dig deeper. But I do notice that we're not cleaning up timed-out commands, as we probably should be according to the Python docs. But that probably couldn't cause what we're seeing here?
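For reference, the cleanup the Python `subprocess` docs suggest looks roughly like the sketch below: when `Popen.communicate(timeout=...)` raises `TimeoutExpired`, the child is not killed automatically, so the caller kills it and then calls `communicate()` again to reap it and collect any remaining output. `run_with_cleanup` and `demo_cleanup` are hypothetical names for illustration, not functions from this project.

```python
import subprocess
import sys


def run_with_cleanup(cmd, timeout):
    """Run cmd; on timeout, kill the child and finish communication.

    Returns (returncode, stdout, stderr, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        out, err = proc.communicate(timeout=timeout)
        return proc.returncode, out, err, False
    except subprocess.TimeoutExpired:
        # The child is still running at this point; kill it explicitly.
        proc.kill()
        # Second communicate() reaps the process and drains the pipes.
        out, err = proc.communicate()
        return proc.returncode, out, err, True


def demo_cleanup():
    """Start a child that sleeps far longer than its timeout allows."""
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    return run_with_cleanup(slow, timeout=0.5)
```

Without the `kill()` and second `communicate()`, the timed-out child keeps running in the background and can hold on to whatever resources it had acquired.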
This was fixed. The issue was that a make stage would block on the license acquired by a different make stage on the same machine. With the new AWS deployment, this is no longer a problem.
When I run 4 hw jobs in parallel, half of them end up getting timeout errors. I suspect this is because the AFI generation stage cannot run in parallel, so the waiting time adds up until it reaches the timeout for the AFI stage.
In these two jobs, the timeout happens at the beginning of the AFI stage:
http://ec2-54-234-195-6.compute-1.amazonaws.com:5000/jobs/2sfeH7kJqyg.html
http://ec2-54-234-195-6.compute-1.amazonaws.com:5000/jobs/w57woT8LUMs.html
However, some of the jobs time out at the beginning or in the middle of the make stage:
http://ec2-54-234-195-6.compute-1.amazonaws.com:5000/jobs/wpc3A_L4UDI.html
http://ec2-54-234-195-6.compute-1.amazonaws.com:5000/jobs/fjMiNTK1b2E/log.txt
And the synthesis timeout is already 20000. I also suspect that when jobs are waiting to enter the make stage, the waiting time counts toward the timeout?