autosubmit run doesn't exit, leaving the process sleeping, childless #2112

Open
kinow opened this issue Feb 6, 2025 · 2 comments

kinow commented Feb 6, 2025

Documenting an issue that happened with t01s and t01x in the TS in ClimateDT.

You start `autosubmit run`, and at some point the main process goes to sleep. Its children (the log retrieval processes) become zombies, stop retrieving logs, and die, while the parent `autosubmit run` process keeps running and never responds to `autosubmit stop` (#2104, #2063).
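
To confirm this state on a Linux host, one option is to read `/proc` directly. The snippet below is a minimal sketch and not part of Autosubmit: the function names (`parse_proc_stat`, `direct_children`, `report`) are made up for illustration, and it assumes you already know the PID of the `autosubmit run` process. It prints the kernel state of that process (`S` is sleeping, `Z` is zombie) and of each direct child.

```python
import os


def parse_proc_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of on spaces.
    """
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    fields = text[close_idx + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def read_stat(pid):
    """Read and parse /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_proc_stat(handle.read())


def direct_children(parent_pid):
    """Scan /proc for processes whose parent PID is parent_pid."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            info = read_stat(int(entry))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited meanwhile, or is not readable
        if info[3] == parent_pid:
            children.append(info)
    return children


def report(pid):
    """Print the state of a process and of its direct children."""
    _, comm, state, _ = read_stat(pid)
    print(f"parent {pid} ({comm}): state={state}")
    kids = direct_children(pid)
    if not kids:
        print("  no children")
    for child_pid, child_comm, child_state, _ in kids:
        print(f"  child {child_pid} ({child_comm}): state={child_state}")
```

Calling `report(<pid>)` a few times over the life of a run would show whether the parent sits in `S` while children move to `Z` and then disappear, which is the "sleeping, childless" end state described here.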

Creating this issue just so we have it recorded here for posterity.

We thought #2041 could fix it, but it didn't.

Now #2097 might help, but I'm not sure it will fix it.

kinow added this to the 4.1.12 milestone on Feb 6, 2025

kinow commented Feb 6, 2025

Targeting 4.1.12 if we have time, otherwise 4.1.13, as this doesn't break the simulation (but it is definitely a very annoying 🐛).


kinow commented Feb 14, 2025

Yesterday I talked with @franra9, who reported that this is happening quite frequently in one of his experiments (more often than it happened to me).

He also mentioned something interesting: he had already run the experiment, then had to use `setstatus` and relaunch it, and that is when the issue happened. In other cases he was launching from the start, but only after the experiment had already been launched once before.

So maybe Autosubmit stops during the first execution and leaves data (on disk, pickles, DB, etc.) in an inconsistent state. The next time Autosubmit runs, it is able to read the config and load the data, but eventually it hits this inconsistent state and stalls.

Or it could be something else… but given we don't really have a clue where to start looking for this issue, Francesc's comment seemed like a good starting point.
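
One cheap way to start probing the inconsistent-state hypothesis is to check whether the pickles left by the previous execution still load at all. The following is only a rough sketch under assumptions: it assumes the state is stored as `*.pkl` files somewhere under a directory you pass in (the path is a placeholder, not a confirmed Autosubmit layout), and `check_pickles` is a made-up helper name. Unpickling needs the pickled classes to be importable, so it should run in the same environment as Autosubmit, and only on files you trust.

```python
import pickle
from pathlib import Path


def check_pickles(directory):
    """Try to unpickle every *.pkl file found under directory.

    Returns a dict mapping each file path (as a string) to None when the
    file loads, or to the repr of the exception when it does not.
    """
    results = {}
    for path in sorted(Path(directory).rglob("*.pkl")):
        try:
            with path.open("rb") as handle:
                pickle.load(handle)
            results[str(path)] = None
        except Exception as exc:  # truncated file, missing class, etc.
            results[str(path)] = repr(exc)
    return results
```

A file that loads cleanly can still hold logically inconsistent data, so this would only rule out the simplest case (a truncated or corrupted write when the first execution stopped).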
