Scheduler not installed raised as an Autosubmit Critical error in the middle of the run (should be an Autosubmit Error) #2102
Comments
I think this is related to Autosubmit and not the API. Transferring the issue.
Oh, you're right! I'm sorry, I actually thought I was writing in the AS repo. Thanks @LuiggiTenorioK
Hello @mbatllem. It is not shown as completed in the GUI/API or in autosubmit monitor because the Autosubmit instance is stopped. If you haven't run the recovery or setstatus commands yet, just doing the [...] It should also be fine to set it to COMPLETED or even perform an [...] The error is strange, though. It shows up when the command (sacct, squeue, ...) is not found on the remote. Maybe the platform had some weird error in which Slurm was not detected. Just resume the experiment and we'll see if it still happens.
Also, you don't need to resubmit the job, as it is completed.
I'll update the issue title to "Scheduler not installed raises an Autosubmit Critical in the middle of the run". I think we need to change this critical raise so that it only occurs when you try to connect to the platforms. If it happens in the middle of the run, it should be raised as an error so Autosubmit can reconnect to the platform.
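To make the proposal above concrete, here is a minimal sketch of the intended behaviour. It is not Autosubmit's actual code: `SchedulerCritical`, `SchedulerError`, `classify_missing_scheduler` and `run_with_reconnect` are hypothetical names used only for illustration, and the retry count is an assumption.

```python
class SchedulerCritical(Exception):
    """Hypothetical stand-in for a fatal failure that stops the run."""


class SchedulerError(Exception):
    """Hypothetical stand-in for a recoverable failure that allows a reconnect."""


def classify_missing_scheduler(during_initial_connect):
    """Choose the exception type for a 'scheduler not installed' condition.

    Assumption: the condition is fatal only while first connecting to the
    platform; later in the run it is treated as recoverable.
    """
    message = "Scheduler commands not found on the remote platform"
    if during_initial_connect:
        return SchedulerCritical(message)
    return SchedulerError(message)


def run_with_reconnect(check_scheduler, reconnect, max_retries=3):
    """Call check_scheduler(); on a recoverable error, reconnect and retry.

    The last failed attempt re-raises the error so the caller can decide
    what to do once retries are exhausted.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return check_scheduler()
        except SchedulerError:
            if attempt == max_retries:
                raise
            reconnect()
```

With this split, a mid-run failure to find the scheduler commands would lead to a reconnect attempt instead of stopping the whole experiment, while the same failure at connection time would still abort immediately.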
Thank you for your quick responses!
Hello again, apparently this also happened here:
Hello,
I'm not sure if this issue is related to the workflow or the AS, but I suspect it is more related to AS.
I'm using AS 4.1.11 and WF 4.2.0 on MN5.
In my experiment a1yv, the last chunk apparently COMPLETED successfully. I can see this from the model logs and also from the file:
/gpfs/scratch/ehpc01/bsc998159/a1yv/LOG_a1yv/a1yv_19900101_fc0_332_SIM_COMPLETED
However, when I run autosubmit monitor a1yv or check the AS GUI, this same SIM job still appears as RUNNING. The experiment crashed, outputting the following in the nohup:
I would like to continue the experiment ASAP. Would it be okay if I change the job status from RUNNING to WAITING and re-submit? Or would this prevent you from investigating the root cause of the issue?