Job submission fails based on job name #112

Open
utf opened this issue May 8, 2024 · 3 comments

Comments

@utf
Collaborator

utf commented May 8, 2024

When running the elastic workflow in atomate2 with jobflow-remote, the elastic relax jobs fail upon submission because the job name contains the character "/", e.g., elastic relax 1/12.

The error from qsub is:

qsub: illegal -N value
@gpetretto
Contributor

gpetretto commented May 9, 2024

Thanks for reporting this. Submission usually works with Slurm and, given that you mention qsub, I suppose you are working with PBS. Indeed, it seems that PBS does not accept / in the name of the job.
FireWorks applies a stricter sanitization to the job name: https://github.com/materialsproject/fireworks/blob/a5ec954695fbd29accda93431e95e4c2e847fd9b/fireworks/utilities/fw_utilities.py#L187. However, from a quick test, PBS also rejects ( and ), which FireWorks allows.
I thought that it would be nice to have the name of the Job in the queue, but based on this I am wondering whether it is worth the risk of errors like this still coming up even with sanitization. Maybe using f"JF_{db_id}" as the name for the job in the queue would be a better choice? Or even JF_job? What do you think?

(In case you need a quick fix to keep running, you can modify this portion of the code locally to sanitize the name: https://github.com/Matgenix/jobflow-remote/blob/develop/src/jobflow_remote/remote/queue.py#L15)

@utf
Collaborator Author

utf commented May 9, 2024

I also like having the name of the job in the queue. I wonder if the sanitization could just keep the alphanumeric characters and drop everything else? This could potentially be achieved with a regular expression.
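For illustration, a minimal sketch of that idea in Python. The function name keep_alphanumeric is hypothetical and is not part of jobflow-remote; it only shows what a regex-based filter would do to the job name from this issue.

```python
import re


def keep_alphanumeric(name: str) -> str:
    """Drop every character that is not an ASCII letter or digit.

    Hypothetical helper, not an existing jobflow-remote function.
    """
    return re.sub(r"[^A-Za-z0-9]", "", name)


print(keep_alphanumeric("elastic relax 1/12"))  # elasticrelax112
```

Note that dropping characters also removes the spaces, so the words in the name run together.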

Thanks for the hint about keeping things running. I ended up just modifying the workflow in atomate2 locally, but this would have indeed been nicer.

@gpetretto
Contributor

Only alphanumeric characters, plus maybe _, would be an option. In this case you would get something like elastic_relax_112. So maybe replacing any character that is not alphanumeric with _ would make it more readable. This would probably look better than JF_2571. However, an additional point that I have seen handled in FW and not in jobflow-remote is the maximum length of the name: https://github.com/materialsproject/fireworks/blob/a5ec954695fbd29accda93431e95e4c2e847fd9b/fireworks/queue/queue_launcher.py#L98
I suppose this could be a configuration option (as in FW), but I am not sure it is worth having.

So, summarizing I think the options could be:

  • Sanitized job name (only alphanumeric characters and _, with all other characters either removed or converted to _), plus an option for the max length (or a reasonably low constant number?)
  • f"JF_{db_id}". I suppose this would be fine up to billions of jobs without the need to limit the name length.
  • JF_job
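
To make the comparison concrete, the three options could be sketched as follows. This is only an illustration under assumptions: the function queue_job_name and its mode and max_length parameters are hypothetical, not the jobflow-remote API, and the exact set of characters that PBS accepts has not been verified here.

```python
import re
from typing import Optional


def queue_job_name(
    name: str,
    db_id: int,
    mode: str = "sanitized",
    max_length: Optional[int] = None,
) -> str:
    """Build the name passed to the scheduler (hypothetical helper).

    mode="sanitized": replace anything that is not an ASCII letter,
                      digit or underscore with "_", then truncate.
    mode="db_id":     use f"JF_{db_id}".
    mode="constant":  use the fixed name "JF_job".
    """
    if mode == "db_id":
        return f"JF_{db_id}"
    if mode == "constant":
        return "JF_job"
    sanitized = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if max_length is not None:
        sanitized = sanitized[:max_length]
    return sanitized


print(queue_job_name("elastic relax 1/12", 2571))                 # elastic_relax_1_12
print(queue_job_name("elastic relax 1/12", 2571, max_length=10))  # elastic_re
print(queue_job_name("elastic relax 1/12", 2571, mode="db_id"))   # JF_2571
```

Replacing rather than removing keeps the 1/12 counter readable as 1_12, at the cost of a slightly longer name.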
