We're seeing some issues with QCG-PilotJob on very large machines, and in general given the heterogeneity of the HPC landscape it's probably good to have more than one way of starting and monitoring processes to maximise our chances of success.
We've been working with the authors of RADICAL-Pilot lately, and they are adding some features to it to make it more suitable for use as an instantiator for MUSCLE3. Let's add it as an optional second backend.
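One way to keep the second backend optional is to put both launchers behind a common interface and choose one by name at startup. The sketch below is illustrative only. The `Instantiator` base class, the `--backend` flag, the `qcgpj`/`rp` names and the helper functions are hypothetical, not MUSCLE3's actual API, and the two backend classes are stubs that print instead of launching processes.

```python
import argparse
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class Instantiator(ABC):
    """Hypothetical interface shared by all process-launching backends."""

    @abstractmethod
    def start(self, name: str, command: List[str]) -> None:
        """Launch one component instance."""

    @abstractmethod
    def shutdown(self) -> None:
        """Stop remaining instances and release allocated resources."""


class QCGPJInstantiator(Instantiator):
    """Stub standing in for the existing QCG-PilotJob backend."""

    def start(self, name: str, command: List[str]) -> None:
        print(f"[qcgpj] start {name}: {' '.join(command)}")

    def shutdown(self) -> None:
        print("[qcgpj] shutdown")


class RPInstantiator(Instantiator):
    """Stub standing in for the proposed RADICAL-Pilot backend."""

    def start(self, name: str, command: List[str]) -> None:
        print(f"[rp] start {name}: {' '.join(command)}")

    def shutdown(self) -> None:
        print("[rp] shutdown")


# Registry mapping the (hypothetical) command-line name to a backend class.
BACKENDS: Dict[str, Type[Instantiator]] = {
    "qcgpj": QCGPJInstantiator,
    "rp": RPInstantiator,
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the manager's command line; only the backend option is shown."""
    parser = argparse.ArgumentParser(description="Manager (sketch)")
    parser.add_argument(
        "--backend", choices=sorted(BACKENDS), default="qcgpj",
        help="which instantiator backend to use (default: qcgpj)")
    return parser.parse_args(argv)


def make_instantiator(name: str) -> Instantiator:
    """Create the backend registered under `name`."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


if __name__ == "__main__":
    args = parse_args()
    inst = make_instantiator(args.backend)
    inst.start("macro", ["./macro_model"])
    inst.shutdown()
```

Keeping QCG-PilotJob as the default in a scheme like this would leave existing setups unchanged unless the user explicitly asks for the RADICAL-Pilot backend.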
- Add an integration test that uses a simulated SLURM cluster of Docker containers
- Add an RPInstantiator
  - Scan the environment and determine what resources we have
  - Get an RP Pilot using those resources
  - Start instances using the pilot
  - Monitor execution and shut down correctly at simulation end or on a crash
- Add a command line option to the manager to select the backend
- Test (locally, DAS, ARCHER2, Frontier?)
- Coordinate a release
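For the "scan the environment" step on a SLURM cluster, the allocation is described by environment variables that `sbatch`/`salloc` set, in particular `SLURM_JOB_NODELIST` (a compressed hostlist such as `node[01-03,07]`) and `SLURM_JOB_CPUS_PER_NODE` (for example `72(x2),36`). Below is a minimal sketch of turning these into a node-to-cores map. It assumes a SLURM batch allocation, assumes both variables list nodes in the same order, and handles only one bracketed range per host name (SLURM also allows multi-dimensional names like `rack[1-2]n[1-4]`, which this sketch rejects). The function names are hypothetical; `scontrol show hostnames` can also do the hostlist expansion.

```python
import os
import re
from typing import Dict, List, Mapping, Optional


def _split_top_level(spec: str) -> List[str]:
    """Split on commas that are not inside [...] brackets."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand e.g. 'node[01-03,07],gpu1' (one bracket group per name only)."""
    hosts: List[str] = []
    for item in _split_top_level(nodelist):
        m = re.fullmatch(r"([^\[]*)\[([^\]]*)\](.*)", item)
        if m is None:
            hosts.append(item)  # plain host name, nothing to expand
            continue
        prefix, ranges, suffix = m.groups()
        if "[" in suffix:
            raise ValueError(f"multi-dimensional hostlist not supported: {item!r}")
        for rng in ranges.split(","):
            if "-" in rng:
                lo, hi = rng.split("-")
                width = len(lo)  # keep zero padding, e.g. '01'
                for i in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{rng}{suffix}")
    return hosts


def expand_cpus_per_node(spec: str) -> List[int]:
    """Expand SLURM_JOB_CPUS_PER_NODE, e.g. '72(x2),36' into [72, 72, 36]."""
    counts: List[int] = []
    for item in spec.split(","):
        m = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if m is None:
            raise ValueError(f"unexpected CPU spec: {item!r}")
        counts.extend([int(m.group(1))] * int(m.group(2) or 1))
    return counts


def scan_slurm_resources(env: Mapping[str, str]) -> Optional[Dict[str, int]]:
    """Return {hostname: cores}, or None when not inside a SLURM allocation."""
    if "SLURM_JOB_NODELIST" not in env or "SLURM_JOB_CPUS_PER_NODE" not in env:
        return None
    nodes = expand_nodelist(env["SLURM_JOB_NODELIST"])
    cores = expand_cpus_per_node(env["SLURM_JOB_CPUS_PER_NODE"])
    if len(nodes) != len(cores):
        raise ValueError("node list and CPU counts have different lengths")
    return dict(zip(nodes, cores))


if __name__ == "__main__":
    print(scan_slurm_resources(os.environ))
```

A map like this could then be used to size the RP Pilot request, although how the RPInstantiator actually describes its pilot depends on the RADICAL-Pilot features still being added.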