merge from devel
andre-merzky committed Dec 4, 2024
2 parents 85a1fe6 + 8659526 commit ae93b74
Showing 1 changed file with 76 additions and 64 deletions.
140 changes: 76 additions & 64 deletions docs/source/tutorials/submission.ipynb
@@ -13,60 +13,9 @@
"\n",
"RADICAL-Pilot (RP) provides three ways to use [supported HPC platforms](../supported.rst) to execute workloads:\n",
"\n",
"- **Remote submission**: users can execute their RP application from their workstation, and then RP accesses the HPC platform via `ssh`.\n",
"- **Interactive submission**: users can submit an interactive/batch job on the HPC platform, and then RP from a compute node.\n",
"- **Login submission**: users can `ssh` into the login node of the HPC platform, and then launch their RP application from that shell.\n",
"\n",
"## Remote submission\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"__Warning:__ Remote submission **does not work with two factors authentication**. Target HPC platforms need to support passphrase-protected ssh keys as a login method without the use of a second authentication factor. Usually, the user needs to reach an agreement with the system administrators of the platform in order to allow `ssh` connections from a specific IP address. Putting such an agreement in place is from difficult to impossible, and requires a fixed IP.\n",
"\n",
"</div>\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"__Warning:__ Remote submissions **require a `ssh` connection to be alive for the entire duration of the application**. If the `ssh` connection fails while the application executes, the application will fail. This has the potential of leaving an orphan RP Agent running on the HPC platform, consuming allocation and failing to properly execute any new application task. Remote submissions should not be attempted on a laptop with a Wi-Fi connection; and the risk of interrupting the `ssh` connection increases with the time taken by the application to complete.\n",
"\n",
"</div>\n",
"\n",
"If you can manually `ssh` into the target HPC platform, RADICAL-Pilot can do the same. You will have to set up an ssh key and, for example, follow up this [guide](https://www.ssh.com/academy/ssh-keys#how-to-configure-key-based-authentication) if you need to become more familiar.\n",
"\n",
"**Note:** RADICAL-Pilot will not work without configuring the `ssh-agent`, and it will require entering the user's ssh key passphrase to access the HPC platform\n",
"\n",
"After setting up and configuring `ssh`, you will be able to instruct RP to run its client on your local workstation and its agent on one or more HPC platforms. With the remote submission mode, you:\n",
"\n",
"1. Create a pilot description object;\n",
"2. Specify and the RP resource ID of the supported HP platform;\n",
"3. Specify the access schema you want to use to access that platform."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2023-05-18T01:30:55.759075Z",
"iopub.status.busy": "2023-05-18T01:30:55.758587Z",
"iopub.status.idle": "2023-05-18T01:30:55.762690Z",
"shell.execute_reply": "2023-05-18T01:30:55.761860Z"
},
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"import radical.pilot as rp\n",
"\n",
"session = rp.Session()\n",
"\n",
"pd_init = {'resource' : 'tacc.frontera',\n",
" 'access_schema': 'ssh'\n",
" }\n",
"\n",
"pdesc = rp.PilotDescription(pd_init)"
"- **Remote submission**: users can execute their RP application from their workstation, and then RP accesses the HPC platform via `ssh`.\n",
"- **Login submission**: users can `ssh` into the login node of the HPC platform, and then launch their RP application from that shell."
]
},
{
@@ -84,8 +33,8 @@
"\n",
"User can perform an interactive submission of an RP application on a supported HPC platform in two ways: \n",
"\n",
"- Submitting an **interactive job** to the batch system to acquire a shell and then executing the RP application from that shell.\n",
"- Submitting a **batch script** to the batch system that, once scheduled, will execute the RP application.\n",
"- Submitting an **interactive job** from within a login node to the batch system to acquire a shell and then executing the RP application from that shell.\n",
"- Submitting a **batch script** from within a login node to the batch system that, once scheduled, will execute the RP application.\n",
"\n",
"<div class=\"alert alert-info\">\n",
"\n",
@@ -95,7 +44,7 @@
"\n",
"### Configuring an RP application for interactive submission\n",
"\n",
"You will need to set the `access_schema` in your pilot description to `interactive`. All the other parameters of your application remain the same and are independent of how you execute your RP application. For example, assume that your application requires 4096 cores, will terminate in 10 hours, and you want to execute it on TACC Frontera. To run it from an interactive job, you will have to use the following pilot description:"
"In this setup, RP will automatically interact with the HPC resource manager and set up your job once the resources are assigned to your job. For example, assume that your application requires 4096 cores, will terminate in 10 hours, and you want to execute it on TACC Frontera. To run it from an interactive job, you will have to use the following pilot description:"
]
},
{
@@ -114,17 +63,18 @@
},
"outputs": [],
"source": [
"import radical.pilot as rp\n",
"\n",
"session = rp.Session()\n",
"\n",
"pd_init = {'resource' : 'tacc.frontera',\n",
" 'access_schema': 'interactive',\n",
" 'runtime' : 6000,\n",
" 'runtime' : 600, # <-- assuming you asked for 600 minutes in your batch script or your interactive shell command\n",
" 'exit_on_error': True,\n",
" 'cores' : 4096,\n",
" 'cores' : 4096, # <-- assuming you asked for 4096 cores in your batch script or your interactive shell command\n",
" 'gpus' : 0\n",
" }\n",
"\n",
"pdesc = rp.PilotDescription(pd_init)\n",
"\n",
"session.close(cleanup=True)"
"pdesc = rp.PilotDescription(pd_init)"
]
},
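The `runtime` change in this hunk (6000 to 600) is a unit correction: according to the inline comment, `runtime` is expressed in minutes, so the 10-hour job from the text corresponds to 600. A minimal sketch of deriving the value from hours, so that the pilot description cannot drift from the walltime requested from the batch system. `hours_to_minutes` and `make_pilot_init` are hypothetical helper names and not part of RADICAL-Pilot; only the dictionary keys are taken from the notebook.

```python
def hours_to_minutes(hours):
    """Convert a walltime given in hours to whole minutes."""
    if hours <= 0:
        raise ValueError("walltime must be positive")
    return int(round(hours * 60))


def make_pilot_init(resource, hours, cores, gpus=0, access_schema=None):
    """Build the dict that the notebook passes to rp.PilotDescription.

    Hypothetical helper: 'access_schema' is only included when given,
    matching the updated interactive cell, which no longer sets it.
    """
    pd_init = {'resource'     : resource,
               'runtime'      : hours_to_minutes(hours),  # minutes
               'exit_on_error': True,
               'cores'        : cores,
               'gpus'         : gpus}
    if access_schema is not None:
        pd_init['access_schema'] = access_schema
    return pd_init


# The example from the text: 4096 cores for 10 hours on TACC Frontera.
pd_init = make_pilot_init('tacc.frontera', hours=10, cores=4096)
print(pd_init['runtime'], pd_init['cores'])
```

The resulting dictionary would be passed to `rp.PilotDescription(pd_init)` exactly as in the notebook cell; the values still have to match what was requested in the batch script or interactive shell command.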
{
@@ -175,8 +125,70 @@
"\n",
"```shell\n",
"sbatch myjobscript.sbatch\n",
"```\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remote submission\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"__Warning:__ Remote submission **does not work with two factors authentication**. Target HPC platforms need to support passphrase-protected ssh keys as a login method without the use of a second authentication factor. Usually, the user needs to reach an agreement with the system administrators of the platform in order to allow `ssh` connections from a specific IP address. Putting such an agreement in place is from difficult to impossible, and requires a fixed IP.\n",
"\n",
"</div>\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"__Warning:__ Remote submissions **require a `ssh` connection to be alive for the entire duration of the application**. If the `ssh` connection fails while the application executes, the application will fail. This has the potential of leaving an orphan RP Agent running on the HPC platform, consuming allocation and failing to properly execute any new application task. Remote submissions should not be attempted on a laptop with a Wi-Fi connection; and the risk of interrupting the `ssh` connection increases with the time taken by the application to complete.\n",
"\n",
"</div>\n",
"\n",
"If you can manually `ssh` into the target HPC platform, RADICAL-Pilot can do the same. You will have to set up an SSH key and, for example, follow up this [guide](https://www.ssh.com/academy/ssh-keys#how-to-configure-key-based-authentication) if you need to become more familiar.\n",
"\n",
"**Note:** RADICAL-Pilot will not work without configuring the `ssh-agent`, and it will require entering the user's SSH key passphrase to access the HPC platform\n",
"\n",
"After setting up and configuring `ssh`, you can instruct RP to run its client on your local workstation and its agent on one or more HPC platforms. With the remote submission mode, you:\n",
"\n",
"1. Create a pilot description object;\n",
"2. Specify the RP resource ID of the supported HP platform;\n",
"3. Specify the access schema you want to use to access that platform."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2023-05-18T01:30:55.759075Z",
"iopub.status.busy": "2023-05-18T01:30:55.758587Z",
"iopub.status.idle": "2023-05-18T01:30:55.762690Z",
"shell.execute_reply": "2023-05-18T01:30:55.761860Z"
},
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"pd_init = {'resource' : 'tacc.frontera',\n",
" 'access_schema': 'ssh',\n",
" 'project' : 'myproject',\n",
" 'queue' : 'normal',\n",
" 'runtime' : 6000,\n",
" }\n",
"\n",
"pdesc = rp.PilotDescription(pd_init)\n",
"\n",
"session.close(cleanup=True)"
]
},
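Because remote submission depends on a configured `ssh-agent`, a quick pre-flight check before creating the session can save a failed run. The sketch below assumes the standard OpenSSH behaviour of exporting the agent socket path in the `SSH_AUTH_SOCK` environment variable; `ssh_agent_configured` is a hypothetical helper, not an RP function, and it does not verify that a key has actually been added to the agent (`ssh-add -l` lists the loaded keys).

```python
import os


def ssh_agent_configured(env=None):
    """Return True if SSH_AUTH_SOCK is set and points to an existing path.

    This only checks that an agent socket is advertised in the environment;
    it does not check which keys, if any, the agent holds.
    """
    env = os.environ if env is None else env
    sock = env.get('SSH_AUTH_SOCK', '')
    return bool(sock) and os.path.exists(sock)


if not ssh_agent_configured():
    print("no ssh-agent detected: start one and add your key before "
          "using the 'ssh' access schema")
```

Running a check like this on the workstation, before building the pilot description with `'access_schema': 'ssh'`, keeps the failure local rather than surfacing it during pilot startup.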
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Login submission\n",
"\n",
"<div class=\"alert alert-warning\">\n",
@@ -214,7 +226,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.12.3"
}
},
"nbformat": 4,