From 2129af295033fe7c853739efdd9a669abd237124 Mon Sep 17 00:00:00 2001 From: Aymen Alsaadi <27039262+AymenFJA@users.noreply.github.com> Date: Wed, 13 Nov 2024 07:26:43 -0500 Subject: [PATCH 1/5] fix docs --- docs/source/tutorials/submission.ipynb | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/source/tutorials/submission.ipynb b/docs/source/tutorials/submission.ipynb index 4772b8706..0f5b4570e 100644 --- a/docs/source/tutorials/submission.ipynb +++ b/docs/source/tutorials/submission.ipynb @@ -64,6 +64,9 @@ "\n", "pd_init = {'resource' : 'tacc.frontera',\n", " 'access_schema': 'ssh'\n", + " 'project' : 'myproject',\n", + " 'queue' : 'normal',\n", + " 'runtime' : 6000,\n", " }\n", "\n", "pdesc = rp.PilotDescription(pd_init)" @@ -95,7 +98,7 @@ "\n", "### Configuring an RP application for interactive submission\n", "\n", - "You will need to set the `access_schema` in your pilot description to `interactive`. All the other parameters of your application remain the same and are independent of how you execute your RP application. For example, assume that your application requires 4096 cores, will terminate in 10 hours, and you want to execute it on TACC Frontera. To run it from an interactive job, you will have to use the following pilot description:" + "In this setup, specifying the `access_schema` in your pilot description is optional. If left unspecified, it will default to `interactive`, and RP will automatically configure the appropriate access method for you. All the other parameters of your application remain the same and are independent of how you execute your RP application. For example, assume that your application requires 4096 cores, will terminate in 10 hours, and you want to execute it on TACC Frontera. 
To run it from an interactive job, you will have to use the following pilot description:" ] }, { @@ -115,7 +118,7 @@ "outputs": [], "source": [ "pd_init = {'resource' : 'tacc.frontera',\n", - " 'access_schema': 'interactive',\n", + " 'access_schema': 'interactive', # <-- optional\n", " 'runtime' : 6000,\n", " 'exit_on_error': True,\n", " 'cores' : 4096,\n", From 2572c0c28e0c11d22d2fb59ce327c8f26b9cb116 Mon Sep 17 00:00:00 2001 From: Aymen Alsaadi <27039262+AymenFJA@users.noreply.github.com> Date: Fri, 15 Nov 2024 08:29:28 -0500 Subject: [PATCH 2/5] fix a missing comma --- docs/source/tutorials/submission.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/tutorials/submission.ipynb b/docs/source/tutorials/submission.ipynb index 0f5b4570e..daf9b952b 100644 --- a/docs/source/tutorials/submission.ipynb +++ b/docs/source/tutorials/submission.ipynb @@ -63,7 +63,7 @@ "session = rp.Session()\n", "\n", "pd_init = {'resource' : 'tacc.frontera',\n", - " 'access_schema': 'ssh'\n", + " 'access_schema': 'ssh',\n", " 'project' : 'myproject',\n", " 'queue' : 'normal',\n", " 'runtime' : 6000,\n", From 12da185a4cda2227adafc014f64681362dd3d89f Mon Sep 17 00:00:00 2001 From: AymenFJA Date: Wed, 20 Nov 2024 12:32:43 +0000 Subject: [PATCH 3/5] address comments --- docs/source/tutorials/submission.ipynb | 137 +++++++++++++------------ 1 file changed, 73 insertions(+), 64 deletions(-) diff --git a/docs/source/tutorials/submission.ipynb b/docs/source/tutorials/submission.ipynb index daf9b952b..402c2d1e0 100644 --- a/docs/source/tutorials/submission.ipynb +++ b/docs/source/tutorials/submission.ipynb @@ -13,63 +13,9 @@ "\n", "RADICAL-Pilot (RP) provides three ways to use [supported HPC platforms](../supported.rst) to execute workloads:\n", "\n", - "- **Remote submission**: users can execute their RP application from their workstation, and then RP accesses the HPC platform via `ssh`.\n", "- **Interactive submission**: users can submit an interactive/batch job 
on the HPC platform, and then RP from a compute node.\n", - "- **Login submission**: users can `ssh` into the login node of the HPC platform, and then launch their RP application from that shell.\n", - "\n", - "## Remote submission\n", - "\n", - "
\n", - "\n", - "__Warning:__ Remote submission **does not work with two factors authentication**. Target HPC platforms need to support passphrase-protected ssh keys as a login method without the use of a second authentication factor. Usually, the user needs to reach an agreement with the system administrators of the platform in order to allow `ssh` connections from a specific IP address. Putting such an agreement in place is from difficult to impossible, and requires a fixed IP.\n", - "\n", - "
\n", - "\n", - "
\n", - "\n", - "__Warning:__ Remote submissions **require a `ssh` connection to be alive for the entire duration of the application**. If the `ssh` connection fails while the application executes, the application will fail. This has the potential of leaving an orphan RP Agent running on the HPC platform, consuming allocation and failing to properly execute any new application task. Remote submissions should not be attempted on a laptop with a Wi-Fi connection; and the risk of interrupting the `ssh` connection increases with the time taken by the application to complete.\n", - "\n", - "
\n", - "\n", - "If you can manually `ssh` into the target HPC platform, RADICAL-Pilot can do the same. You will have to set up an ssh key and, for example, follow up this [guide](https://www.ssh.com/academy/ssh-keys#how-to-configure-key-based-authentication) if you need to become more familiar.\n", - "\n", - "**Note:** RADICAL-Pilot will not work without configuring the `ssh-agent`, and it will require entering the user's ssh key passphrase to access the HPC platform\n", - "\n", - "After setting up and configuring `ssh`, you will be able to instruct RP to run its client on your local workstation and its agent on one or more HPC platforms. With the remote submission mode, you:\n", - "\n", - "1. Create a pilot description object;\n", - "2. Specify and the RP resource ID of the supported HP platform;\n", - "3. Specify the access schema you want to use to access that platform." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "execution": { - "iopub.execute_input": "2023-05-18T01:30:55.759075Z", - "iopub.status.busy": "2023-05-18T01:30:55.758587Z", - "iopub.status.idle": "2023-05-18T01:30:55.762690Z", - "shell.execute_reply": "2023-05-18T01:30:55.761860Z" - }, - "vscode": { - "languageId": "plaintext" - } - }, - "outputs": [], - "source": [ - "import radical.pilot as rp\n", - "\n", - "session = rp.Session()\n", - "\n", - "pd_init = {'resource' : 'tacc.frontera',\n", - " 'access_schema': 'ssh',\n", - " 'project' : 'myproject',\n", - " 'queue' : 'normal',\n", - " 'runtime' : 6000,\n", - " }\n", - "\n", - "pdesc = rp.PilotDescription(pd_init)" + "- **Remote submission**: users can execute their RP application from their workstation, and then RP accesses the HPC platform via `ssh`.\n", + "- **Login submission**: users can `ssh` into the login node of the HPC platform, and then launch their RP application from that shell." 
] }, { @@ -87,8 +33,8 @@ "\n", "User can perform an interactive submission of an RP application on a supported HPC platform in two ways: \n", "\n", - "- Submitting an **interactive job** to the batch system to acquire a shell and then executing the RP application from that shell.\n", - "- Submitting a **batch script** to the batch system that, once scheduled, will execute the RP application.\n", + "- Submitting an **interactive job** from within a login node to the batch system to acquire a shell and then executing the RP application from that shell.\n", + "- Submitting a **batch script** from within a login node to the batch system that, once scheduled, will execute the RP application.\n", "\n", "
\n", "\n", @@ -98,7 +44,7 @@ "\n", "### Configuring an RP application for interactive submission\n", "\n", - "In this setup, specifying the `access_schema` in your pilot description is optional. If left unspecified, it will default to `interactive`, and RP will automatically configure the appropriate access method for you. All the other parameters of your application remain the same and are independent of how you execute your RP application. For example, assume that your application requires 4096 cores, will terminate in 10 hours, and you want to execute it on TACC Frontera. To run it from an interactive job, you will have to use the following pilot description:" + "In this setup, RP will automatically interact with the HPC resource manager and set up your job once the resources are assigned to your job. For example, assume that your application requires 4096 cores, will terminate in 10 hours, and you want to execute it on TACC Frontera. To run it from an interactive job, you will have to use the following pilot description:" ] }, { @@ -118,10 +64,9 @@ "outputs": [], "source": [ "pd_init = {'resource' : 'tacc.frontera',\n", - " 'access_schema': 'interactive', # <-- optional\n", - " 'runtime' : 6000,\n", + " 'runtime' : 600, # <-- assuming you asked for 600 minutes in your batch script or your interactive shell command\n", " 'exit_on_error': True,\n", - " 'cores' : 4096,\n", + " 'cores' : 4096, # <-- assuming you asked for 4096 cores in your batch script or your interactive shell command\n", " 'gpus' : 0\n", " }\n", "\n", @@ -178,8 +123,72 @@ "\n", "```shell\n", "sbatch myjobscript.sbatch\n", - "```\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Remote submission\n", + "\n", + "
\n", + "\n", + "__Warning:__ Remote submission **does not work with two-factor authentication**. Target HPC platforms need to support passphrase-protected ssh keys as a login method without the use of a second authentication factor. Usually, the user needs to reach an agreement with the system administrators of the platform in order to allow `ssh` connections from a specific IP address. Putting such an agreement in place ranges from difficult to impossible, and requires a fixed IP.\n", "\n", + "
\n", + "\n", + "
\n", + "\n", + "__Warning:__ Remote submissions **require an `ssh` connection to be alive for the entire duration of the application**. If the `ssh` connection fails while the application executes, the application will fail. This has the potential of leaving an orphan RP Agent running on the HPC platform, consuming allocation and failing to properly execute any new application task. Remote submissions should not be attempted on a laptop with a Wi-Fi connection; and the risk of interrupting the `ssh` connection increases with the time taken by the application to complete.\n", + "\n", + "
\n", + "\n", + "If you can manually `ssh` into the target HPC platform, RADICAL-Pilot can do the same. You will have to set up an SSH key; if you are not familiar with the process, follow this [guide](https://www.ssh.com/academy/ssh-keys#how-to-configure-key-based-authentication).\n", + "\n", + "**Note:** RADICAL-Pilot will not work without configuring the `ssh-agent`, and it will require entering the user's SSH key passphrase to access the HPC platform.\n", + "\n", + "After setting up and configuring `ssh`, you can instruct RP to run its client on your local workstation and its agent on one or more HPC platforms. With the remote submission mode, you:\n", + "\n", + "1. Create a pilot description object;\n", + "2. Specify the RP resource ID of the supported HPC platform;\n", + "3. Specify the access schema you want to use to access that platform." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2023-05-18T01:30:55.759075Z", + "iopub.status.busy": "2023-05-18T01:30:55.758587Z", + "iopub.status.idle": "2023-05-18T01:30:55.762690Z", + "shell.execute_reply": "2023-05-18T01:30:55.761860Z" + }, + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "import radical.pilot as rp\n", + "\n", + "session = rp.Session()\n", + "\n", + "pd_init = {'resource' : 'tacc.frontera',\n", + " 'access_schema': 'ssh',\n", + " 'project' : 'myproject',\n", + " 'queue' : 'normal',\n", + " 'runtime' : 6000,\n", + " }\n", + "\n", + "pdesc = rp.PilotDescription(pd_init)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "## Login submission\n", "\n", "
\n", @@ -217,7 +226,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.12.3" } }, "nbformat": 4, From b49ea9a746e5bd48e2c150ab5adee02e3343b832 Mon Sep 17 00:00:00 2001 From: Aymen Alsaadi <27039262+AymenFJA@users.noreply.github.com> Date: Tue, 26 Nov 2024 16:22:15 -0500 Subject: [PATCH 4/5] fix import --- docs/source/tutorials/submission.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/tutorials/submission.ipynb b/docs/source/tutorials/submission.ipynb index 402c2d1e0..85e8b8047 100644 --- a/docs/source/tutorials/submission.ipynb +++ b/docs/source/tutorials/submission.ipynb @@ -63,6 +63,8 @@ }, "outputs": [], "source": [ + "import radical.pilot as rp\n", + "\n", "pd_init = {'resource' : 'tacc.frontera',\n", " 'runtime' : 600, # <-- assuming you asked for 600 minutes in your batch script or your interactive shell command\n", " 'exit_on_error': True,\n", @@ -171,8 +173,6 @@ }, "outputs": [], "source": [ - "import radical.pilot as rp\n", - "\n", "session = rp.Session()\n", "\n", "pd_init = {'resource' : 'tacc.frontera',\n", From 825ff94fad53ba2f2866d22e10b9997a46a925bb Mon Sep 17 00:00:00 2001 From: Aymen Alsaadi <27039262+AymenFJA@users.noreply.github.com> Date: Tue, 26 Nov 2024 16:55:55 -0500 Subject: [PATCH 5/5] fix import --- docs/source/tutorials/submission.ipynb | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/source/tutorials/submission.ipynb b/docs/source/tutorials/submission.ipynb index 85e8b8047..c18a1e773 100644 --- a/docs/source/tutorials/submission.ipynb +++ b/docs/source/tutorials/submission.ipynb @@ -65,6 +65,8 @@ "source": [ "import radical.pilot as rp\n", "\n", + "session = rp.Session()\n", + "\n", "pd_init = {'resource' : 'tacc.frontera',\n", " 'runtime' : 600, # <-- assuming you asked for 600 minutes in your batch script or your interactive shell command\n", " 'exit_on_error': True,\n", @@ -72,9 +74,7 @@ 
" 'gpus' : 0\n", " }\n", "\n", - "pdesc = rp.PilotDescription(pd_init)\n", - "\n", - "session.close(cleanup=True)" + "pdesc = rp.PilotDescription(pd_init)" ] }, { @@ -173,8 +173,6 @@ }, "outputs": [], "source": [ - "session = rp.Session()\n", - "\n", "pd_init = {'resource' : 'tacc.frontera',\n", " 'access_schema': 'ssh',\n", " 'project' : 'myproject',\n", @@ -182,7 +180,9 @@ " 'runtime' : 6000,\n", " }\n", "\n", - "pdesc = rp.PilotDescription(pd_init)" + "pdesc = rp.PilotDescription(pd_init)\n", + "\n", + "session.close(cleanup=True)" ] }, {