Merge pull request #408 from MAAP-Project/dps_tut_v2
DPS Tutorial v2 (for workspace-release v4.0.0)
Showing 7 changed files with 472 additions and 56 deletions.
docs/source/technical_tutorials/dps_tutorial/DPS_runner_template.ipynb
353 changes: 353 additions & 0 deletions
@@ -0,0 +1,353 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "330e3ace",
   "metadata": {},
   "source": [
    "# Prepare and launch a DPS batch of jobs for a particular algorithm\n",
    "\n",
    "**Goal**  \n",
    "Provide a template for DPS job submission that can be adapted to the specific algorithm being run in DPS.\n",
    "\n",
    "**Motivation**  \n",
    "It's easier to learn how to run many jobs of your script (where some input changes for each job) if you can first see an example.\n",
    "\n",
    "Paul Montesano, PhD  \n",
    "[email protected]  \n",
    "June 2024"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "id": "ea7bcf9f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from maap.maap import MAAP\n",
    "maap = MAAP()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "id": "be655aaf-644c-4041-8d04-e1237a50a7f4",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'api.maap-project.org'"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "maap._MAAP_HOST"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c541eee",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "import glob\n",
    "import datetime\n",
    "import sys"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a058f23-c0a1-4445-9656-70eb7489441b",
   "metadata": {},
   "source": [
"### Use MAAP Registration call in notebook chunk to register DPS algorithm\n", | ||
" - You need to register the DPS algorithm before first before you loop over jobs that will use it.\n", | ||
" - If you register your algorithm using the Register Algorithm UI in Jupyter, a configuration file (in yml format) will be placed in your workspace home folder, which can then be used as a template for reuse" | ||
] | ||
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7810c9e6-5dc8-4969-b1f4-beb3d06e9d96",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "maap.register_algorithm_from_yaml_file(\"/projects/.../.../<my_algorithms_yaml_file>.yml\").text"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "836409b4",
   "metadata": {},
   "source": [
    "### Build a dictionary of the argument names and values needed to run the algorithm in the way you want\n",
    "\n",
    "This can be called a `parameters dictionary`. \n",
    "\n",
    " - These will be arguments that the `.sh` wrapper (which calls your `.py` or `.R` code) is hard-coded to accept. \n",
    " - The `.yml` file that you use to register your algorithm is what connects this `parameters dictionary` to your `.sh` wrapper. \n",
    " - This combo of the `parameters dictionary`, the `.yml`, and the `.sh` provides a specific (and repeatable) way of running your `.py` or `.R` code."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0fea3b7",
   "metadata": {},
   "source": [
    "#### Note: make sure the `in_params_dict` matches the arguments of your underlying Python/R code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "65681b96",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "in_params_dict = {\n",
    "    'arg_name_1': 'some_value',\n",
    "    'arg_name_2': 'another_value',\n",
    "    'in_tile_num': 1\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46e6ffc9-cc7d-4b56-a310-811774054d7e",
   "metadata": {},
   "source": [
    "### Set up a list of items to run across - an example of an algorithm input that varies from job to job\n",
    "\n",
    "In this example, we are using geographic `tiles` to break up our processing. These tiles are defined by vector polygons and have ids that our `.sh`, `.py`, and `.yml` files are set up to take in as arguments. We use these ids as the basis for a loop that will sequentially submit our jobs to DPS. \n",
    "\n",
    "There are many ways one could split up DPS jobs, so the use of tiles here is just for the purposes of this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "4fd13e32-77c8-4641-82e9-85c0ad0e8cde",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# An example list of values for an input parameter that varies by job, creating multiple jobs\n",
    "DPS_INPUT_TILE_NUM_LIST = [1, 3, 5, 7, 13, 17, 19]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d72590cf-d9c4-438c-9a2d-684ab5d08549",
   "metadata": {},
   "source": [
    "### Set up the general submission variables that will be applied to all runs of this DPS batch\n",
    "\n",
    "These will also determine the path of the DPS output (under `/projects/my-private-bucket/dps_output`): \n",
    "`/projects/my-private-bucket/dps_output/<ALGO_ID>/<ALGO_VERSION>/<IDENTIFIER>`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "e6c61e32-3550-43ff-aa3a-cbbfa97efb2d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# MAAP job identifier, algorithm name and version, username, and worker type\n",
    "IDENTIFIER = 'BIOMASS_2020'\n",
    "ALGO_VERSION = 'my_biomass_algorithm_v2024_1'\n",
    "ALGO_ID = \"run_my_biomass_algorithm\"\n",
    "USER = 'montesano'\n",
    "WORKER_TYPE = 'maap-dps-worker-8gb'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "01c52cde-1d06-4007-a637-34988938b099",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'BIOMASS_2020'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RUN_NAME = IDENTIFIER\n",
    "RUN_NAME"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "6490e474-3f44-4634-b198-6c03eaccc171",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 3]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DPS_INPUT_TILE_NUM_LIST[0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80232c11-dd65-43b4-9c50-40c9f2dc87a4",
   "metadata": {},
   "source": [
    "### Set up a dir to hold the metadata output table from the DPS submission"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f45994e-10f6-405d-aeb9-f2263b4e7662",
   "metadata": {},
   "outputs": [],
   "source": [
    "DPS_SUBMISSION_RESULTS_DIR = '/projects/my-public-bucket/dps_submission_results'\n",
    "!mkdir -p $DPS_SUBMISSION_RESULTS_DIR"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86193dd5",
   "metadata": {},
   "source": [
    "## Run a DPS job across the list\n",
    "\n",
    "The submission is done as a loop. \n",
    "\n",
    "Since submission is fast, this doesn't need to be parallelized. The jobs will start soon after submission and will be processed in parallel depending on administrator settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4abfe38b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
"%%time\n", | ||
"\n", | ||
"import json\n", | ||
"\n", | ||
"submit_results_df_list = []\n", | ||
"len_input_list = len(DPS_INPUT_TILE_NUM_LIST)\n", | ||
"print(f\"# of input tiles for DPS: {len_input_list}\")\n", | ||
"\n", | ||
"for i, INPUT_TILE_NUM in enumerate(DPS_INPUT_TILE_NUM_LIST):\n", | ||
" \n", | ||
" # Just a way to keep track of the job number associated with this submission's loop\n", | ||
" DPS_num = i+1\n", | ||
" \n", | ||
" # Update the in_params_dict with the current INPUT_TILE_NUM from this loop\n", | ||
" in_params_dict['in_tile_num'] = INPUT_TILE_NUM\n", | ||
" \n", | ||
" submit_result = maap.submitJob(\n", | ||
" identifier=IDENTIFIER,\n", | ||
" algo_id=ALGO_ID,\n", | ||
" version=ALGO_VERSION,\n", | ||
" username=USER, # username needs to be the same as whoever created the workspace\n", | ||
" queue=WORKER_TYPE,\n", | ||
" **in_params_dict\n", | ||
" )\n", | ||
" \n", | ||
" # Build a dataframe of submission details - this holds metadata about your DPS job\n", | ||
" submit_result_df = pd.DataFrame( \n", | ||
" {\n", | ||
" 'dps_num':[DPS_num],\n", | ||
" 'tile_num':[INPUT_TILE_NUM],\n", | ||
" 'submit_time':[datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%s')],\n", | ||
" 'dbs_job_hour': [datetime.datetime.now().hour],\n", | ||
" 'algo_id': [ALGO_ID],\n", | ||
" 'user': [USER],\n", | ||
" 'worker_type': [WORKER_TYPE],\n", | ||
" 'job_id': [submit_result.id],\n", | ||
" 'submit_status': [submit_result.status],\n", | ||
" \n", | ||
" } \n", | ||
" )\n", | ||
" \n", | ||
" # Append to a list of data frames of DPS submission results\n", | ||
" submit_results_df_list.append(submit_result_df)\n", | ||
" \n", | ||
" if DPS_num in [1, 5, 10, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 7000, 9000, 11000, 13000, 15000, 17000, 19000, 21000, 24000, len_input_list]:\n", | ||
" print(f\"DPS run #: {DPS_num}\\t| tile num: {INPUT_TILE_NUM}\\t| submit status: {submit_result.status}\\t| job id: {submit_result.id}\") \n", | ||
" \n", | ||
"# Build a final submission results data frame and save\n", | ||
"submit_results_df = pd.concat(submit_results_df_list)\n", | ||
"submit_results_df['run_name'] = RUN_NAME\n", | ||
"nowtime = pd.Timestamp.now().strftime('%Y%m%d%H%M')\n", | ||
"print(f\"Current time:\\t{nowtime}\")\n", | ||
"\n", | ||
"# This creates a CSV of the metadata associated with the DPS jobs you have just submitted\n", | ||
"submit_results_df.to_csv(f'{DPS_SUBMISSION_RESULTS_DIR}/DPS_{ALGO_ID}_{RUN_NAME}_submission_results_{len_input_list}_{nowtime}.csv')\n", | ||
"submit_results_df.info()" | ||
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
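After the batch is submitted, the CSV written by the notebook above can be used to check on the jobs. A minimal sketch, assuming your maap-py version exposes `maap.getJobStatus()` (verify against the MAAP docs for your workspace), with a placeholder CSV filename to be replaced by one the notebook actually produced:

import pandas as pd
from maap.maap import MAAP

maap = MAAP()

# Placeholder filename: substitute the submission-results CSV the notebook wrote
csv_path = '/projects/my-public-bucket/dps_submission_results/<your_submission_results>.csv'
submit_results_df = pd.read_csv(csv_path)

# Poll the status of each submitted job recorded in the CSV
for job_id in submit_results_df['job_id']:
    status = maap.getJobStatus(job_id)  # assumed maap-py call; check your version
    print(f"{job_id}: {status}")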
Binary file added (+48.9 KB): docs/source/technical_tutorials/dps_tutorial/_static/python_env_default.png
Binary file added (+165 KB): docs/source/technical_tutorials/dps_tutorial/_static/tutorial_overview.png
Binary file added (+426 KB): docs/source/technical_tutorials/dps_tutorial/_static/tutorial_register_api_1.png
Binary file modified (+9.16 KB, 100%): docs/source/technical_tutorials/dps_tutorial/_static/tutorial_view_2.png
docs/source/technical_tutorials/dps_tutorial/algorithm_config_template.yml
25 changes: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
algorithm_description: This is a free-form description of your algorithm
algorithm_name: dps-tutorial-name
algorithm_version: main
build_command: dps_tutorial/gdal_wrapper/build-env.sh
disk_space: 1GB
docker_container_url: mas.maap-project.org/root/maap-workspaces/base_images/vanilla:v3.1.5
inputs:
  config: []
  file:
  - default: ''
    description: The name of the input file
    name: input_file
    required: true
  positional:
  - default: ''
    description: output file name
    name: output_file
    required: true
  - default: '30'
    description: the percent reduction of your output file vs the input file
    name: percent_reduction
    required: true
queue: maap-dps-worker-8gb
repository_url: https://github.com/MAAP-Project/dps_tutorial.git
run_command: dps_tutorial/gdal_wrapper/run_gdal.sh
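Once the placeholders in this template are filled in, it can be registered and exercised with maap-py in the same way the notebook above does. A minimal sketch under stated assumptions: the YAML path, identifier, and input URL below are placeholders, while the argument names mirror the template's declared `input_file`, `output_file`, and `percent_reduction` inputs:

from maap.maap import MAAP

maap = MAAP()

# Register the algorithm described by the (filled-in) template; the path here is a placeholder
maap.register_algorithm_from_yaml_file("/projects/<path_to>/algorithm_config_template.yml").text

# Submit one job whose arguments mirror the template's declared inputs
submit_result = maap.submitJob(
    identifier="dps-tutorial-test",      # placeholder run name
    algo_id="dps-tutorial-name",         # matches algorithm_name above
    version="main",                      # matches algorithm_version above
    queue="maap-dps-worker-8gb",         # matches queue above
    input_file="https://<host>/<input>.tif",  # 'file' input: staged for the job before run_gdal.sh executes
    output_file="output.tif",            # positional input
    percent_reduction="30"               # positional input
)
print(submit_result.id, submit_result.status)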