DPS Tutorial v2 (for workspace-release v4.0.0) #408
Merged
12 commits (all by rtapella)
93b0447 added an intro for context
bc56fc5 suggest starting with a fresh Terminal
ebfdf06 image updates
008674f initial link of DPS Runner notebook
15ba864 last of fg user feedback
5cef361 connecting the yml to the register UI
dab9564 updates for v400 vanilla to python conda env
12324be add code style to a conda env
17e4ad6 starting to clarify the template DPS runner notebook
0628e87 fix API call in template
06b32d0 Fixed Variable typo
7145a16 multiple small changes from reviewers
353 changes: 353 additions & 0 deletions
docs/source/technical_tutorials/dps_tutorial/DPS_runner_template.ipynb
Review comment (resolved, via ReviewNB): this might be confusing to some, as it's mixing shell/bash commands into Python cells. Could use the Python equivalent.
@@ -0,0 +1,353 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "330e3ace",
"metadata": {},
"source": [
"# Prepare and launch a DPS batch of jobs for a particular algorithm\n",
"\n",
"**Goal**\n",
"Provide a template for DPS job submission that you can adapt to the specific algorithm you run in DPS.\n",
"\n",
"**Motivation** \n",
"It's easier to learn how to run many jobs of your script (where some input changes for each job) if you can first see an example.\n",
"\n",
"Paul Montesano, PhD \n",
"[email protected] \n",
"June 2024"
]
},
{
"cell_type": "code",
"execution_count": 126,
"id": "ea7bcf9f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from maap.maap import MAAP\n",
"maap = MAAP()"
]
},
{
"cell_type": "code",
"execution_count": 127,
"id": "be655aaf-644c-4041-8d04-e1237a50a7f4",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'api.maap-project.org'"
]
},
"execution_count": 127,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"maap._MAAP_HOST"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c541eee",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"import glob\n",
"import datetime\n",
"import sys"
]
},
{
"cell_type": "markdown",
"id": "3a058f23-c0a1-4445-9656-70eb7489441b",
"metadata": {},
"source": [
"### Use the MAAP registration call in a notebook cell to register the DPS algorithm\n",
" - You need to register the DPS algorithm first, before you loop over the jobs that will use it.\n",
" - If you register your algorithm using the Register Algorithm UI in Jupyter, a configuration file (in `.yml` format) is placed in your workspace home folder; you can reuse it as a template."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7810c9e6-5dc8-4969-b1f4-beb3d06e9d96",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"maap.register_algorithm_from_yaml_file(\"/projects/.../.../<my_algorithms_yaml_file>.yml\").text"
]
},
{
"cell_type": "markdown",
"id": "836409b4",
"metadata": {},
"source": [
"### Build a dictionary of the argument names and values needed to run the algorithm in the way you want\n",
"\n",
"This can be called a `parameters dictionary`.\n",
"\n",
" - These are the arguments that the `.sh` wrapper (which calls your `.py` or `.R` code) is hard-coded to accept. \n",
" - The `.yml` file that you use to register your algorithm is what connects this `parameters dictionary` to your `.sh` wrapper. \n",
" - Together, the `parameters dictionary`, the `.yml`, and the `.sh` provide a specific (and repeatable) way of running your `.py` or `.R` code."
]
},
{
"cell_type": "markdown",
"id": "c0fea3b7",
"metadata": {},
"source": [
"#### Note: make sure the keys of `in_params_dict` match the arguments of your underlying Python/R code"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "65681b96",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"in_params_dict = {\n",
"    'arg_name_1': 'some_value',\n",
"    'arg_name_2': 'another_value',\n",
"    'in_tile_num': 1\n",
"    }"
]
},
{
"cell_type": "markdown",
"id": "46e6ffc9-cc7d-4b56-a310-811774054d7e",
"metadata": {},
"source": [
"### Set up a list of items to run across: an example of an algorithm input that varies from job to job\n",
"\n",
"In this example, we use geographic `tiles` to break up our processing. These tiles are defined by vector polygons and have ids that our `.sh`, `.py`, and `.yml` files are set up to accept as arguments. We use these ids as the basis for a loop that sequentially submits our jobs to DPS. \n",
"\n",
"There are many ways to split up DPS jobs; tiles are used here only for the purposes of this example."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4fd13e32-77c8-4641-82e9-85c0ad0e8cde",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# An example list of an input parameter to your script that varies for each job, thus creating multiple jobs\n",
"DPS_INPUT_TILE_NUM_LIST = [1, 3, 5, 7, 13, 17, 19]"
]
},
{
"cell_type": "markdown",
"id": "d72590cf-d9c4-438c-9a2d-684ab5d08549",
"metadata": {},
"source": [
"### Set up the general submission variables that apply to all runs of this DPS batch\n",
"\n",
"These also determine the path of the DPS output under `/projects/my-private-bucket/dps_output`: \n",
"`/projects/my-private-bucket/dps_output/<ALGO_ID>/<ALGO_VERSION>/<IDENTIFIER>`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e6c61e32-3550-43ff-aa3a-cbbfa97efb2d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Submission variables shared by every job in this DPS batch\n",
"IDENTIFIER = 'BIOMASS_2020'\n",
"ALGO_VERSION = 'my_biomass_algorithm_v2024_1'\n",
"ALGO_ID = \"run_my_biomass_algorithm\"\n",
"USER = 'montesano'\n",
"WORKER_TYPE = 'maap-dps-worker-8gb'"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "01c52cde-1d06-4007-a637-34988938b099",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'BIOMASS_2020'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"RUN_NAME = IDENTIFIER\n",
"RUN_NAME"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "6490e474-3f44-4634-b198-6c03eaccc171",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 3]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DPS_INPUT_TILE_NUM_LIST[0:2]"
]
},
{
"cell_type": "markdown",
"id": "80232c11-dd65-43b4-9c50-40c9f2dc87a4",
"metadata": {},
"source": [
"### Set up a directory to hold the metadata output table from the DPS submission"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f45994e-10f6-405d-aeb9-f2263b4e7662",
"metadata": {},
"outputs": [],
"source": [
"DPS_SUBMISSION_RESULTS_DIR = '/projects/my-public-bucket/dps_submission_results'\n",
"os.makedirs(DPS_SUBMISSION_RESULTS_DIR, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"id": "86193dd5",
"metadata": {},
"source": [
"## Run a DPS job across the list\n",
"\n",
"The submission is done as a loop. \n",
"\n",
"Since submission is fast, the loop doesn't need to be parallelized. The jobs start soon after submission and are processed in parallel, depending on administrator settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4abfe38b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%%time\n",
"\n",
"import json\n",
"\n",
"submit_results_df_list = []\n",
"len_input_list = len(DPS_INPUT_TILE_NUM_LIST)\n",
"print(f\"# of input tiles for DPS: {len_input_list}\")\n",
"\n",
"for i, INPUT_TILE_NUM in enumerate(DPS_INPUT_TILE_NUM_LIST):\n",
"    \n",
"    # Just a way to keep track of the job number associated with this submission's loop\n",
"    DPS_num = i+1\n",
"    \n",
"    # Update the in_params_dict with the current INPUT_TILE_NUM from this loop\n",
"    in_params_dict['in_tile_num'] = INPUT_TILE_NUM\n",
"    \n",
"    submit_result = maap.submitJob(\n",
"        identifier=IDENTIFIER,\n",
"        algo_id=ALGO_ID,\n",
"        version=ALGO_VERSION,\n",
"        username=USER, # username needs to be the same as whoever created the workspace\n",
"        queue=WORKER_TYPE,\n",
"        **in_params_dict\n",
"    )\n",
"    \n",
"    # Build a dataframe of submission details - this holds metadata about your DPS job\n",
"    submit_result_df = pd.DataFrame( \n",
"        {\n",
"            'dps_num':[DPS_num],\n",
"            'tile_num':[INPUT_TILE_NUM],\n",
"            'submit_time':[datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')],\n",
"            'dps_job_hour': [datetime.datetime.now().hour],\n",
"            'algo_id': [ALGO_ID],\n",
"            'user': [USER],\n",
"            'worker_type': [WORKER_TYPE],\n",
"            'job_id': [submit_result.id],\n",
"            'submit_status': [submit_result.status],\n",
"        } \n",
"    )\n",
"    \n",
"    # Append to a list of data frames of DPS submission results\n",
"    submit_results_df_list.append(submit_result_df)\n",
"    \n",
"    if DPS_num in [1, 5, 10, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 7000, 9000, 11000, 13000, 15000, 17000, 19000, 21000, 24000, len_input_list]:\n",
"        print(f\"DPS run #: {DPS_num}\\t| tile num: {INPUT_TILE_NUM}\\t| submit status: {submit_result.status}\\t| job id: {submit_result.id}\") \n",
"\n",
"# Build a final submission results data frame and save\n",
"submit_results_df = pd.concat(submit_results_df_list, ignore_index=True)\n",
"submit_results_df['run_name'] = RUN_NAME\n",
"nowtime = pd.Timestamp.now().strftime('%Y%m%d%H%M')\n",
"print(f\"Current time:\\t{nowtime}\")\n",
"\n",
"# This creates a CSV of the metadata associated with the DPS jobs you have just submitted\n",
"submit_results_df.to_csv(f'{DPS_SUBMISSION_RESULTS_DIR}/DPS_{ALGO_ID}_{RUN_NAME}_submission_results_{len_input_list}_{nowtime}.csv')\n",
"submit_results_df.info()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
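The template's loop overwrites `in_params_dict['in_tile_num']` in place and records one metadata row per job. A minimal, stdlib-only sketch of the same bookkeeping (not part of this PR) can be dry-run outside a MAAP workspace. It assumes a hypothetical `fake_submit` stand-in that mimics only the `.id` and `.status` attributes the template reads from the `maap.submitJob` result; `build_job_params`, `expected_output_prefix`, and `submit_all` are illustrative names. To submit for real, you would pass a small wrapper around `maap.submitJob` that also supplies `queue` and `username` as the template does.

```python
import csv
import datetime
import io
from types import SimpleNamespace

# Example values copied from the template notebook; replace with your own.
IDENTIFIER = "BIOMASS_2020"
ALGO_VERSION = "my_biomass_algorithm_v2024_1"
ALGO_ID = "run_my_biomass_algorithm"
DPS_INPUT_TILE_NUM_LIST = [1, 3, 5, 7, 13, 17, 19]
in_params_dict = {"arg_name_1": "some_value", "arg_name_2": "another_value", "in_tile_num": 1}


def build_job_params(base_params, tile_nums, key="in_tile_num"):
    """Return one independent params dict per tile; base_params is left unchanged."""
    return [{**base_params, key: tile} for tile in tile_nums]


def expected_output_prefix(algo_id, algo_version, identifier,
                           root="/projects/my-private-bucket/dps_output"):
    """Output layout described in the notebook: <root>/<ALGO_ID>/<ALGO_VERSION>/<IDENTIFIER>."""
    return f"{root}/{algo_id}/{algo_version}/{identifier}"


def fake_submit(**kwargs):
    """Hypothetical dry-run stand-in for maap.submitJob; returns an object with .id and .status."""
    return SimpleNamespace(id=f"dry-run-{kwargs['in_tile_num']}", status="dry-run")


def submit_all(job_params, submit_fn=fake_submit):
    """Submit each job and collect one metadata row per submission."""
    rows = []
    for dps_num, params in enumerate(job_params, start=1):
        result = submit_fn(identifier=IDENTIFIER, algo_id=ALGO_ID,
                           version=ALGO_VERSION, **params)
        now = datetime.datetime.now()  # one timestamp per row so both time columns agree
        rows.append({
            "dps_num": dps_num,
            "tile_num": params["in_tile_num"],
            "submit_time": now.strftime("%Y-%m-%d-%H-%M-%S"),  # %S = seconds
            "dps_job_hour": now.hour,
            "job_id": result.id,
            "submit_status": result.status,
        })
    return rows


# Dry run: build per-tile params, "submit" them, and write the metadata as CSV text.
jobs = build_job_params(in_params_dict, DPS_INPUT_TILE_NUM_LIST)
rows = submit_all(jobs)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(expected_output_prefix(ALGO_ID, ALGO_VERSION, IDENTIFIER))
print(buf.getvalue())
```

Copying the base dict per job keeps each job's parameters independent, so the whole list can be inspected before anything is submitted; taking a single `now` per row also keeps `submit_time` and `dps_job_hour` consistent.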
Binary file added (+48.9 KB)
docs/source/technical_tutorials/dps_tutorial/_static/python_env_default.png
Binary file added (+165 KB)
docs/source/technical_tutorials/dps_tutorial/_static/tutorial_overview.png
Binary file added (+426 KB)
docs/source/technical_tutorials/dps_tutorial/_static/tutorial_register_api_1.png
Binary file modified (+9.16 KB, 100%)
docs/source/technical_tutorials/dps_tutorial/_static/tutorial_view_2.png
25 changes: 25 additions & 0 deletions
docs/source/technical_tutorials/dps_tutorial/algorithm_config_template.yml
@@ -0,0 +1,25 @@
algorithm_description: This is a free-form description of your algorithm
algorithm_name: dps-tutorial-name
algorithm_version: main
build_command: dps_tutorial/gdal_wrapper/build-env.sh
disk_space: 1GB
docker_container_url: mas.maap-project.org/root/maap-workspaces/base_images/vanilla:v3.1.5
inputs:
  config: []
  file:
  - default: ''
    description: The name of the input file
    name: input_file
    required: true
  positional:
  - default: ''
    description: output file name
    name: output_file
    required: true
  - default: '30'
    description: the percent reduction of your output file vs the input file
    name: percent_reduction
    required: true
queue: maap-dps-worker-8gb
repository_url: https://github.com/MAAP-Project/dps_tutorial.git
run_command: dps_tutorial/gdal_wrapper/run_gdal.sh
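The notebook's note that `in_params_dict` must match your algorithm's arguments can be checked before submitting. A minimal sketch, assuming the YAML above has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the dict literal below simply mirrors the `inputs` section of `algorithm_config_template.yml`, and `declared_inputs` and `check_params` are hypothetical helper names. It treats every `required: true` input as one you must pass explicitly, which may be stricter than DPS itself if DPS applies the declared `default`.

```python
def declared_inputs(config):
    """Map each input name in the config's `inputs` section to its `required` flag."""
    declared = {}
    for group in ("config", "file", "positional"):
        for item in config.get("inputs", {}).get(group) or []:
            declared[item["name"]] = bool(item.get("required", False))
    return declared


def check_params(config, params):
    """Return (missing required inputs, keys not declared in the config) for a params dict."""
    declared = declared_inputs(config)
    missing = sorted(name for name, required in declared.items()
                     if required and name not in params)
    unexpected = sorted(key for key in params if key not in declared)
    return missing, unexpected


# Mirrors the `inputs` section of algorithm_config_template.yml
config = {
    "inputs": {
        "config": [],
        "file": [
            {"default": "", "description": "The name of the input file",
             "name": "input_file", "required": True},
        ],
        "positional": [
            {"default": "", "description": "output file name",
             "name": "output_file", "required": True},
            {"default": "30",
             "description": "the percent reduction of your output file vs the input file",
             "name": "percent_reduction", "required": True},
        ],
    }
}

# Hypothetical parameters for one job; percent_reduction is deliberately left out.
params = {"input_file": "input.tif", "output_file": "output.tif"}
missing, unexpected = check_params(config, params)
print("missing:", missing, "unexpected:", unexpected)
```

Running the check once per job (or once on the base dict) catches misspelled keys, such as an argument name containing a space, before any job reaches the DPS queue.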
Review comment (via ReviewNB): This probably belongs in a different place; refer to the Registering an Algorithm section. This section should just say: make sure you are using an already registered algorithm, or register one (link).