Commit

document
jakevc committed Jul 19, 2024
1 parent 22ef64e commit 7019d75
Showing 2 changed files with 27 additions and 19 deletions.
8 changes: 5 additions & 3 deletions docs/further.md
@@ -1,16 +1,18 @@
# Azure Batch Authentication

-The plugin uses [DefaultAzureCredential](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) to create and destroy Azure Batch resources. The caller must have Contributor permissions on the Azure Batch account for the plugin to work properly.
+The plugin uses [DefaultAzureCredential](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) to create and destroy Azure Batch resources. The caller must have Contributor permissions on the Azure Batch account for the plugin to work properly. If you are using the Azure Storage plugin, you should also have the Storage Blob Data Contributor role on the storage account(s) you use.

To run a Snakemake workflow using your Azure identity, you need to ensure you are logged in with the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/):

```
az login
```
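
Once logged in, you can confirm that your identity holds the roles described above. A minimal sketch with the Azure CLI; the Batch account scope below is a placeholder, not a value from this repository:

```
# list your role assignments on the Batch account and look for Contributor
az role assignment list \
  --assignee "$(az ad signed-in-user show --query id -o tsv)" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Batch/batchAccounts/<batch-account>" \
  --output table
```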

-If you are running Snakemake from a GitHub workflow, you can authenticate the GitHub runner [using OIDC with a User-Assigned Managed Identity](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-azure), and grant that Managed Identity Contributor permissions to the Azure Batch Account.
+If you are running Snakemake from a GitHub workflow, you can authenticate the GitHub runner [with a User-Assigned Managed Identity](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-azure), and grant that Managed Identity Contributor permissions on the Azure Batch account.
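
As a rough sketch of that setup with the Azure CLI, assuming placeholder names for the identity, resource group, repository, and Batch account:

```
# create a user-assigned managed identity for the GitHub runner
az identity create --name snakemake-runner --resource-group <resource-group>

# trust GitHub's OIDC issuer for a specific repository and branch
az identity federated-credential create \
  --name github-oidc \
  --identity-name snakemake-runner \
  --resource-group <resource-group> \
  --issuer https://token.actions.githubusercontent.com \
  --subject "repo:<org>/<repo>:ref:refs/heads/main" \
  --audiences api://AzureADTokenExchange

# grant the identity Contributor on the Batch account
az role assignment create \
  --assignee <managed-identity-client-id> \
  --role Contributor \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Batch/batchAccounts/<batch-account>"
```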

-If you are also using the [Snakemake storage plugin for azure](https://snakemake.github.io/snakemake-plugin-catalog/plugins/storage/azure.html), the caller will also need [Storage Blob Data Contributor Role](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/storage#storage-blob-data-contributor) for any storage account you want to read/write data.
+When using the [Snakemake storage plugin for azure](https://snakemake.github.io/snakemake-plugin-catalog/plugins/storage/azure.html), or if your tasks need access to Azure Container Registry or other Azure resources, you must set up a user-assigned managed identity with the executor. The Batch nodes assume this identity at runtime, and you can grant it permissions on Azure resources.

+The most common role to grant the Managed Identity is [Storage Blob Data Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/storage#storage-blob-data-contributor) on any storage account the Azure Batch nodes need to read from or write to.
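
A minimal sketch of that role assignment with the Azure CLI; all names and the scope below are placeholders:

```
# allow the pool's managed identity to read and write blobs in a storage account
az role assignment create \
  --assignee <managed-identity-client-id> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```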

# Setup

38 changes: 22 additions & 16 deletions snakemake_executor_plugin_azure_batch/__init__.py
@@ -94,7 +94,7 @@ class ExecutorSettings(ExecutorSettingsBase):
account_url: Optional[str] = field(
default=None,
metadata={
"help": "Azure batch account url.",
"help": "Batch account url: https://<account>.<region>.batch.azure.com",
"required": True,
"env_var": True,
},
@@ -137,15 +137,18 @@ class ExecutorSettings(ExecutorSettingsBase):
keep_pool: bool = field(
default=False,
metadata={
"help": "Keep the Azure batch pool after the workflow completes.",
"help": "Keep the Azure Batch resources after the workflow finished.",
"required": False,
"env_var": False,
},
)
managed_identity_resource_id: Optional[str] = field(
default=None,
metadata={
"help": "Azure managed identity resource id.",
"help": "Azure Managed Identity resource id. Managed identity is used for"
"authentication of the Azure Batch nodes to other Azure resources. Required"
"if using the Snakemake Azure Storage plugin or if you need access to"
" Azure Container registry from the nodes.",
"required": False,
"env_var": True,
},
@@ -154,15 +157,17 @@ class ExecutorSettings(ExecutorSettingsBase):
default=None,
repr=False,
metadata={
"help": "Azure managed identity client id.",
"help": "Azure Managed Identity client id.",
"required": False,
"env_var": True,
},
)
-node_start_task_sas_url: Optional[str] = field(
+node_start_task_url: Optional[str] = field(
default=None,
metadata={
"help": "Azure batch node start task bash script sas url.",
"help": "Azure Batch node start task bash script url."
"This can be any url that hosts your start task bash script. Azure blob SAS"
"urls work nicely here",
"required": False,
"env_var": False,
},
@@ -178,39 +183,39 @@ class ExecutorSettings(ExecutorSettingsBase):
node_communication_mode: Optional[str] = field(
default=None,
metadata={
"help": "Azure batch node communication mode.",
"help": "Azure Batch node communication mode.",
"required": False,
"env_var": False,
},
)
pool_subnet_id: Optional[str] = field(
default=None,
metadata={
"help": "Azure batch pool subnet id.",
"help": "Azure Batch pool subnet id.",
"required": False,
"env_var": True,
},
)
pool_image_publisher: str = field(
default="microsoft-azure-batch",
metadata={
"help": "Azure batch pool image publisher.",
"help": "Batch pool image publisher.",
"required": False,
"env_var": False,
},
)
pool_image_offer: str = field(
default="ubuntu-server-container",
metadata={
"help": "Azure batch pool image offer.",
"help": "Batch pool image offer.",
"required": False,
"env_var": False,
},
)
pool_image_sku: str = field(
default="20-04-lts",
metadata={
"help": "Azure batch pool image sku.",
"help": "Batch pool image sku.",
"required": False,
"env_var": False,
},
@@ -259,7 +264,8 @@ class ExecutorSettings(ExecutorSettingsBase):
tasks_per_node: int = field(
default=1,
metadata={
"help": "Azure batch tasks per node.",
"help": "Batch tasks per node. If node count is greater than 1, this option"
"helps optimize the number of tasks each node can handle simultaneously.",
"required": False,
"env_var": False,
},
@@ -634,8 +640,9 @@ def create_batch_pool(self):
# default to no start task
start_task_conf = None

-# if configured use start task bash script from sas url
-if self.settings.node_start_task_sas_url is not None:
+# if configured, use the start task bash script from the given url;
+# this can be a SAS url or any other accessible url hosting the script
+if self.settings.node_start_task_url is not None:
_SIMPLE_TASK_NAME = "start_task.sh"
start_task_admin = UserIdentity(
auto_user=AutoUserSpecification(
@@ -648,7 +655,7 @@ def create_batch_pool(self):
resource_files=[
ResourceFile(
file_path=_SIMPLE_TASK_NAME,
-http_url=self.settings.node_start_task_sas_url,
+http_url=self.settings.node_start_task_url,
)
],
user_identity=start_task_admin,
@@ -665,7 +672,6 @@ def create_batch_pool(self):
)
)

-# default target node count
scale_settings = ScaleSettings(
fixed_scale=FixedScaleSettings(
target_dedicated_nodes=self.settings.pool_node_count
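
For reference, a sketch of how the executor settings changed above might be supplied on the command line. The `--azure-batch-*` flag names are assumed from the setting names and Snakemake's usual plugin naming rather than taken from this plugin's documentation, and all resource values are placeholders; check `snakemake --help` for the exact spellings:

```
# hypothetical invocation; flag names are assumed, not verified against --help
snakemake --jobs 10 \
  --executor azure-batch \
  --azure-batch-account-url "https://<account>.<region>.batch.azure.com" \
  --azure-batch-managed-identity-resource-id "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>" \
  --azure-batch-node-start-task-url "https://<storage-account>.blob.core.windows.net/<container>/start_task.sh?<sas-token>"
```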
