Showing 2 changed files with 196 additions and 198 deletions.
@@ -1,197 +1,196 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "44a2ab29",
"metadata": {},
"source": [
"# Generating GCG Suffixes Using Azure Machine Learning"
]
},
{
"cell_type": "markdown",
"id": "4c2b3f55",
"metadata": {},
"source": [
"This notebook shows how to generate GCG suffixes using Azure Machine Learning (AML). The process consists of three main steps:\n",
"1. Connect to an AML workspace.\n",
"2. Create an AML environment with the required Python dependencies.\n",
"3. Submit a training job to AML."
]
},
{
"cell_type": "markdown",
"id": "697c48fe",
"metadata": {},
"source": [
"## Connect to Azure Machine Learning Workspace"
]
},
{
"cell_type": "markdown",
"id": "632bef8b",
"metadata": {},
"source": [
"The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning (AML), providing a centralized place to work with all the artifacts you create when using AML. In this section, we will connect to the workspace in which the job will be run.\n",
"\n",
"To connect to a workspace, we need three identifier parameters: a subscription ID, resource group, and workspace name. We pass these to `MLClient` from `azure.ai.ml` to get a handle to the required AML workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8645ef34",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pyrit.common import default_values\n",
"\n",
"default_values.load_default_env()\n",
"\n",
"# Enter details of your AML workspace\n",
"subscription_id = os.environ.get(\"AZURE_SUBSCRIPTION_ID\")\n",
"resource_group = os.environ.get(\"AZURE_RESOURCE_GROUP\")\n",
"workspace = os.environ.get(\"AZURE_ML_WORKSPACE_NAME\")\n",
"compute_name = os.environ.get(\"AZURE_ML_COMPUTE_NAME\")\n",
"print(workspace)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37b282e7",
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.ml import MLClient\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
"# Get a handle to the workspace\n",
"ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)"
]
},
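{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d4",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check (not part of the original notebook): a minimal sketch that\n",
"# assumes the environment variables above are set and that the credential has access\n",
"# to the workspace. Fetching the workspace confirms the handle works before we\n",
"# submit any jobs.\n",
"ws = ml_client.workspaces.get(workspace)\n",
"print(ws.name, ws.location)"
]
},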
{
"cell_type": "markdown",
"id": "7d7c5c41",
"metadata": {},
"source": [
"## Create Compute Cluster\n",
"\n",
"Before proceeding, create a compute cluster in Azure ML. The following Azure CLI command may be useful:\n",
"\n",
"`az ml compute create --size Standard_ND96isrf_H100_v5 --type AmlCompute --name <compute-name> -g <group> -w <workspace> --min-instances 0`"
]
},
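{
"cell_type": "code",
"execution_count": null,
"id": "b7c2a911",
"metadata": {},
"outputs": [],
"source": [
"# Alternative to the CLI command above (not part of the original notebook): a minimal\n",
"# sketch of creating the same cluster with the Python SDK. It assumes `ml_client` and\n",
"# `compute_name` from the cells above and that your subscription has quota for the SKU.\n",
"from azure.ai.ml.entities import AmlCompute\n",
"\n",
"cluster = AmlCompute(\n",
"    name=compute_name,\n",
"    size=\"Standard_ND96isrf_H100_v5\",  # any GPU SKU with sufficient VRAM works\n",
"    min_instances=0,\n",
"    max_instances=1,\n",
")\n",
"ml_client.compute.begin_create_or_update(cluster).result()"
]
},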
{
"cell_type": "markdown",
"id": "e390439f",
"metadata": {},
"source": [
"## Create AML Environment"
]
},
{
"cell_type": "markdown",
"id": "ceedcfa7",
"metadata": {},
"source": [
"To install the dependencies needed to run GCG, we create an AML environment from a [Dockerfile](../../../pyrit/auxiliary_attacks/gcg/src/Dockerfile)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67783454",
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"from pathlib import Path\n",
"from pyrit.common.path import HOME_PATH\n",
"from azure.ai.ml.entities import Environment, BuildContext\n",
"\n",
"# Configure the AML environment with path to Dockerfile and dependencies\n",
"env_docker_context = Environment(\n",
"    build=BuildContext(path=Path(HOME_PATH) / \"pyrit\" / \"auxiliary_attacks\" / \"gcg\" / \"src\"),\n",
"    name=\"pyrit\",\n",
"    description=\"PyRIT environment created from a Docker context.\",\n",
")\n",
"\n",
"# Create or update the AML environment\n",
"ml_client.environments.create_or_update(env_docker_context)"
]
},
{
"cell_type": "markdown",
"id": "49d20c8e",
"metadata": {},
"source": [
"## Submit Training Job to AML"
]
},
{
"cell_type": "markdown",
"id": "ad33e623",
"metadata": {},
"source": [
"Finally, we configure the command to run the GCG algorithm. The entry file for the algorithm is [`run.py`](../../../pyrit/auxiliary_attacks/gcg/experiments/run.py), which takes several command-line arguments, as shown below. We also have to specify the compute target to run the algorithm on. In our experience, a GPU instance with at least 32 GB of VRAM (for example, Standard_ND40rs_v2) is required.\n",
"\n",
"Depending on the compute instance you use, you may encounter \"out of memory\" errors. In this case, we recommend training on a smaller model or lowering `n_train_data` or `batch_size`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "938a1030",
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.ml import command\n",
"\n",
"# Configure the command\n",
"job = command(\n",
"    code=Path(HOME_PATH),\n",
"    command=\"cd pyrit/auxiliary_attacks/gcg/experiments && python run.py --model_name ${{inputs.model_name}} --setup ${{inputs.setup}} --n_train_data ${{inputs.n_train_data}} --n_test_data ${{inputs.n_test_data}} --n_steps ${{inputs.n_steps}} --batch_size ${{inputs.batch_size}}\",\n",
"    inputs={\n",
"        \"model_name\": \"phi_3_mini\",\n",
"        \"setup\": \"multiple\",\n",
"        \"n_train_data\": 25,\n",
"        \"n_test_data\": 0,\n",
"        \"n_steps\": 500,\n",
"        \"batch_size\": 256,\n",
"    },\n",
"    environment=f\"{env_docker_context.name}:{env_docker_context.version}\",\n",
"    environment_variables={\"HF_TOKEN\": os.environ[\"HF_TOKEN\"]},\n",
"    display_name=\"suffix_generation\",\n",
"    description=\"Generate a suffix for attacking LLMs.\",\n",
"    compute=compute_name,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b40591b0",
"metadata": {},
"outputs": [],
"source": [
"# Submit the command\n",
"returned_job = ml_client.create_or_update(job)"
]
},
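{
"cell_type": "code",
"execution_count": null,
"id": "c3d4e5f6",
"metadata": {},
"outputs": [],
"source": [
"# Optional monitoring sketch (not part of the original notebook): it assumes the job\n",
"# above was submitted successfully. Open the printed studio URL to watch progress in\n",
"# the Azure ML studio, or stream the logs into the notebook until the job finishes.\n",
"print(returned_job.studio_url)\n",
"ml_client.jobs.stream(returned_job.name)"
]
}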
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "pyrit-kernel",
"language": "python",
"name": "pyrit-kernel"
}
},
"nbformat": 4,
"nbformat_minor": 5
}