Commit

Streamlined Jupyter notebooks for Terraform and experiment deployment (installing extractor, enabling/disabling autoscaling so that experiments don't end up deployed on the default node pool, removed changing working directory from within the notebook)
kponichtera authored and JMGaljaard committed Sep 26, 2022
1 parent 976ae17 commit 1430cf1
Showing 2 changed files with 180 additions and 242 deletions.
88 changes: 44 additions & 44 deletions jupyter/experiment_notebook.ipynb
@@ -17,11 +17,11 @@
"* GKE/Kubernetes cluster (see also `terraform/terraform_notebook.ipynb`)\n",
" * 2 node pools (default for system & dependencies, experiment pool)\n",
"* Docker image (including dataset, to speed-up starting experiments).\n",
" * Within a BASH shell\n",
" * Within a bash shell\n",
" * Make sure to have `requirements-cpu.txt` (or `requirements-gpu.txt`) installed in a virtual environment (venv or conda). You can run `pip3 install -r requirements-cpu.txt`\n",
" * First run the extractor (locally) `python3 -m extractor configs/example_cloud_experiment.json`\n",
" * First run the extractor (locally) `python3 -m fltk extractor configs/example_cloud_experiment.json`\n",
" * This downloads datasets to be included in the docker image.\n",
" * Build the container `DOCKER_BUILDKIT=1 docker build --platform linux/amd64 . --tag gcr.io/\\$PROJECT_ID/fltk`\n",
" * Build the container `DOCKER_BUILDKIT=1 docker build --platform linux/amd64 . --tag gcr.io/$PROJECT_ID/fltk`\n",
" * Push to your gcr.io repository `docker push gcr.io/$PROJECT_ID/fltk`\n",
"\n",
"\n",
@@ -38,6 +38,9 @@
},
"outputs": [],
"source": [
"##################\n",
"### CHANGE ME! ###\n",
"##################\n",
"PROJECT_ID=\"test-bed-fltk\"\n",
"CLUSTER_NAME=\"fltk-testbed-cluster\"\n",
"DEFAULT_POOL=\"default-node-pool\"\n",
@@ -71,10 +74,7 @@
"source": [
"# These commands might take a while to complete.\n",
"gcloud container clusters resize $CLUSTER_NAME --node-pool $DEFAULT_POOL \\\n",
" --num-nodes 2 --region us-central1-c --quiet\n",
"\n",
"gcloud container clusters resize $CLUSTER_NAME --node-pool $EXPERIMENT_POOL \\\n",
" --num-nodes 3 --region us-central1-c --quiet"
" --num-nodes 1 --region us-central1-c --quiet"
]
},
{
@@ -109,14 +109,32 @@
"outputs": [],
"source": [
"# If you want to delete all pytorch trainjobs, uncomment the command below.\n",
"# kubectl delete pytorchjobs.kubeflow.org --all --namespace test\n",
"# kubectl delete pytorchjobs.kubeflow.org --all --namespace test\n",
"\n",
"# If you want to delete all existing configuration map objects in a namespace, run the command below\n",
"# kubectl delete configmaps --all --namespace test\n",
"\n",
"helm uninstall -n test flearner"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install extractor\n",
"\n",
"Deploy the TensorBoard service and persistent volumes, required for deployment of the orchestrator's chart."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"helm install -n test extractor ../charts/extractor -f ../charts/fltk-values.yaml"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -144,26 +162,10 @@
},
"outputs": [],
"source": [
"# Change the directory to a level above, i.e. content root (the git root directory).\n",
"cd ../\n",
"echo $PWD"
"EXPERIMENT_FILE=\"../configs/federated_tasks/example_arrival_config.json\"\n",
"CLUSTER_CONFIG=\"../configs/example_cloud_experiment.json\""
]
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"EXPERIMENT_FILE=\"configs/federated_tasks/example_arrival_config.json\"\n",
"CLUSTER_CONFIG=\"configs/example_arrival_config\""
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "markdown",
"metadata": {
@@ -199,7 +201,7 @@
"outputs": [],
"source": [
"helm uninstall experiment-orchestrator -n test\n",
"helm install experiment-orchestrator charts/orchestrator --namespace test -f charts/fltk-values.yaml\\\n",
"helm install experiment-orchestrator ../charts/orchestrator --namespace test -f ../charts/fltk-values.yaml \\\n",
" --set-file orchestrator.experiment=$EXPERIMENT_FILE,orchestrator.configuration=$CLUSTER_CONFIG\n"
]
},
@@ -236,41 +238,39 @@
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# Wrapping up\n",
"\n",
"To scale down the cluster nodepools, run the cell below. This will scale the node pools down and remove all the experiments deployed (on the cluster).\n",
"\n",
"1. Experiments cannot be restarted.\n",
"2. Experiment logs will not survive deletion.\n"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# This will remove all information and logs as well.\n",
"kubectl delete pytorchjobs.kubeflow.org --all-namespaces --all\n",
"\n",
"gcloud container clusters resize $CLUSTER_NAME --node-pool $DEFAULT_POOL \\\n",
" --num-nodes 0 --region us-central1-c --quiet\n",
" --num-nodes 0 --region $REGION --quiet\n",
"\n",
"gcloud container clusters resize $CLUSTER_NAME --node-pool $EXPERIMENT_POOL \\\n",
" --num-nodes 0 --region us-central1-c --quiet"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
" --num-nodes 0 --region $REGION --quiet"
]
}
],
"metadata": {
@@ -289,4 +289,4 @@
},
"nbformat": 4,
"nbformat_minor": 1
}
}
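
The wrapping-up cell in this diff switches to `--region $REGION`, while the scale-up cell still passes `us-central1-c` directly, and no hunk shown here defines `REGION`. One way to keep the resize calls consistent is a small shell helper, sketched below. The function name `resize_pool`, the `DRY_RUN` switch, and the `REGION` value (copied from the hard-coded flag) are assumptions for illustration and are not part of the commit.

```shell
# Hypothetical helper, not part of the commit: resize one node pool.
# With DRY_RUN=1 the gcloud command is printed instead of executed.
resize_pool() {
  pool="$1"
  nodes="$2"
  # Rebuild the positional parameters as the full gcloud command line.
  set -- gcloud container clusters resize "$CLUSTER_NAME" \
    --node-pool "$pool" --num-nodes "$nodes" \
    --region "$REGION" --quiet
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

CLUSTER_NAME="fltk-testbed-cluster"
DEFAULT_POOL="default-node-pool"
REGION="us-central1-c"  # assumed value, taken from the flag used in the scale-up cell
DRY_RUN=1

# Print the command that would scale the default pool to one node.
resize_pool "$DEFAULT_POOL" 1
```

Unsetting `DRY_RUN` would run the same command for real, so the scale-up and wrapping-up cells could share a single definition of the region.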