Gaudi2 validation (#892)
* Update README.md

Added Gaudi2 run instructions on Intel Developer Cloud

Signed-off-by: orionsBeltWest <[email protected]>

* Update README.md with Gaudi2 instructions

Signed-off-by: orionsBeltWest <[email protected]>

* Update README.md with Gaudi2 instructions

Signed-off-by: orionsBeltWest <[email protected]>

* update README as per PR comments

* Updated README with more information on building a Dockerfile

* Update openfl-tutorials/interactive_api/HPU/PyTorch_MedMNIST_2D/README.md

Co-authored-by: Patrick Foley <[email protected]>

* Update openfl-tutorials/interactive_api/HPU/PyTorch_TinyImageNet/README.md

Co-authored-by: Patrick Foley <[email protected]>

* Update openfl-tutorials/interactive_api/HPU/PyTorch_Kvasir_UNet/README.md

Co-authored-by: Patrick Foley <[email protected]>

---------

Signed-off-by: orionsBeltWest <[email protected]>
Co-authored-by: Patrick Foley <[email protected]>
orionsBeltWest and psfoley authored Nov 9, 2023
1 parent 79b8dbc commit d4108fb
Showing 3 changed files with 260 additions and 17 deletions.
93 changes: 87 additions & 6 deletions openfl-tutorials/interactive_api/HPU/PyTorch_Kvasir_UNet/README.md
@@ -4,21 +4,102 @@
#### The names of the files/examples that contain HPU adaptations start with "HPU".
For example: PyTorch_Kvasir_UNet.ipynb, placed under the workspace folder, contains the required HPU adaptations.

All the execution steps mentioned in the last section (**V. How to run this tutorial**) remain the same for HPU examples, but as a prerequisite they need some additional environment setup and Habana-supported package installations, which are explained below in **sections I to IV**.
All the execution steps mentioned in the last section (**VI. How to run this tutorial**) remain the same for HPU examples, but as a prerequisite they need some additional environment setup and Habana-supported package installations, which are explained below in **sections I to V**.

**Note:** By default, these experiments use 1 HPU device.

<br/>

## **I. AWS DL1 Instance Setup**
## **I. Intel Developer Cloud Setup**
This example was tested on the Intel Developer Cloud using a Gaudi2 instance.

To access Gaudi2 instances on the Intel Developer Cloud, follow the instructions [here](https://developer.habana.ai/intel-developer-cloud/).

The Gaudi2 instance on the Intel Developer Cloud comes with the SynapseAI SW Stack for Gaudi2 preinstalled, so you can skip sections **II.** and **III.**

Furthermore, our testing was done in a Habana-based Docker container built from the Dockerfile below.

Create a Dockerfile with the following content and name it `Dockerfile_Habana`:

```
FROM vault.habana.ai/gaudi-docker/1.10.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1/latest
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV DEBIAN_FRONTEND="noninteractive" TZ=Etc/UTC
# System utilities used inside the container
RUN apt-get update && apt-get install -y tzdata bash-completion \
python3-pip openssh-server vim git iputils-ping net-tools curl bc gawk \
&& rm -rf /var/lib/apt/lists/*
# Python packages for the tutorial notebooks, plus OpenFL itself
RUN pip install numpy jupyterlab matplotlib openfl
RUN git clone https://github.com/securefederatedai/openfl.git /root/openfl
WORKDIR /root
```

This base container comes with the HPU PyTorch packages preinstalled, so you can skip section **IV.** below.

Build the image from the Dockerfile above, then launch a container from it:

```
export GAUDI_DOCKER_IMAGE="gaudi-docker-ubuntu20.04-openfl"
docker build -t ${GAUDI_DOCKER_IMAGE} -f Dockerfile_Habana .
docker run --net host -id --name openfl_gaudi_run ${GAUDI_DOCKER_IMAGE} bash
```

Then open a bash shell inside the container:

```
docker exec -it openfl_gaudi_run bash
```

Once inside the container, make sure the OpenFL repository has been cloned (the Dockerfile above clones it to `/root/openfl`). If it has not been, clone it with:

```
git clone https://github.com/securefederatedai/openfl.git
```

Then check whether the `openfl` package is installed:

```
pip list | grep openfl
```

If it is not, install it with:

```
pip install openfl
```
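The two checks above can also be done from Python, for example in a notebook cell inside the container. This is a minimal sketch using only the standard library; `openfl_setup_status` is a hypothetical helper, and the default path `/root/openfl` is an assumption based on the `git clone` target in the Dockerfile.

```python
from importlib import metadata
from pathlib import Path


def openfl_setup_status(repo_dir="/root/openfl"):
    """Report whether the openfl package is installed and the repo is cloned.

    Hypothetical helper: it only inspects the environment and never installs
    or clones anything.
    """
    try:
        # Roughly equivalent to `pip list | grep openfl`.
        version = metadata.version("openfl")
    except metadata.PackageNotFoundError:
        version = None
    return {
        "openfl_version": version,
        # A cloned git repository contains a .git directory.
        "repo_cloned": (Path(repo_dir) / ".git").is_dir(),
    }


if __name__ == "__main__":
    status = openfl_setup_status()
    if status["openfl_version"] is None:
        print("openfl is not installed: pip install openfl")
    if not status["repo_cloned"]:
        print("repo missing: git clone https://github.com/securefederatedai/openfl.git")
```

The helper reports rather than acts, so it is safe to run repeatedly; the printed hints map back to the shell commands in this section.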

Then follow the instructions in section **V. HPU Adaptations For PyTorch Examples** below.


<br/>

## **II. AWS DL1 Instance Setup**

This example was tested on an AWS EC2 instance created by following the instructions [here](https://docs.habana.ai/en/latest/AWS_EC2_DL1_and_PyTorch_Quick_Start/AWS_EC2_DL1_and_PyTorch_Quick_Start.html).

Test setup - Habana 1.7 and Ubuntu 20.04

<br/>

## **II. Set Up SynapseAI SW Stack**
## **III. Set Up SynapseAI SW Stack**

- To perform an installation of the full driver and SynapseAI software stack independently on the EC2 instance, run the following command:

@@ -33,7 +114,7 @@ You can refer the [Habana docs](https://docs.habana.ai/en/latest/Installation_Gu

<br/>

## **III. HPU Pytorch Installation**
## **IV. HPU Pytorch Installation**

For this example, make sure to install the PyTorch packages provided by Habana. These packages are optimized for Habana Gaudi HPUs; installing public PyTorch packages is not supported.
Habana PyTorch packages consist of:
@@ -69,7 +150,7 @@ The default virtual environment folder is `$HOME/habanalabs-venv`. To override t

</br>

## **IV. HPU Adaptations For PyTorch Examples**
## **V. HPU Adaptations For PyTorch Examples**

The following code additions are required in the workspace notebook to run a model on Habana. The steps cover both the Eager and Lazy modes of execution.
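As a rough illustration of where the Lazy-mode additions typically go, here is a minimal sketch of one training step. It assumes the Habana PyTorch bridge exposes `habana_frameworks.torch.core` with `mark_step()` (as in Habana's PyTorch porting guide) and that `PT_HPU_LAZY_MODE` selects the mode (`1`, assumed default, for Lazy; `0` for Eager). The helper names are hypothetical, and `mark_step()` is skipped when the bridge is absent, so the sketch also runs off-Gaudi.

```python
import importlib
import os


def load_htcore():
    """Return habana_frameworks.torch.core if the Habana bridge is installed, else None."""
    try:
        return importlib.import_module("habana_frameworks.torch.core")
    except ImportError:
        return None


def lazy_mode_enabled():
    # Assumption: PT_HPU_LAZY_MODE=1 (default) means Lazy mode, 0 means Eager mode.
    return os.environ.get("PT_HPU_LAZY_MODE", "1") == "1"


def train_step(model, batch, target, criterion, optimizer, htcore=None, lazy=True):
    """One optimization step; model and tensors are assumed to already be on the hpu device."""
    optimizer.zero_grad()
    loss = criterion(model(batch), target)
    loss.backward()
    if htcore is not None and lazy:
        # Lazy mode: flush the graph accumulated for the backward pass.
        htcore.mark_step()
    optimizer.step()
    if htcore is not None and lazy:
        # Lazy mode: flush the optimizer update.
        htcore.mark_step()
    return loss
```

In a notebook this might be called as `train_step(model, x, y, criterion, optimizer, htcore=load_htcore(), lazy=lazy_mode_enabled())` after moving the model with `model.to(torch.device("hpu"))`; in Eager mode the `mark_step()` calls are simply skipped.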

@@ -112,7 +193,7 @@ Refer [getting started with PyTorch](https://www.intel.com/content/www/us/en/dev

<br/>

## **V. How to run this tutorial (without TLC and locally as a simulation):**
## **VI. How to run this tutorial (without TLS and locally as a simulation):**
<br/>

### 0. If you haven't done so already, create a virtual environment, install OpenFL, and upgrade pip:
93 changes: 87 additions & 6 deletions openfl-tutorials/interactive_api/HPU/PyTorch_MedMNIST_2D/README.md
@@ -22,15 +22,96 @@ For example: HPU_PyTorch_MedMNIST_2D.ipynb placed under workspace folder contain

<br/>

## **I. AWS DL1 Instance Setup**

## **I. Intel Developer Cloud Setup**
This example was tested on the Intel Developer Cloud using a Gaudi2 instance.

To access Gaudi2 instances on the Intel Developer Cloud, follow the instructions [here](https://developer.habana.ai/intel-developer-cloud/).

The Gaudi2 instance on the Intel Developer Cloud comes with the SynapseAI SW Stack for Gaudi2 preinstalled, so you can skip sections **II.** and **III.**

Furthermore, our testing was done in a Habana-based Docker container built from the Dockerfile below.

Create a Dockerfile with the following content and name it `Dockerfile_Habana`:

```
FROM vault.habana.ai/gaudi-docker/1.10.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1/latest
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV DEBIAN_FRONTEND="noninteractive" TZ=Etc/UTC
# System utilities used inside the container
RUN apt-get update && apt-get install -y tzdata bash-completion \
python3-pip openssh-server vim git iputils-ping net-tools curl bc gawk \
&& rm -rf /var/lib/apt/lists/*
# Python packages for the tutorial notebooks, plus OpenFL itself
RUN pip install numpy jupyterlab matplotlib openfl
RUN git clone https://github.com/securefederatedai/openfl.git /root/openfl
WORKDIR /root
```

This base container comes with the HPU PyTorch packages preinstalled, so you can skip section **IV.** below.
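Once the container is running, you may want to confirm that the Habana PyTorch bridge is actually importable before skipping section IV. A minimal sketch, assuming the bridge is installed as the top-level `habana_frameworks` package (the package imported in Habana's PyTorch examples); `habana_bridge_installed` is a hypothetical helper name:

```python
import importlib.util


def habana_bridge_installed():
    """True if the top-level habana_frameworks package can be found on sys.path."""
    # For a top-level name, find_spec returns None (instead of raising) when missing.
    return importlib.util.find_spec("habana_frameworks") is not None


if __name__ == "__main__":
    if habana_bridge_installed():
        print("Habana PyTorch bridge found; section IV can be skipped.")
    else:
        print("Habana PyTorch bridge not found; follow section IV.")
```

Only the top-level package is probed, because `find_spec` on a dotted name imports the parent package and raises if that parent is missing.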

Build the image from the Dockerfile above, then launch a container from it:

```
export GAUDI_DOCKER_IMAGE="gaudi-docker-ubuntu20.04-openfl"
docker build -t ${GAUDI_DOCKER_IMAGE} -f Dockerfile_Habana .
docker run --net host -id --name openfl_gaudi_run ${GAUDI_DOCKER_IMAGE} bash
```

Then open a bash shell inside the container:

```
docker exec -it openfl_gaudi_run bash
```

Once inside the container, make sure the OpenFL repository has been cloned (the Dockerfile above clones it to `/root/openfl`). If it has not been, clone it with:

```
git clone https://github.com/securefederatedai/openfl.git
```

Then check whether the `openfl` package is installed:

```
pip list | grep openfl
```

If it is not, install it with:

```
pip install openfl
```

Then follow the instructions in section **V. HPU Adaptations For PyTorch Examples** below.

<br/>

## **II. AWS DL1 Instance Setup**

This example was tested on an AWS EC2 instance created by following the instructions [here](https://docs.habana.ai/en/latest/AWS_EC2_DL1_and_PyTorch_Quick_Start/AWS_EC2_DL1_and_PyTorch_Quick_Start.html).

Test setup - Habana 1.7 and Ubuntu 20.04

<br/>

## **II. Set Up SynapseAI SW Stack**
## **III. Set Up SynapseAI SW Stack**

- To perform an installation of the full driver and SynapseAI software stack independently on the EC2 instance, run the following command:

@@ -45,7 +126,7 @@ You can refer the [Habana docs](https://docs.habana.ai/en/latest/Installation_Gu

<br/>

## **III. HPU Pytorch Installation**
## **IV. HPU Pytorch Installation**

For this example, make sure to install the PyTorch packages provided by Habana. These packages are optimized for Habana Gaudi HPUs; installing public PyTorch packages is not supported.
Habana PyTorch packages consist of:
@@ -81,7 +162,7 @@ The default virtual environment folder is $HOME/habanalabs-venv. To override the

</br>

## **IV. HPU Adaptations For PyTorch Examples**
## **V. HPU Adaptations For PyTorch Examples**

The following code additions are required in the workspace notebook to run a model on Habana. The steps cover both the Eager and Lazy modes of execution.
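The choice between Eager and Lazy mode is usually made through an environment variable before the Habana bridge is loaded. A minimal sketch, assuming `PT_HPU_LAZY_MODE=1` selects Lazy mode, `PT_HPU_LAZY_MODE=0` selects Eager mode, and the variable is only read when `habana_frameworks.torch.core` is first imported; `set_hpu_execution_mode` is a hypothetical helper:

```python
import os
import sys


def set_hpu_execution_mode(mode):
    """Set PT_HPU_LAZY_MODE for 'lazy' or 'eager' execution (assumed variable semantics)."""
    values = {"lazy": "1", "eager": "0"}
    if mode not in values:
        raise ValueError(f"mode must be 'lazy' or 'eager', got {mode!r}")
    if "habana_frameworks.torch.core" in sys.modules:
        # Assumption: the setting is read at import time, so a later change would be ignored.
        raise RuntimeError("set the execution mode before importing habana_frameworks")
    os.environ["PT_HPU_LAZY_MODE"] = values[mode]
    return values[mode]
```

For example, `set_hpu_execution_mode("eager")` would go in the first notebook cell, before `import habana_frameworks.torch.core as htcore`. Exporting the variable in the shell before starting `jupyter lab` is an equivalent alternative.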

@@ -124,7 +205,7 @@ Refer [getting started with PyTorch](https://www.intel.com/content/www/us/en/dev

<br/>

## **V. How to run this tutorial (without TLC and locally as a simulation):**
## **VI. How to run this tutorial (without TLS and locally as a simulation):**
### 0. If you haven't done so already, create a virtual environment, install OpenFL, and upgrade pip:
- For help with this step, visit the "Install the Package" section of the [OpenFL installation instructions](https://openfl.readthedocs.io/en/latest/install.html#install-the-package).
<br/>
@@ -180,4 +261,4 @@ jupyter lab Pytorch_MedMNIST_2D.ipynb
- A Jupyter Server URL will appear in your terminal. In your browser, proceed to that link. Once the webpage loads, click on the Pytorch_MedMNIST_2D.ipynb file.
- To run the experiment, select the icon that looks like two triangles to "Restart Kernel and Run All Cells".
- You will notice activity in your terminals as the experiment runs, and when the experiment is finished the director terminal will display a message that the experiment finished successfully.


91 changes: 86 additions & 5 deletions openfl-tutorials/interactive_api/HPU/PyTorch_TinyImageNet/README.md
@@ -10,15 +10,96 @@ For example: HPU_pytorch_tinyimagenet.ipynb placed under workspace folder contai

<br/>

## **I. AWS DL1 Instance Setup**

## **I. Intel Developer Cloud Setup**
This example was tested on the Intel Developer Cloud using a Gaudi2 instance.

To access Gaudi2 instances on the Intel Developer Cloud, follow the instructions [here](https://developer.habana.ai/intel-developer-cloud/).

The Gaudi2 instance on the Intel Developer Cloud comes with the SynapseAI SW Stack for Gaudi2 preinstalled, so you can skip sections **II.** and **III.**

Furthermore, our testing was done in a Habana-based Docker container built from the Dockerfile below.

Create a Dockerfile with the following content and name it `Dockerfile_Habana`:

```
FROM vault.habana.ai/gaudi-docker/1.10.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1/latest
ENV HABANA_VISIBLE_DEVICES=all
ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
ENV DEBIAN_FRONTEND="noninteractive" TZ=Etc/UTC
# System utilities used inside the container
RUN apt-get update && apt-get install -y tzdata bash-completion \
python3-pip openssh-server vim git iputils-ping net-tools curl bc gawk \
&& rm -rf /var/lib/apt/lists/*
# Python packages for the tutorial notebooks, plus OpenFL itself
RUN pip install numpy jupyterlab matplotlib openfl
RUN git clone https://github.com/securefederatedai/openfl.git /root/openfl
WORKDIR /root
```

This base container comes with the HPU PyTorch packages preinstalled, so you can skip section **IV.** below.

Build the image from the Dockerfile above, then launch a container from it:

```
export GAUDI_DOCKER_IMAGE="gaudi-docker-ubuntu20.04-openfl"
docker build -t ${GAUDI_DOCKER_IMAGE} -f Dockerfile_Habana .
docker run --net host -id --name openfl_gaudi_run ${GAUDI_DOCKER_IMAGE} bash
```

Then open a bash shell inside the container:

```
docker exec -it openfl_gaudi_run bash
```

Once inside the container, make sure the OpenFL repository has been cloned (the Dockerfile above clones it to `/root/openfl`). If it has not been, clone it with:

```
git clone https://github.com/securefederatedai/openfl.git
```

Then check whether the `openfl` package is installed:

```
pip list | grep openfl
```

If it is not, install it with:

```
pip install openfl
```

Then follow the instructions in section **V. HPU Adaptations For PyTorch Examples** below.

<br/>

## **II. AWS DL1 Instance Setup**

This example was tested on an AWS EC2 instance created by following the instructions [here](https://docs.habana.ai/en/latest/AWS_EC2_DL1_and_PyTorch_Quick_Start/AWS_EC2_DL1_and_PyTorch_Quick_Start.html).

Test setup - Habana 1.7 and Ubuntu 20.04

<br/>

## **II. Set Up SynapseAI SW Stack**
## **III. Set Up SynapseAI SW Stack**

- To perform an installation of the full driver and SynapseAI software stack independently on the EC2 instance, run the following command:

@@ -33,7 +114,7 @@ You can refer the [Habana docs](https://docs.habana.ai/en/latest/Installation_Gu

<br/>

## **III. HPU Pytorch Installation**
## **IV. HPU Pytorch Installation**

For this example, make sure to install the PyTorch packages provided by Habana. These packages are optimized for Habana Gaudi HPUs; installing public PyTorch packages is not supported.
Habana PyTorch packages consist of:
@@ -69,7 +150,7 @@ The default virtual environment folder is $HOME/habanalabs-venv. To override the

</br>

## **IV. HPU Adaptations For PyTorch Examples**
## **V. HPU Adaptations For PyTorch Examples**

The following code additions are required in the workspace notebook to run a model on Habana. The steps cover both the Eager and Lazy modes of execution.

@@ -113,7 +194,7 @@ Refer [getting started with PyTorch](https://www.intel.com/content/www/us/en/dev
<br/>


## **V. How to run this tutorial (without TLS and locally as a simulation):**
## **VI. How to run this tutorial (without TLS and locally as a simulation):**
<br/>

### 0. If you haven't done so already, install OpenFL in the virtual environment created during Habana setup, and upgrade pip: