Refresh compile/run examples in deployment configuration guides. #19920

Merged: 3 commits, Feb 6, 2025
131 changes: 65 additions & 66 deletions docs/website/docs/guides/deployment-configurations/cpu.md
IREE supports efficient program execution on CPU devices by using
highly optimized CPU native instruction streams, which are embedded in one of
IREE's deployable formats.

To compile a program for CPU execution:

1. Pick a CPU target supported by LLVM. By default, IREE includes these LLVM
    targets:

    * X86
    * ARM
    * AArch64
    * RISCV

    Other targets may work, but in-tree test coverage and performance work is
    focused on that list.

2. Pick one of IREE's supported executable formats:

    | Executable Format | Description                                           |
    | ----------------- | ----------------------------------------------------- |
    | Embedded ELF      | (Default) Portable, high performance dynamic library  |
    | System library    | Platform-specific dynamic library (.so, .dll, etc.)   |
    | VMVX              | Reference target                                      |

At runtime, CPU executables can be loaded using one of IREE's CPU HAL devices,
selected with the `--device=` flag (see the sketch after this list):

* `local-task`: asynchronous, multithreaded device built on IREE's "task"
  system
* `local-sync`: synchronous, single-threaded device that executes work inline
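
For example, the same compiled module can be run on either device by changing
the `--device=` flag. A minimal sketch, assuming the `mobilenet_cpu.vmfb`
module compiled later in this guide:

```shell
# Multithreaded execution on IREE's task system.
iree-run-module --device=local-task --module=mobilenet_cpu.vmfb \
    --function=torch-jit-export --input="1x3x224x224xf32=0"

# Single-threaded execution, useful when debugging or when threading is
# not wanted.
iree-run-module --device=local-sync --module=mobilenet_cpu.vmfb \
    --function=torch-jit-export --input="1x3x224x224xf32=0"
```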

## :octicons-download-16: Prerequisites

Python packages are distributed through multiple channels. See the
[Python Bindings](../../reference/bindings/python.md) page for more details.
The core [`iree-base-compiler`](https://pypi.org/project/iree-base-compiler/)
package includes the compiler tools:

--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-compiler-from-release.md"

#### :material-hammer-wrench: Build the compiler from source

Please make sure you have followed the
[Getting started](../../building-from-source/getting-started.md) page to build
IREE for your host platform. The `llvm-cpu` compiler backend is compiled in by
default on all platforms, though you should ensure that the
`IREE_TARGET_BACKEND_LLVM_CPU` CMake option is `ON` when configuring.
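
If it helps to see that option in context, here is a minimal configure-and-build
sketch; the directory layout, generator, and build type below are assumptions,
so follow the Getting started page for the full flow:

```shell
# Configure a host build with the llvm-cpu backend explicitly enabled
# (it is already ON by default).
cmake -G Ninja -B ../iree-build/ -S . \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DIREE_TARGET_BACKEND_LLVM_CPU=ON

# Build; iree-compile ends up under ../iree-build/tools/.
cmake --build ../iree-build/
```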

!!! tip
    `iree-compile` will be built under the `iree-build/tools/` directory.
You will need to get an IREE runtime that supports the local CPU HAL driver,
along with the appropriate executable loaders for your application.

You can check for CPU support by looking for the `local-sync` and `local-task`
drivers and devices:

```console hl_lines="5 6"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md"
```console hl_lines="10-11"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:1"
```

```console hl_lines="4-5"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-amd.md"
```

#### :octicons-download-16: Download the runtime from a release
The core [`iree-base-runtime`](https://pypi.org/project/iree-base-runtime/)
package includes the local CPU HAL drivers:

#### :material-hammer-wrench: Build the runtime from source

Please make sure you have followed one of the
[Building from source](../../building-from-source/index.md) pages to build
IREE for your target platform. The local CPU HAL drivers and devices are
compiled in by default on all platforms, though you should ensure that the
`IREE_HAL_DRIVER_LOCAL_TASK` and `IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF`
(or other executable loader) CMake options are `ON` when configuring.
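
As with the compiler, here is a minimal configure sketch for a host build; the
directory layout and generator are assumptions, and both options below already
default to `ON`:

```shell
# Configure with the local-task driver and the embedded ELF loader enabled.
cmake -G Ninja -B ../iree-build/ -S . \
    -DIREE_HAL_DRIVER_LOCAL_TASK=ON \
    -DIREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF=ON

# Build; runtime tools such as iree-run-module land under ../iree-build/tools/.
cmake --build ../iree-build/
```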

## Compile and run a program

With the requirements out of the way, we can now compile a model and run it.

### :octicons-file-code-16: Compile a program

--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"

Then run the following command to compile with the `llvm-cpu` target:

``` shell hl_lines="2"
``` shell hl_lines="2-3"
iree-compile \
--iree-hal-target-backends=llvm-cpu \
mobilenet_iree_input.mlir -o mobilenet_cpu.vmfb
--iree-llvmcpu-target-cpu=host \
mobilenetv2.mlir -o mobilenet_cpu.vmfb
```

!!! tip "Tip - CPU targets"
???+ tip "Tip - Target CPUs and CPU features"

By default, the compiler will use a generic CPU target which will result in
poor performance. A target CPU or target CPU feature set should be selected
using one of these options:

* `--iree-llvmcpu-target-cpu=...`
* `--iree-llvmcpu-target-cpu-features=...`

When not cross compiling, passing `--iree-llvmcpu-target-cpu=host` is
usually sufficient on most devices.

???+ tip "Tip - CPU targets"

The `--iree-llvmcpu-target-triple` flag tells the compiler to generate code
for a specific type of CPU. You can see the list of supported targets with
`iree-compile --iree-llvmcpu-list-targets`, or use the default value of
"host" to let LLVM infer the triple from your host machine
(e.g. `x86_64-linux-gnu`).

```console
$ iree-compile --iree-llvmcpu-list-targets
x86-64 - 64-bit X86: EM64T and AMD64
```
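
When cross compiling, these flags can be combined to describe the target
machine. Here is a hypothetical sketch for an AArch64 device; the triple, the
feature string, and the output name are placeholders rather than
recommendations for any particular device:

```shell
# Cross compile for a generic AArch64 Linux target with dot product
# instructions enabled.
iree-compile \
    --iree-hal-target-backends=llvm-cpu \
    --iree-llvmcpu-target-triple=aarch64-linux-gnu \
    --iree-llvmcpu-target-cpu-features=+dotprod \
    mobilenetv2.mlir -o mobilenet_cpu_arm64.vmfb
```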

!!! tip "Tip - CPU features"

The `--iree-llvmcpu-target-cpu-features` flag tells the compiler to generate
code using certain CPU "features", like SIMD instruction sets. Like the
target triple, you can pass "host" to this flag to let LLVM infer the
features supported by your host machine.

### :octicons-terminal-16: Run a compiled program

To run the compiled program:

``` shell hl_lines="2"
iree-run-module \
    --device=local-task \
    --module=mobilenet_cpu.vmfb \
    --function=torch-jit-export \
    --input="1x3x224x224xf32=0"
```

The above assumes the exported function in the model is named `torch-jit-export`
and it expects one 224x224 RGB image. We are feeding in an image with all 0
values here for brevity; see `iree-run-module --help` for the format to specify
concrete values.
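
For example, to feed a real image instead of zeros you could pass a NumPy file.
This is only a sketch: `input.npy` is a placeholder for a `1x3x224x224xf32`
array you have saved yourself, and the `@file` input syntax is assumed to be
available in your build of `iree-run-module`:

```shell
# Load the input tensor from a .npy file instead of writing values inline.
iree-run-module \
    --device=local-task \
    --module=mobilenet_cpu.vmfb \
    --function=torch-jit-export \
    --input=@input.npy
```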

<!-- TODO(??): measuring performance -->
82 changes: 36 additions & 46 deletions docs/website/docs/guides/deployment-configurations/gpu-cuda.md
Next you will need to get an IREE runtime that includes the CUDA HAL driver.

You can check for CUDA support by looking for a matching driver and device:

```console hl_lines="3"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md"
```console hl_lines="8"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:1"
```

```console hl_lines="3"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-nvidia.md"
```

#### :octicons-download-16: Download the runtime from a release
If you are building IREE from source, enable the CUDA HAL driver with the
`IREE_HAL_DRIVER_CUDA` CMake option.

## Compile and run a program

With the requirements out of the way, we can now compile a model and run it.

### :octicons-file-code-16: Compile a program

--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"

Then run the following command to compile with the `cuda` target:

```shell hl_lines="2-3"
iree-compile \
    --iree-hal-target-backends=cuda \
    --iree-cuda-target=<...> \
    mobilenetv2.mlir -o mobilenet_cuda.vmfb
```

???+ tip "Tip - CUDA targets"

    Canonically a CUDA target (`iree-cuda-target`) matching the LLVM NVPTX
    backend of the form `sm_<arch_number>` is needed to compile towards each
    GPU architecture. If no architecture is specified then we will default to
    `sm_60`.

    Here is a table of commonly used architectures:

    | CUDA GPU            | Target Architecture | Architecture Code Name
    | ------------------- | ------------------- | ----------------------
    | NVIDIA P100         | `sm_60`             | `pascal`
    | NVIDIA V100         | `sm_70`             | `volta`
    | NVIDIA A100         | `sm_80`             | `ampere`
    | NVIDIA H100         | `sm_90`             | `hopper`
    | NVIDIA RTX20 series | `sm_75`             | `turing`
    | NVIDIA RTX30 series | `sm_86`             | `ampere`
    | NVIDIA RTX40 series | `sm_89`             | `ada`

    In addition to the canonical `sm_<arch_number>` scheme, `iree-cuda-target`
    also supports two additional schemes for a better developer experience:

    * Architecture code names like `volta` or `ampere`
    * GPU product names like `a100` or `rtx3090`

    These two schemes are translated into the canonical form under the hood.
    We add support for common code/product names without aiming to be
    exhaustive. If the ones you want are missing, please use the canonical
    form.
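
For instance, here is a sketch of the product-name scheme for an A100; the
output file name is just an example:

```shell
# "a100" is resolved to sm_80 under the hood.
iree-compile \
    --iree-hal-target-backends=cuda \
    --iree-cuda-target=a100 \
    mobilenetv2.mlir -o mobilenet_cuda_a100.vmfb
```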

### :octicons-terminal-16: Run a compiled program

To run the compiled program:

``` shell hl_lines="2"
iree-run-module \
    --device=cuda \
    --module=mobilenet_cuda.vmfb \
    --function=torch-jit-export \
    --input="1x3x224x224xf32=0"
```

The above assumes the exported function in the model is named `torch-jit-export`
and it expects one 224x224 RGB image. We are feeding in an image with all 0
values here for brevity; see `iree-run-module --help` for the format to specify
concrete values.
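
On a machine with multiple GPUs you can also target a specific device. This is
a sketch: the `cuda://0` ordinal form is an assumption, so check the output of
`iree-run-module --list_devices` for the exact device URIs on your system:

```shell
# Run on the first enumerated CUDA device.
iree-run-module \
    --device=cuda://0 \
    --module=mobilenet_cuda.vmfb \
    --function=torch-jit-export \
    --input="1x3x224x224xf32=0"
```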