This folder contains examples of registering custom operators into PyTorch, as well as registering their kernels into the ExecuTorch runtime.
Prerequisite: complete the setup described in the setting-up wiki.
Run:

```bash
cd executorch
bash examples/portable/custom_ops/test_custom_ops.sh
```
To use custom ops in the ExecuTorch AOT flow (EXIR), the first option is to register the custom ops into the PyTorch JIT runtime using the `torch.library` APIs. See the example in `custom_ops_1.py`, where we register `my_ops::mul3` and `my_ops::mul3_out`. Here `my_ops` is the namespace, and it shows up when we invoke the operator, e.g., `torch.ops.my_ops.mul3.default`. For more information about PyTorch operators, check out `pytorch/torch/_ops.py`.
Notice that we need both the functional variant and the out variant of a custom op, because EXIR needs to perform memory planning on the out variant, `my_ops::mul3_out`.
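
Below is a minimal sketch of such a registration, along the lines of `custom_ops_1.py` (the dispatch key and exact decorator usage are assumptions and may differ from the actual example):

```python
import torch
from torch.library import Library, impl

# Define the my_ops namespace with both the functional and the out variant.
lib = Library("my_ops", "DEF")
lib.define("mul3(Tensor input) -> Tensor")
lib.define("mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)")

@impl(lib, "mul3", dispatch_key="CompositeExplicitAutograd")
def mul3_impl(a: torch.Tensor) -> torch.Tensor:
    return a * 3

@impl(lib, "mul3.out", dispatch_key="CompositeExplicitAutograd")
def mul3_out_impl(a: torch.Tensor, *, output: torch.Tensor) -> torch.Tensor:
    output.copy_(a * 3)
    return output

# Both variants are now callable through the torch.ops namespace.
print(torch.ops.my_ops.mul3.default(torch.ones(2, 3)))
```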
The second option is to register the custom ops into the PyTorch JIT runtime using the C++ APIs (`TORCH_LIBRARY`/`TORCH_LIBRARY_IMPL`). This means we need to write C++ code, and it needs to depend on `libtorch`.
We added an example in `custom_ops_2.cpp`, where we implement and register `my_ops::mul4`, as well as `custom_ops_2_out.cpp` with an implementation for `my_ops::mul4_out`.
By linking them both with `libtorch` and the `executorch` library, we can build a shared library, `libcustom_ops_aot_lib_2`, that can be dynamically loaded by the Python environment and register these ops into PyTorch. This is done by `torch.ops.load_library(<path_to_libcustom_ops_aot_lib_2>)` in `custom_ops_2.py`.
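
For illustration, the loading step looks roughly like this (the `.so` path below is hypothetical and depends on where your build writes the library):

```python
import torch

# Hypothetical build-output path; adjust to your build directory.
torch.ops.load_library(
    "cmake-out/examples/portable/custom_ops/libcustom_ops_aot_lib_2.so"
)

# After loading, the C++-registered op is available under torch.ops.
x = torch.randn(2, 3)
print(torch.ops.my_ops.mul4.default(x))  # elementwise x * 4
```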
After the model is exported by EXIR, we need C++ implementations of these custom ops in order to run it. For example, `custom_ops_1_out.cpp` is a C++ kernel that can be plugged into the ExecuTorch runtime. Other than that, we also need a way to bind the PyTorch op to this kernel. This binding is specified in `custom_ops.yaml`:
```yaml
- func: my_ops::mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: custom::mul3_out_impl  # sub-namespace native:: is auto-added
```
For how to write these YAML entries, please refer to `kernels/portable/README.md`.
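
To make the flow concrete, here is a minimal sketch of exporting a model that uses the custom op through EXIR (the compile config and other API details are assumptions and may differ from the actual examples):

```python
import torch
from executorch.exir import EdgeCompileConfig, to_edge

class Mul3Module(torch.nn.Module):
    def forward(self, x):
        return torch.ops.my_ops.mul3.default(x)

ep = torch.export.export(Mul3Module(), (torch.randn(2, 3),))
# Custom ops may not pass the default Edge IR validity checks.
edge = to_edge(ep, compile_config=EdgeCompileConfig(_check_ir_validity=False))
# Memory planning during to_executorch() rewrites my_ops::mul3 to my_ops::mul3.out.
et_program = edge.to_executorch()
with open("custom_ops_1.pte", "wb") as f:
    f.write(et_program.buffer)
```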
Currently we use CMake as the build system to link the `my_ops::mul3.out` kernel (written in `custom_ops_1_out.cpp`) to the ExecuTorch runtime. See the instructions in `examples/portable/custom_ops/test_custom_ops.sh` (`test_cmake_custom_op_1`).
Note that we have defined entries for both `my_ops::mul3.out` and `my_ops::mul4.out` in `custom_ops.yaml`. To reduce binary size, we can choose to register only the operators used in the model. This is done by passing a list of operators to the `gen_oplist` custom rule, for example: `--root_ops="my_ops::mul4.out"`. We then make the custom ops library depend on this target, so that only the ops we want are registered. For more information about selective build, please refer to `selective_build.md`.