move GeneralizeNamedOp pass #72
Closed
Conversation
PhaneeshB (Collaborator) commented Nov 25, 2023
- Moves the SPIRVGeneralizeNamedOp pass to Common/GPU
- Uses it in the LLVMGPU pipeline

Co-authored-by: Elias Joseph <[email protected]>
The conversion pass is enabled with `--iree-flow-enable-conv-nchw-to-nhwc-transform`. It includes partial support for propagating and cancelling the transposes generated when converting from NCHW to NHWC. The high-level strategy for this pass is as follows:
1. Convert all conv_nchw_fchw ops (and pooling ops) and wrap each converted convolution in transposes. Each transpose is tagged to indicate which direction it should propagate through the graph (a hedged sketch follows this list).
2. Traverse the ops in the function in reverse to propagate transposes marked for upward propagation to their parents, ideally to just before ops such as arith.constant or function arguments.
3. Propagate transposes marked for downward propagation to their users, ideally to just before the return.
4. Canonicalize away all adjacent cancelling transposes and generalize the remaining transposes so they can be fused with nearby ops.
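To make step 1 concrete, here is a minimal sketch of what a converted convolution might look like once wrapped in transposes. Shapes are made up, and the propagation tags are shown as comments because the pass's actual marker attributes are not given in this description:

```mlir
func.func @conv_wrapped(%input: tensor<1x16x34x34xf32>,
                        %filter: tensor<8x16x3x3xf32>) -> tensor<1x8x32x32xf32> {
  %in_init = tensor.empty() : tensor<1x34x34x16xf32>
  // Tagged by the pass for upward propagation (toward constants/arguments).
  %in_nhwc = linalg.transpose ins(%input : tensor<1x16x34x34xf32>)
      outs(%in_init : tensor<1x34x34x16xf32>) permutation = [0, 2, 3, 1]
  %f_init = tensor.empty() : tensor<3x3x16x8xf32>
  %filter_hwcf = linalg.transpose ins(%filter : tensor<8x16x3x3xf32>)
      outs(%f_init : tensor<3x3x16x8xf32>) permutation = [2, 3, 1, 0]
  %cst = arith.constant 0.0 : f32
  %out_init = tensor.empty() : tensor<1x32x32x8xf32>
  %fill = linalg.fill ins(%cst : f32)
      outs(%out_init : tensor<1x32x32x8xf32>) -> tensor<1x32x32x8xf32>
  %conv = linalg.conv_2d_nhwc_hwcf
      {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
      ins(%in_nhwc, %filter_hwcf : tensor<1x34x34x16xf32>, tensor<3x3x16x8xf32>)
      outs(%fill : tensor<1x32x32x8xf32>) -> tensor<1x32x32x8xf32>
  %res_init = tensor.empty() : tensor<1x8x32x32xf32>
  // Back to NCHW; tagged for downward propagation (toward the return).
  %result = linalg.transpose ins(%conv : tensor<1x32x32x8xf32>)
      outs(%res_init : tensor<1x8x32x32xf32>) permutation = [0, 3, 1, 2]
  return %result : tensor<1x8x32x32xf32>
}
```

Steps 2-4 then move the two outer transposes as far up and down the graph as possible and cancel any adjacent pairs left over.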
- Speed up filter transform folding
- Add points for 4x4 and switch to that tile size
- Move winograd after im2col + padding; in im2col, do not touch a conv if it has been marked as winograd
- Remove prints/chrono and adjust Attribute rawKernelAttr for Windows (by Quinn)

Co-authored-by: Quinn Dawkins <[email protected]>
Add pass to insert markers for function bisecting. Add pass to outline marked operation ranges into separate functions.
Expose a Python binding that extracts an operation list from an MLIR file. This list is then used to execute the entry MLIR with IREE while resolving calls to functions in other MLIR files.
This can be useful when trying to do layout propagation while guaranteeing specific fusion at the same time (use with caution).
This pass is the spiritual successor to `convert-conv-nchw-to-nhwc`, focused on generalizing convolutions to enable data tiling and more robust layout propagation, and on supporting non-named convolutions as well. It currently includes some baked-in generalization patterns and does not support padding. Tile size selection is currently pass-wide, though there is limited attribute control to enable fully transposing. Further work should rewrite this pass to allow per-op tile size control. A sketch of the kind of generalization involved follows below.
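As a hedged illustration of what "generalization" means here, the snippet below shows a named NCHW convolution rewritten as the equivalent `linalg.generic` (shapes and function name are invented; unit strides and dilations assumed):

```mlir
// Illustrative only: linalg.conv_2d_nchw_fchw expressed as a linalg.generic.
#map_in  = affine_map<(n, f, oh, ow, c, kh, kw) -> (n, c, oh + kh, ow + kw)>
#map_flt = affine_map<(n, f, oh, ow, c, kh, kw) -> (f, c, kh, kw)>
#map_out = affine_map<(n, f, oh, ow, c, kh, kw) -> (n, f, oh, ow)>
func.func @generalized_conv(%in: tensor<1x16x34x34xf32>,
                            %flt: tensor<8x16x3x3xf32>,
                            %out: tensor<1x8x32x32xf32>) -> tensor<1x8x32x32xf32> {
  %0 = linalg.generic {
      indexing_maps = [#map_in, #map_flt, #map_out],
      iterator_types = ["parallel", "parallel", "parallel", "parallel",
                        "reduction", "reduction", "reduction"]}
      ins(%in, %flt : tensor<1x16x34x34xf32>, tensor<8x16x3x3xf32>)
      outs(%out : tensor<1x8x32x32xf32>) {
  ^bb0(%i: f32, %f: f32, %acc: f32):
    // Multiply-accumulate body of the convolution.
    %m = arith.mulf %i, %f : f32
    %a = arith.addf %acc, %m : f32
    linalg.yield %a : f32
  } -> tensor<1x8x32x32xf32>
  return %0 : tensor<1x8x32x32xf32>
}
```

Once in this form, the indexing maps can be retargeted to a transposed or tiled layout without being constrained to the named-op vocabulary.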
Sub-32-bit types are handled on the SPIR-V side by introducing bitcasts to and from i32 and bubbling them toward the center of the kernel in the hope that they cancel. This adds a pattern for a bitcast on the result of an scf.if, which arises from the way padding is handled (a transfer_read in the `then` branch, with a splat constant yielded in the `else` branch).
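A hedged before/after sketch of one plausible form of that rewrite (types, shapes, and names are invented for illustration):

```mlir
// Before: the bitcast sits on the scf.if result and cannot bubble further.
func.func @before(%cond: i1, %src: memref<128xf16>, %i: index) -> vector<4xi32> {
  %pad = arith.constant 0.0 : f16
  %v = scf.if %cond -> (vector<8xf16>) {
    %r = vector.transfer_read %src[%i], %pad : memref<128xf16>, vector<8xf16>
    scf.yield %r : vector<8xf16>
  } else {
    %splat = arith.constant dense<0.0> : vector<8xf16>
    scf.yield %splat : vector<8xf16>
  }
  %cast = vector.bitcast %v : vector<8xf16> to vector<4xi32>
  return %cast : vector<4xi32>
}

// After: the bitcast is pushed into both branches so it can keep bubbling
// toward the center of the kernel and, with luck, cancel.
func.func @after(%cond: i1, %src: memref<128xf16>, %i: index) -> vector<4xi32> {
  %pad = arith.constant 0.0 : f16
  %v = scf.if %cond -> (vector<4xi32>) {
    %r = vector.transfer_read %src[%i], %pad : memref<128xf16>, vector<8xf16>
    %rc = vector.bitcast %r : vector<8xf16> to vector<4xi32>
    scf.yield %rc : vector<4xi32>
  } else {
    %splat = arith.constant dense<0.0> : vector<8xf16>
    %sc = vector.bitcast %splat : vector<8xf16> to vector<4xi32>
    scf.yield %sc : vector<4xi32>
  }
  return %v : vector<4xi32>
}
```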
Build experimental ROCm builds
Use flags `-DIREE_BUILD_EXPERIMENTAL_LEVEL_ZERO=ON -DLEVEL_ZERO_HEADERS_API_ROOT=/home/stanley/nod/level-zero`.
- Level Zero HAL driver
- Add OpenCL HAL target compiler
- Add SPIR-V codegen for the Kernel capability
- Fix illegal pointer arithmetic on void* (by Boian)
- Use events for command buffer execution and synchronization (iree-org#47)
- Add flag for switching between Physical64 and Physical32 addressing in OpenCL
- Enable creation of a device by UUID + ID (device handle as uintptr_t). Note that a device ID implemented this way is ephemeral and valid only in the current IREE runtime context; if you start a new process the IDs will be different. With this change you can do:
  $ iree-run-module --list_devices
  level_zero://00005100-0000-0000-0000-000000000001
  $ iree-run-module --device=level_zero://00005100-0000-0000-0000-000000000001 ...
- Add query_memory_heaps implementation. Fixes error: arithmetic on a pointer to void is a GNU extension [-Werror,-Wgnu-pointer-arith]
- Supply the structure type when passing such arguments, in accordance with the API (iree-org#34). When calling Level Zero API functions that query information and use a struct to populate it, the user must supply the structure type (stype) in the structure itself; the struct is both an in and an out argument.
- Fix the Level Zero build
- Remove usage of iree_hal_command_buffer_dyn_cast
* Add rudimentary non-production distributed Python API
* Distributed execution validation: add functionality that validates distributed StableHLO produces the same results as non-distributed
* Add execution time measurement
* Distributed Python API: add call_count to run_ranks
* Add setup script for distributed Python API
* Add JAX to install setup

Co-authored-by: Boian Petkantchin <[email protected]>
…f-hosted, clean macos bindist
- Drop instrumented builds and Python < 3.11
- Add upstream sync CI. This fixes the problem of potentially dropping commits that are submitted while an automatic rebase with upstream IREE is going on.
- [CI] Fix macOS cleanup logic: fixes the macOS builder.
Instead of requiring an exact NCCL version, relax the constraint to the standard ABI versioning rules, namely `found_version >= major.minor && found_version < major + 1`, where major and minor are taken from the NCCL headers we use.
Makes the driver compliant with the HAL API change.
This reverts commit a6512dc.
The semantics for specifying different kinds of advice are unclear, so I set it in two stages.
…uped qmm MegaPR

[LLVMCPU] Allow parallel tiling in LLVMCPUSplitReduction, tile reduction by 2. This commit enables tiling of parallel dimensions in LLVMCPUSplitReduction, as well as changing the tile size of the resulting reduction to 2. The latter change is an x86-specific optimization that allows targeting specific instructions through VectorContractCustomKernels.

[LLVMCPU] Add support for vecmat cases in VectorContractCustomKernel. This commit introduces new functionality to VectorContractCustomKernels:
1. Matching for vecmat kernels that have 1D vector shapes
2. Support for `vector.contract` ops with split reduction dimensions
3. Ability to promote smaller-bitwidth inputs with `arith.extui` or `arith.extsi` before passing them into the `llvm.inline_asm` op
4. Ability to specify explicit constraint strings per register input in a VectorContractCustomKernel
5. Support for `i4` and `i8` input types
6. New x86 AVX512VNNI i16xi16->i32 vecmat kernel with split reduction
This commit also adds `vector.transfer_read` flattening patterns and VectorContractCustomKernel lowering patterns to LLVMCPUVectorLowering.

[LLVMCPU] Add pass to break down subbyte `arith.extui`. This pass breaks down `arith.extui` ops that have `i4` inputs into a sequence of `vector.shuffle` -> `arith.andi` -> `arith.shrui` (see the sketch after this list). This avoids bad lowering of subbyte extends in the x86 backend. The pass is somewhat specific to current work on vecmat VectorContractCustomKernels and has some unique matchings. It also attempts to make use of AVX512 registers, so the vector size for the resulting IR is hardcoded to 512 bits. This needs to change before landing, and the pass in general needs some refactoring before landing.

[LLVMCPU] Add pass to fold away unit dimensions on `vector.contract` ops. This pass folds away unit dimensions on `vector.contract` ops to get them into a form recognizable by the VectorContractCustomKernels patterns. It also hoists `vector.shape_cast` ops out of containing `scf.for` ops where possible when the shape cast operates on the accumulator of a `vector.contract` op. This pattern may be better off somewhere else, but for now it lives here because the unit-dim folding pattern can produce a hoistable `vector.shape_cast` op in cases with split reduction.

[LLVMCPU] Add flag to restrict reassociated quantized matmul optimizations
[LLVMCPU] Add additional memref alias foldings
[LLVMCPU] Simplify VectorContractCustomKernels x86 constraint codes, add new AVX512 kernel
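A hedged sketch of the subbyte-extend breakdown mentioned above. Lane counts and constants are illustrative (the real pass targets 512-bit vectors), and little-endian nibble packing is assumed, with byte k holding elements 2k (low nibble) and 2k+1 (high nibble):

```mlir
// Illustrative only: breaking an i4 extui into shuffle -> andi -> shrui so
// the x86 backend never sees a sub-byte extend.
func.func @breakdown_extui(%v: vector<8xi4>) -> vector<8xi32> {
  // Reinterpret the packed nibbles as bytes.
  %bytes = vector.bitcast %v : vector<8xi4> to vector<4xi8>
  // Duplicate each byte so lanes 2k and 2k+1 both hold byte k.
  %dup = vector.shuffle %bytes, %bytes [0, 0, 1, 1, 2, 2, 3, 3]
      : vector<4xi8>, vector<4xi8>
  // Even lanes keep the low nibble (mask 0x0F); odd lanes keep the high
  // nibble (mask 0xF0, i.e. -16 as i8) and shift it down by 4.
  %mask = arith.constant dense<[15, -16, 15, -16, 15, -16, 15, -16]> : vector<8xi8>
  %and = arith.andi %dup, %mask : vector<8xi8>
  %shift = arith.constant dense<[0, 4, 0, 4, 0, 4, 0, 4]> : vector<8xi8>
  %nibbles = arith.shrui %and, %shift : vector<8xi8>
  // The remaining extend is byte-aligned and lowers cleanly.
  %ext = arith.extui %nibbles : vector<8xi8> to vector<8xi32>
  return %ext : vector<8xi32>
}
```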
Co-authored-by: Max Dawkins <[email protected]>
1) Level Zero is probably not buildable: the registration CMake for external drivers changed and I didn't take the time to figure out how that works. 2) Some GitHub Actions changes; these might have messed up the workflows, unsure. 3) The CPU performance patches had a number of conflicts, including the FuseDequantMatmul pass seemingly being dropped. Will need help from Max to make sure they are all in order.
This commit adds a new tiling configuration pass in LLVMCPU. The pass sets a special tiling configuration for reassociated quantized matmuls, since the non-root ops of these dispatches require specific tiling to target certain x86 instructions. This pass is a place to set abnormal tile sizes on non-root ops for specific types of workloads; a hedged illustration follows below.
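For context, per-op tile sizes in IREE codegen are typically expressed by attaching a `lowering_config` attribute to the op. The sketch below shows the shape of that mechanism only; the function name and tile sizes are made up and are not what this pass actually picks:

```mlir
// Illustrative only: a non-root op carrying its own lowering_config so it is
// tiled differently from the dispatch's root op.
#config = #iree_codegen.lowering_config<tile_sizes = [[4, 8, 0], [1, 4, 0], [0, 0, 16]]>
func.func @tag_non_root(%a: tensor<4x16xi8>, %b: tensor<16x8xi8>,
                        %init: tensor<4x8xi32>) -> tensor<4x8xi32> {
  %0 = linalg.matmul {lowering_config = #config}
      ins(%a, %b : tensor<4x16xi8>, tensor<16x8xi8>)
      outs(%init : tensor<4x8xi32>) -> tensor<4x8xi32>
  return %0 : tensor<4x8xi32>
}
```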
antiagainst requested changes Nov 25, 2023
Any reason this cannot be done directly in upstream IREE? Could you send a pull request there? Tests also need to be moved.
Force-pushed from 881690d to ee88b49
merged upstream and rebased