Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generation of dialect bindings with TableGen and proc macro #262

Closed
2 of 7 tasks
Danacus opened this issue Jul 23, 2023 · 11 comments
Closed
2 of 7 tasks

Generation of dialect bindings with TableGen and proc macro #262

Danacus opened this issue Jul 23, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@Danacus
Copy link
Contributor

Danacus commented Jul 23, 2023

The Python bindings for MLIR dialects are automatically generated using a TableGen backend (see OpPythonBindingGen). This made me wonder if this could be done for Rust/melior.

Just for fun, I implemented a TableGen backend as a proc macro crate.

  • I ported old tablegen-rs to modern LLVM and created a wrapper crate: tblgen-rs
  • And created a proc macro crate to generate melior bindings from TableGen ODS: dialectgen-rs, which can be used together with melior (the code is pure chaos, I wrote it in a few days, it's a proof of concept).

When calling dialect!("MyDialects.td");, where MyDialects.td is a file containing

include "mlir/Dialect/Arith/IR/ArithOps.td"
include "mlir/Dialect/Func/IR/FuncOps.td"
include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.td"
include "mlir/Dialect/Index/IR/IndexOps.td"
include "mlir/Dialect/LLVMIR/LLVMOps.td"
include "mlir/Dialect/MemRef/IR/MemRefOps.td"
include "mlir/Dialect/SCF/IR/SCFOps.td"
include "mlir/Dialect/PDL/IR/PDLOps.td"
include "mlir/Dialect/Math/IR/MathOps.td"
include "mlir/Dialect/GPU/IR/GPUOps.td"
include "mlir/Dialect/Linalg/IR/LinalgOps.td"
include "mlir/Dialect/Async/IR/AsyncOps.td"
include "mlir/Dialect/Quant/QuantOps.td"
include "mlir/Dialect/Shape/IR/ShapeOps.td"
include "mlir/Dialect/Tensor/IR/TensorOps.td"

it generates the following:

  • A newtype struct for each operation, e.g.
pub struct FuncOp<'c> {
        operation: ::melior::ir::operation::Operation<'c>,
}
  • Getters and setters for operands, results and attributes specified in the dialect, e.g.
        pub fn sym_name<'a>(&'a self) -> ::melior::ir::attribute::StringAttribute<'c> {
            self.operation
                .attribute("sym_name")
                .expect("operation should have attribute sym_name")
                .try_into()
                .expect("sym_name should be a ::melior::ir::attribute::StringAttribute")
        }
        pub fn set_sym_name(
            &mut self,
            value: ::melior::ir::attribute::StringAttribute<'c>,
        ) {
            self.operation.set_attribute("sym_name", value);
        }
  • A typestate builder, which is a wrapper around OperationBuilder for each Op that ensures all required operands, results and attributes are set at compile time before allowing you to build the Op, e.g.
#[doc(hidden)]
    pub struct ReturnOpOperands;
    #[doc(hidden)]
    pub struct ReturnOpNoOperands;
    pub struct ReturnOpBuilder<'c, Toperands> {
        #[doc(hidden)]
        builder: ::melior::ir::operation::OperationBuilder<'c>,
        #[doc(hidden)]
        context: &'c ::melior::Context,
        #[doc(hidden)]
        _operands: ::std::marker::PhantomData<Toperands>,
    }
    impl<'c, 'a> ReturnOpBuilder<'c, ReturnOpNoOperands> {
        pub fn new(
            context: &'c ::melior::Context,
            location: ::melior::ir::Location<'c>,
        ) -> Self {
            Self {
                context,
                builder: ::melior::ir::operation::OperationBuilder::new(
                    "func.return",
                    location,
                ),
                _operands: ::std::marker::PhantomData,
            }
        }
    }
    impl<'c, 'a> ReturnOpBuilder<'c, ReturnOpNoOperands> {
        pub fn operands(
            mut self,
            operands: &::melior::ir::Value<'c, 'a>,
        ) -> ReturnOpBuilder<'c, ReturnOpOperands> {
            self.builder = self.builder.add_operand(operands);
            let Self { context, mut builder, _operands } = self;
            ReturnOpBuilder {
                context,
                builder,
                _operands: ::std::marker::PhantomData,
            }
        }
    }
    impl<'c, 'a> ReturnOpBuilder<'c, ReturnOpOperands> {
        pub fn build(self) -> ReturnOp<'c> {
            self.builder.build().try_into().expect("should be a valid ReturnOp")
        }
    }
  • A default constructor similar to those provided by melior
    pub fn r#return<'c, 'a: 'c>(
        context: &'c Context,
        operands: &::melior::ir::Value<'c, 'a>,
        location: Location<'c>,
    ) -> ReturnOp<'c> {
        ReturnOp::builder(context, location).operands(operands).build()
    }

Although I very much enjoyed writing this, I do realize that this might be a bit absurd (the macro generates around 100000 LoC) and I don't know how useful this will be. What are your thoughts?

Remaining work

  • Refactor codes in the melior-macro crate.
  • Prefer Result's rather than panics.
  • Remove or use unused codes in *Constraint types.
  • Add an OperationLike trait and implement it for Module and dialect-specific operations.
  • Write unit tests for representative operations.
    • Generate unit tests for each dialect-specific operation and builder functions?
  • Support VariadicOfVariadic: argument should be &[&[Value]] instead of &[Value] and segment sizes attribute should be derived from this, see this MLIR diff (low priority issue).
  • Move some common functionality (e.g. segment computation) to separate function to reduce code size?
@raviqqe raviqqe added the enhancement New feature or request label Jul 24, 2023
@raviqqe
Copy link
Member

raviqqe commented Jul 24, 2023

I think it's a good way to go! That would remove the future toil to maintain each dialect set. Thank you so much for PoC!

BTW, I can't see the repos. Are their URLs correct?

@Danacus
Copy link
Contributor Author

Danacus commented Jul 24, 2023

That would remove the future toil to maintain each dialect set.

Yes, and as a bonus it would allow users of melior to generate bindings for any custom out-of-tree dialects without needing to do it manually.

BTW, I can't see the repos. Are their URLs correct?

I'm sorry, I forgot to set them to public visibility. They should be visible now.

@raviqqe
Copy link
Member

raviqqe commented Jul 24, 2023

The code looks good to me while we might be able to reduce the size of the macro-expanded codes in some way in the future. If you can make a PR, it's more than welcome to merge it!

So, do you think the primary reason for the code bloat is due to the type state builder? The builder's code size for each operation is just O(n) where n is a number of its fields, right? If that's the case, I think the large code size is inevitable anyway unless we check the existence of fields dynamically with runtime overhead. How long does it take to compile all dialect TableGen files right now?

@Danacus
Copy link
Contributor Author

Danacus commented Jul 24, 2023

I don't think there's any major code bloat. It's mostly due to the fact that there are a lot of ops if you include all dialects, combined with the fact that I chose very long type names for the type state builders to avoid conflicting names. It's indeed inevitable, and it doesn't really affect the size of the compiled binary since generic types are erased and only the code being used is included anyway. I think it's better than adding runtime overhead with dynamic checks.

I haven't measured compile time yet, but I think it's still under one second for the dialect files mentioned above. I believe most of the compile time is still spent linking with MLIR when compiling a binary crate, which already took a long time before.

Do you think this is something that should be merged with melior, or if it would be better to keep it separate? Either way I still need to clean up the code a bit, I'm not completely satisfied with it yet.

Also, we would need to merge a few minor additions in melior, since I added some getters and setters to Operation and a few add_ functions to OperationBuilder. Perhaps I should make a PR for this first?

@raviqqe
Copy link
Member

raviqqe commented Jul 24, 2023

Do you think this is something that should be merged with melior, or if it would be better to keep it separate? Either way I still need to clean up the code a bit, I'm not completely satisfied with it yet.

I think it's good to include it in Melior given your observation. But it's also welcome to maintain it as a separate crate as an extension too.

Also, we would need to merge a few minor additions in melior, since I added some getters and setters to Operation and a few add_ functions to OperationBuilder. Perhaps I should make a PR for this first?

Sure! We can merge those changes first.

@Danacus
Copy link
Contributor Author

Danacus commented Jul 24, 2023

I did a cargo build --timings on a binary crate that includes many dialects using dialectgen: cargo-timing-20230724T094547Z.html.zip.

Actually generating the dialect bindings takes less than a second as I suspected, but a lot of time is spent compiling TableGen bindings (around 7 seconds, they are compiled from C and linked with LLVM and I think linking takes a lot of time). The largest chunk of time is still spent linking the final binary with MLIR, which is inevitable.

@raviqqe
Copy link
Member

raviqqe commented Jul 24, 2023

Thank you for experimenting with it! Then, we can just neglect the minimal overhead.

raviqqe pushed a commit that referenced this issue Aug 17, 2023
I've created an early draft of dialect generation based on the [MLIR
Python
Bindings](https://mlir.llvm.org/docs/Bindings/Python/#integration-with-ods).

It's okay if you don't have the time to review this. I admit that
there's quite a lot of ugly and hard to read code (especially the code
dealing with variadic arguments), and it might not be in a state
currently where you would want to maintain it within `melior`, hence I'm
marking it as a draft.

Not much changed since my original implementation mentioned in #262, but
I got a bit stuck on error handling in my TableGen wrapper library
([tblgen-rs](https://gitlab.com/Danacus/tblgen-rs)) and lost motivation
for a while after that. Anyway, I decided to finish what I started,
hence I am making this draft.

To build this branch, you might need to set `TABLEGEN_160_PREFIX` to
match `MLIR_SYS_160_PREFIX`.

I've added the generated dialects in a new `dialect_gen` module for now,
such that they can be compared with the original hand-written bindings
in the `dialect` module.

There are still a few issues:

- [ ] Some parts of the code are hacky and ugly, and may be hard to
read.
- [ ] Type inference is not always detected (but it should be as good as
the Python bindings at least)
- [ ] Need to add tests (already have them in a separate repository)
- [ ] Need to fix some issues with the CI

I wanted complete parity with the existing hand-written dialect
bindings, but there are some things that aren't generated as nicely. For
example, `arith::CmpiPredicate` is not generated and plain `Attribute`
is used instead for `arith::cmpi`. It might be feasible to generate
dialect specific attributes from ODS instead. Or perhaps being able to
write some function manually would be useful.

Currently I generate wrapper types around `Operation` that provide
additional methods, for example:

```rust
        pub struct AddIOp<'c> {
            operation: ::melior::ir::operation::Operation<'c>,
        }
        impl<'c> AddIOp<'c> {
            pub fn name() -> &'static str {
                "arith.addi"
            }
            pub fn operation(&self) -> &::melior::ir::operation::Operation<'c> {
                &self.operation
            }
            pub fn builder(
                location: ::melior::ir::Location<'c>,
            ) -> AddIOpBuilder<'c, AddIOp__No__Lhs, AddIOp__No__Rhs> {
                AddIOpBuilder::new(location)
            }
            pub fn result(&self) -> ::melior::ir::operation::OperationResult<'c, '_> {
                self.operation.result(0usize).expect("operation should have this result")
            }
            pub fn lhs(&self) -> ::melior::ir::Value<'c, '_> {
                self.operation
                    .operand(0usize)
                    .expect("operation should have this operand")
            }
            pub fn rhs(&self) -> ::melior::ir::Value<'c, '_> {
                self.operation
                    .operand(1usize)
                    .expect("operation should have this operand")
            }
        }
```

I then provide implementations of `Into<Operation>` and
`TryFrom<Operation>` to "cast" to and from an `Operation`.

```rust
        impl<'c> TryFrom<::melior::ir::operation::Operation<'c>> for AddIOp<'c> {
            type Error = ::melior::Error;
            fn try_from(
                operation: ::melior::ir::operation::Operation<'c>,
            ) -> Result<Self, Self::Error> {
                Ok(Self { operation })
            }
        }
        impl<'c> Into<::melior::ir::operation::Operation<'c>> for AddIOp<'c> {
            fn into(self) -> ::melior::ir::operation::Operation<'c> {
                self.operation
            }
        }
```

I wonder if it would be better to use an `OperationLike` trait, similar
to `AttributeLike`? That way we wouldn't have to call `operation` or
`into` to use the methods on `Operation`.

On a related note: should we also generate `Ref` (and `RefMut`?) types
for these operation wrappers? It might be useful to be able to cast a
`OperationRef` into a `AddIOpRef`, for example, to more easily analyze
operations in external passes (or even external analyses, which is
something else I'm working on as well:
[mlir-rust-tools](https://gitlab.com/Danacus/mlir-rust-tools)).
@raviqqe
Copy link
Member

raviqqe commented Aug 17, 2023

I've added the "Remaining work" section in the issue description. Feel free to edit it!

@Danacus
Copy link
Contributor Author

Danacus commented Aug 26, 2023

I think generation of unit tests might be difficult, because it's hard to find and construct a valid type for a certain operation. For example, to test arith.addi, we need two Values, but we can't easily tell what type these values need to be.

For example: using type f32 for arith.addi will cause Operation::verify to fail with error 'arith.addi' op operand #0 must be signless-integer-like, but got 'f32'. The problem is that I don't really know if there is an easy way to get a create a type that satisfies some arbitrary type constraint such as signless-integer-like.

That said, it would still be possible to test using the wrong type if we just don't verify the operations. For example, we could use i32 as type for all operations in all tests, even though this is usually not valid.

We could also stick to a few handwritten tests to test functionality such as variadic operands and results. What do you think?

@raviqqe
Copy link
Member

raviqqe commented Aug 27, 2023

We could also stick to a few handwritten tests to test functionality such as variadic operands and results. What do you think?

Sounds good. We can choose some representative operations to test then.

@raviqqe
Copy link
Member

raviqqe commented Aug 31, 2023

I think the code quality in the dialect module in the melior-macro crate is good enough to prioritize other features although we can still improve them over time! 😄 I'm releasing the next minor version now.

@raviqqe raviqqe closed this as completed Feb 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants