Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support storing the code that builds the code model #305

Open
wants to merge 15 commits into
base: code-reflection
Choose a base branch
from

Conversation

mabbay
Copy link
Member

@mabbay mabbay commented Jan 22, 2025

In this PR we allow the user to decide how to store the code model. The first option is TEXT, if this is selected, we store the textual representation of the code model. The second option is CODE_BUILDR, if this is selected, we store the code that builds the code model. All work done here is around the second option, because the first option is already supported.


Progress

  • Change must not contain extraneous whitespace

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/305/head:pull/305
$ git checkout pull/305

Update a local copy of the PR:
$ git checkout pull/305
$ git pull https://git.openjdk.org/babylon.git pull/305/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 305

View PR using the GUI difftool:
$ git pr show -t 305

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/305.diff

Using Webrev

Link to Webrev Comment

@mabbay mabbay self-assigned this Jan 22, 2025
@bridgekeeper
Copy link

bridgekeeper bot commented Jan 22, 2025

👋 Welcome back mabbay! A progress list of the required criteria for merging this PR into code-reflection will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 22, 2025

@mabbay This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Support storing the code that builds the code model

Reviewed-by: mcimadamore

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the code-reflection branch:

  • 74b1465: Adjusted tests to use execution of a method and implementation of the execution model cache
  • cd265d2: Short cut for single method execution
  • 80934b2: Onnx subgraphs, lambda execution and BB removal
  • 5a30fac: MNISTDemo UI tweak
  • 4752b37: Split OnnxRuntime into high-level and low-level generated code

Please see this link for an up-to-date comparison between the source branch of this pull request and the code-reflection branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the code-reflection branch, type /integrate in a new comment.

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 22, 2025
@mlbridge
Copy link

mlbridge bot commented Jan 22, 2025

@mabbay
Copy link
Member Author

mabbay commented Feb 10, 2025

To test this you can compile the program TestStoringCodeModelBuilder.java with the following configuration:

  • Program: com.sun.tools.javac.Main

  • Arguments: -XDdumpIR -XDcodeModelStorageOption=CODE_BUILDER src/jdk.incubator.code/share/classes/jdk/incubator/code/internal/TestStoringCodeModelBuilder.java

The compilation will fail with the error: no enclosing instance of type TypeElementFactory is in scope

@mabbay
Copy link
Member Author

mabbay commented Feb 23, 2025

I made CODE_BUILDER the default storage mechanism for code models, so you can test the capability by running the compiler tests.

@PaulSandoz
Copy link
Member

Getting compiler errors due to lack of imports:

=== Output from failing command(s) repeated here ===

  • For target jdk_modules_jdk.incubator.code__the.jdk.incubator.code_batch:
    /Users/sandoz/Projects/jdk/babylon/src/jdk.incubator.code/share/classes/jdk/incubator/code/Op.java:540: error: cannot find symbol
    opMethod = method.getDeclaringClass().getDeclaredMethod(opMethodName, OpFactory.class,
    ^
    symbol: class OpFactory
    location: class Op
    /Users/sandoz/Projects/jdk/babylon/src/jdk.incubator.code/share/classes/jdk/incubator/code/Op.java:542: error: cannot find symbol
    args = new Object[] {ExtendedOp.FACTORY, CoreTypeFactory.CORE_TYPE_FACTORY};
    ^
    symbol: variable CoreTypeFactory
    location: class Op
    2 errors

After fixing those locally i get many test failures.

@PaulSandoz
Copy link
Member

@mcimadamore Mourad implemented a transformation of the code model that builds a code model that adds local variables for values with more than one use, which makes it easier to generate the AST nodes. Would use of the internal LetExpr help avoid such a transformation, if so we can consider that for follow on work.

@mcimadamore
Copy link
Collaborator

@mcimadamore Mourad implemented a transformation of the code model that builds a code model that adds local variables for values with more than one use, which makes it easier to generate the AST nodes. Would use of the internal LetExpr help avoid such a transformation, if so we can consider that for follow on work.

I'm not sure I have all the context here. The problem here seems to be when you have a value that is resulting from some potentially side-effect operation. E.g. like a method call:

%2 = bar(%1)

If %2 is used multiple times, then javac has only one option -- that is, to hoist %2 in a local variable, and then replaces all references to %2 with references to the local variable. Inlining the call to bar at the use-site is not really an option, as that could change the semantics of the program.

Let expression nodes are useful when dealing with compact expressions. E.g.

List l = let x = 42 in List.of(42)

E.g. javac typically uses a let expression when it has to translate a single expression into something more complex, but it wants to do so by keeping the result as an expression (rather than turning the expression into a statement, which is not possible in all cases, such as in the case of a variable initializer).

It is true that what seems like a linear list of ops in a block can be modelled as something like more convoluted, like so:

let op1 = <op1 init> in
    let op2 = <op2 init> in
        let op3 = <op2 init> in
               ....
               <result>

This would mean to generate one let expression per op, where the "body" of the let expression is the remainder of the code model block. All this nesting is confusing, but is also avoidable -- a LetExpr node allows for more than one declaration for each body -- so you can translate the above as follows:

let (op1 = <op1 init> ;
     op2 = <op2 init> ;
     op3 = <op2 init>) in <result>

Doing something like this would probably avoid the need of generating extra local variables -- you now have one var declaration per op in the "statements" part of a LetExpr. It looks a bit odd -- visually -- that the body of the LetExpr is just the result of the code model block -- e.g. all the interesting part is in the setup code. But seems doable.

At the end of the day either adding extra variables (which can even be done as a pre-processing step, by javac), or using a more functional translation with LetExpr should work.

P.S.
I looked at the code and, at least in some cases (h) the TestAddVarsWhenNecessary seems to add intermediate Var ops, but which seem redundant - as they are initialized with some function parameter. I'd expect javac to be able to deal with references to function parameters using a JCIdent pointing at the desired parameter.


private Map<JavaType, Type> mappingFromJavaTypeToType() {
Map<JavaType, Type> m = new HashMap<>();
Symbol.ModuleSymbol jdk_incubator_code = syms.enterModule(names.jdk_incubator_code);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is effectively an extension to javac's symbol table. My preference would be to:

  • add whatever symbol/type you need in CodeReflectionSymbol
  • then set up the translation map simply, like you do for primitive types -- e.g.
Map.entry(<type element>, <javac type>)

};
}

private JCTree invokeOpToJCMethodInvocation(CoreOp.InvokeOp invokeOp) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method has some issues -- but perhaps we can keep it for now.
The main problem is that we seem to copy types from the invoked method declaration and put them "as is" in the generated AST. This works only as long as the invoked method doesn't have any generic type. For instance, consider a call to List::add. The signature of this method accepts X which is a type parameter of List<X>. I believe the code you have copies the X from the declared signature and sticks it into the javac's MethodType instance attached to the method call. Since this method type is used to describe the use-site (the call), and not the declaration, the type seems incorrect. E.g. calling List<String>:add should have a type of add(String) not add(X). (btw, a similar problem is present for field access, if the type of the accessed field is a type-variable -- so it's not just an issue with method calls).

The type of the invoked method/accessed field should always be adjusted with Types.memberType first. This method takes two parameters:

  • the receiver type you are using to call the instance method/access the instance field
  • the symbol you are trying to access

And returns the instantiated type of that symbol at that receiver type. Examples:

memberType(List<String>, add(X)->void) -> (String)->void
memberType(List<Integer>, add(X)->void) -> (Integer)->void

Once you address this, there is still another problem with generic methods -- as methods can have their own type-variables too. To figure out what is the type to be replaced for these type-variables you generally need to run inference -- which seems way too much for what we're trying to do here. The issue here is that the code model you are processing doesn't expose these details, and so generating the AST needs to "trace back" whatever steps where done when generating this model -- which is an hard problem.

I hope we can get away with just working with erased types, and maybe insert type-conversion to convert the type of the invoked method/accessed field so that it matches the expected op result type.

if (invokeOp.isVarArgs()) {
var lastParam = invokeOp.invokeDescriptor().type().parameterTypes().getLast();
Assert.check(lastParam instanceof ArrayType);
methodInvocation.varargsElement = typeElementToType(((ArrayType) lastParam).componentType());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem as described above -- here we're copying the vararg array at the declaration site into a type at the use-site -- for something like List::of this won't give the result you expect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now -- while all the points I made above apply (e.g. if you were to try and type-check the generated AST using Attr you will get several errors), the saving grace here is that you are sending this tree into javac's backend anyway. And the backend plays a lot looser with types, only inserting casts where absolutely needed. Since you are using the op result type on the MethodType instance you are generating, I believe that should be enough for the backend to at least insert 90% of the type conversions that might be required because of erasure. So, in practice, even if incorrect, the code above might work fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready Pull request is ready to be integrated rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

3 participants