Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Java JLS 13 and JLS14 preview in M3 models and Java ASTs; also make ASTs jump through the correctness specification in lang::analysis::AST #1936

Merged
merged 138 commits into from
May 9, 2024

Conversation

jurgenvinju
Copy link
Member

@jurgenvinju jurgenvinju commented Apr 16, 2024

This branch starts from the main branch where Java support for M3 and ASTs was done up to level 8 of the Java Language Standard (JLS8).

However several key concepts were missing from the syntax trees namely type parameters and modifiers and annotations here and there. Some of these concepts were represented only in the M3 database, and at the time it was decided that double representation was not good for memory consumption. Today and with experience from the C++/C M3 models, PHP and Ada, this design decision has shifted: every AST must be fully informative, and the AST model must be aligned with the M3 model by sharing src (physical location), decl (logical location) and typ resolved static types keyword parameters on representative AST nodes. The full specification of a correct AST nodes is a boolean predicate in lang::analysis::M3.

  • added AST nodes for all new Java features up to JLS13, including the Java 9 module system
  • added AST nodes for the new Java features in the preview for JLS14
  • added M3 support for source and binary models of the Java 9 module system
  • added back modifiers AST nodes everywhere
  • made sure annotations and modifers that can be typed in different orders for methods and variabele and parameter declarations end up in the AST in the order of appearance in the source code
  • added back type parameter AST nodes everywhere
  • added AST nodes for simple and qualified names everywhere (for concrete syntax support)
  • tested new AST nodes and M3 on vallang, rascal and jsoup
  • validated all AST nodes on the AST correctness specification in analysis::m3::AST and fixed all the bugs
  • documented all M3 relations
  • documented the logical location scheme's of Java M3 and Java AST.
  • resolve builtin array .length field accesses to java+arrayLength:///TypeName[] to fix a number of inconsistencies around these hidden fields.

…tem, in particular the AST nodes that may occur in module-info.java files. Still to be tested
…ng as only set and relation fields are being used
…parameters to a node without having to repeat the existing ones
…are still str that should be Expression in my mind
…rmat. Also documented analysis::m3::AST better
…Java\'s AST format, and fixed the mapping to accomodate
@jurgenvinju jurgenvinju requested a review from DavyLandman May 8, 2024 09:58
@jurgenvinju
Copy link
Member Author

I think this is ready for review, but I should want that the number of changes is quite enormous.

  • I have compared the new ASTs to the old ASTs and the new M3 models to the old M3 models meticulously (tuple by tuple, node by node), until I had fixed all new bugs in the AST and M3 builders, and confirmed that existing bugs have been solved.
  • The changes in the definition of ASTs reflect what users have to do to get up-to-date, M3 models have hardly changed.
  • Some TODO's are left, such as adding implicitDeclarations to the M3 model, and including lambda's in a way in the M3 model such that a complete call graph construction algorithm is possible with only M3 as source data. However, that would add more noise to the current PR so I'm pushing that to the future.

@jurgenvinju
Copy link
Member Author

Am still working to run the astNodeSpecification checker over all the ASTs. Errors keep popping up. Those are going to be minor changes of list.insert to list.append and such.

@jurgenvinju
Copy link
Member Author

Ok all the Java ASTs now pass all of the AST specification. The M3 specification (made based on the C++ support) is not robust enough yet to be able to be used as a quality gate.

Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a huge refactoring, the java was unreviewable so I:

  • reviewed the rascal change, they look much improved, with 1 note about defaulting to java 13
  • looked at a few things in java that would worry me, but all is good there.
  • ran it on my machine to test it would still work on windows (as there is some path stuff in here as well).

Looks good to me, great improvement.

@@ -87,6 +87,10 @@ public static int readIntTag(AbstractFunction test, String key, int defaultVal)

private void runTests(ModuleEnvironment env, List<AbstractFunction> tests) {
testResultListener.start(env.getName(), tests.size());
if (tests.size() <= 0) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also have a testResultListener.done(); call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely! good catch @DavyLandman . I removed the superfluous call to testResultListener.start instead. This will help a lot with the reporting in the IDE and github actions.

| `src` | the exact source location of the declaration in a source file |
| `decl` | the resolved fully qualified name of the artefact that is being declared here |
| `typ` | a symbolic representation of the static type of the declared artefact here (not the syntax of the type) |
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for also writing all this documentation.

@memo
M3 composeM3(loc id, set[M3] models) {
M3 comp = m3(id);
map[M3, map[str,value]] fields = (m:getKeywordParameters(m) | m <- models);
set[str] keys = {*domain(getKeywordParameters(m)) | m <- models };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be written as a comprehension over fields to avoid extra calls to getKeywordParameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current solution is faster, contra-intuitively. getKeywordParameters does not require a lookup in the hash trie, since this map is attached directly to the constructor. Especially for large sets of models this choice can save a few instructions per model.

@@ -199,7 +361,7 @@ set[loc] findRoots(set[loc] folders) {
@description{
Wrapper around ((createAstsFromFiles)) to call it on a single file.
}
public Declaration createAstFromFile(loc file, bool collectBindings, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], str javaVersion = "1.7") {
public Declaration createAstFromFile(loc file, bool collectBindings, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this enum 👍 but why default to java 13? I would expect one of the LTS versions like 8/11/17/21. And what happens if in the future we also add support for newer versions, will the default change?

(same for the others in this file)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure the porting to newer versions is always as small as possible, I would advice users to always go with the highest support we currently have, if possible. That way when we release support for newer versions, upgrading will always be between the lowest number of versions.

So yes, the default will always be the highest we have.

If there are other considerations, I'd be happy to listen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of backwards compatibility, if I have code that's not setting this language level (because of the default, I'm not forced to commit to a specific version). Hypothetical say we release this as 11. And I'm analyzing code with it that has some variables named 'yield' in there. Now we do some new work, and add Java 17 supported, and we switch the default to java 17. For that new version of rascal, my analysis failed, without me changing my code. (Since yield is now a keyword)

So either we freeze the default, or we make it an required parameter, and have a special 'highestSupported' kind of ADT member, for people that just want the lastest and greatest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the argument but I don't agree setting an old version and fix it is a good solution for most cases. It is for the yield example; but it would mean we now have to lock on Java 8, as this was the version we have supported up until now (and partially at that). Most students would have to set the version parameter to 12 or 13.

  • if people need backwards compatibility for their actual compiler, they must set the version level explicitly, otherwise all compilers always default to their current/latest version. So to get this 100% right, we probably should add some code to the Maven support to extract that configuration parameter from the XML. But otherwise I think it is safe enough to follow the normal Java compiler behavior.
  • highestSupported constructor is not right since M3 models need to be self-descriptive. The goal of the version parameter is not only to configure the compiler the right way, but also to log the chosen version as it is. The Language ADT is a specific ADT to encode language names and their versions from analysis::m3::Core.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about making it a positional argument?

And my 'latestVersion' would just be meant as a parameter, not as a valid value in an m3 model.

catch (URISyntaxException e) {
throw new RuntimeException("Extending path error with URIUtil", e);
}
return URIUtil.getChildLocation(uri, path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


public loc unpackExampleProject(str name, loc projectZip) {
@synopsis{Unpack a test source zip in a temporary folder for downstream analysis.}
private loc unpackExampleProject(str name, loc projectZip) {
targetRoot = |tmp:///<name>|;
sourceRoot = projectZip[scheme = "jar+<projectZip.scheme>"][path = projectZip.path + "!/"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we can actually use zip+ now instead of jar+? At the time zip might not have been available?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was also thinking that :-) But this private function is used for both unpacking jars and zips, and the jar unzipper does something special with the META-INF folder.

Copy link

codecov bot commented May 8, 2024

Codecov Report

Attention: Patch coverage is 56.46186% with 411 lines in your changes are missing coverage. Please review.

Project coverage is 49%. Comparing base (6bb94a2) to head (efa6401).

Files Patch % Lines
...pl/library/lang/java/m3/internal/ASTConverter.java 58% 198 Missing and 25 partials ⚠️
...pl/library/lang/java/m3/internal/JarConverter.java 19% 52 Missing and 3 partials ⚠️
...library/lang/java/m3/internal/SourceConverter.java 32% 53 Missing and 2 partials ⚠️
...ary/lang/java/m3/internal/EclipseJavaCompiler.java 54% 24 Missing and 5 partials ⚠️
...ibrary/lang/java/m3/internal/BindingsResolver.java 66% 9 Missing and 8 partials ⚠️
...y/lang/java/m3/internal/JavaToRascalConverter.java 67% 8 Missing and 6 partials ⚠️
...library/lang/java/m3/internal/ASMNodeResolver.java 89% 4 Missing and 3 partials ⚠️
...mpl/library/lang/java/m3/internal/M3Converter.java 75% 6 Missing ⚠️
src/org/rascalmpl/library/Prelude.java 0% 3 Missing ⚠️
...org/rascalmpl/repl/TerminalProgressBarMonitor.java 0% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@           Coverage Diff            @@
##              main   #1936    +/-   ##
========================================
- Coverage       49%     49%    -1%     
- Complexity    6193    6215    +22     
========================================
  Files          663     662     -1     
  Lines        58946   59432   +486     
  Branches      8575    8637    +62     
========================================
+ Hits         29026   29178   +152     
- Misses       27724   28018   +294     
- Partials      2196    2236    +40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jurgenvinju
Copy link
Member Author

Thanks @DavyLandman As soon as the tests run on the server, I'll merge.

@jurgenvinju
Copy link
Member Author

Aha! The type-checker is now forcing us to update JavaToObjectFlow :-) good catch.

@jurgenvinju jurgenvinju merged commit 2690e1f into main May 9, 2024
7 checks passed
@jurgenvinju jurgenvinju deleted the m3-jls-13-and-higher branch May 9, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants