-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up Refaster
bug checker
#261
base: master
Are you sure you want to change the base?
Conversation
// XXX: Strictly speaking this is wrong: these paths _could_ exist. | ||
// XXX: Implement something better. | ||
if (!unorderedEdgesToLeaf.isEmpty()) { | ||
// assertThat(isReachable(tree, leaf, randomStrictSubset(unorderedEdgesToLeaf, random))) | ||
// .isFalse(); | ||
// assertThat( | ||
// isReachable( | ||
// tree, | ||
// leaf, | ||
// insertValue( | ||
// randomStrictSubset(unorderedEdgesToLeaf, random), unknownEdge, random))) | ||
// .isFalse(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some ideas on how we might be able to do this differently, but will need to find some time to play with the code.
Just a drive-by comment to mention what a massive milestone this is @Stephan202 @rickie 💪 💪 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have one tiny suggestion for renaming. Will stop reviewing to prioritize the Refaster annotation PR first.
@MethodSource("verifyTestCases") | ||
@ParameterizedTest | ||
void verify(ImmutableSetMultimap<Integer, ImmutableSet<String>> source, Random random) { | ||
Node<Integer> tree = Node.create(source.keySet().asList(), source::get); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a suggestion, as it is the input / content for the tree.
Other option could be: treeInputSource
or treeInputContent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe treeInputSource
indeed. Will sleep on it 😄
* @return A string representation of the operator, if known | ||
* @throws IllegalArgumentException If the given input is not supported. | ||
*/ | ||
// XXX: Extend list to cover remaining cases; at least for any `Kind` that may appear in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already did this though. To be double-checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, we should eventually perhaps do something similar for certain keywords (instanceof
, new
, ...), and possibly some other things. Might be best to do that once we have some benchmarking in place.
|
||
private static final class RefasterIntrospection { | ||
private static final String UCLASS_IDENT_FQCN = "com.google.errorprone.refaster.UClassIdent"; | ||
// XXX: Probably there is a better way to fix this... For a few BeforeTemplates like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe good enough for now, can improve later.
6f0529f
to
67127ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased.
pom.xml
Outdated
<threads>4</threads> | ||
<timeoutFactor>4</timeoutFactor> | ||
<timestampedReports>false</timestampedReports> | ||
<excludedClasses> | ||
<excludedClass>*.AutoValue_*</excludedClass> | ||
</excludedClasses> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will port this to #255 and move the excludedClasses
tag before the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to drop this now.
@MethodSource("verifyTestCases") | ||
@ParameterizedTest | ||
void verify(ImmutableSetMultimap<Integer, ImmutableSet<String>> source, Random random) { | ||
Node<Integer> tree = Node.create(source.keySet().asList(), source::get); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe treeInputSource
indeed. Will sleep on it 😄
* @return A string representation of the operator, if known | ||
* @throws IllegalArgumentException If the given input is not supported. | ||
*/ | ||
// XXX: Extend list to cover remaining cases; at least for any `Kind` that may appear in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, we should eventually perhaps do something similar for certain keywords (instanceof
, new
, ...), and possibly some other things. Might be best to do that once we have some benchmarking in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial suggestions. Only looked at Node
and Refaster
will continue next week. Also pushed a commit, PTAL.
if (children().size() < candidateEdges.size()) { | ||
for (Map.Entry<String, Node<T>> e : children().entrySet()) { | ||
if (candidateEdges.contains(e.getKey())) { | ||
e.getValue().collectReachableValues(candidateEdges, sink); | ||
} | ||
} | ||
} else { | ||
ImmutableList<String> remainingCandidateEdges = | ||
candidateEdges.subList(1, candidateEdges.size()); | ||
Node<T> child = children().get(candidateEdges.get(0)); | ||
if (child != null) { | ||
child.collectReachableValues(remainingCandidateEdges, sink); | ||
} | ||
collectReachableValues(remainingCandidateEdges, sink); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something this can be done without recursion which is easier to understand:
if (children().size() < candidateEdges.size()) { | |
for (Map.Entry<String, Node<T>> e : children().entrySet()) { | |
if (candidateEdges.contains(e.getKey())) { | |
e.getValue().collectReachableValues(candidateEdges, sink); | |
} | |
} | |
} else { | |
ImmutableList<String> remainingCandidateEdges = | |
candidateEdges.subList(1, candidateEdges.size()); | |
Node<T> child = children().get(candidateEdges.get(0)); | |
if (child != null) { | |
child.collectReachableValues(remainingCandidateEdges, sink); | |
} | |
collectReachableValues(remainingCandidateEdges, sink); | |
} | |
int edgesSize = candidateEdges.size(); | |
for (int i = 0; i < edgesSize; i++) { | |
String edge = candidateEdges.get(i); | |
if (children().containsKey(edge)) { | |
children().get(edge).collectReachableValues(candidateEdges.subList(i + 1, edgesSize), sink); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, two things 😄
- It's true that functionally we don't need both branches, but both are defined for performance reasons. (See the internal Store
ConfigurationSelector
code and associated benchmark for a case where this matters.) Basically, if there are fewer children than candidate edges, then taking the first branch can be beneficial. But: I see that in the internal codecandidateEdges
is the result ofImmutableSortedSet#asList()
, such that thecontains
check is O(1), while here it's O(n). So that's a bug to be fixed 👍. (Whether then theif
-branch is worthwhile to keep depends on the actual data indexed and queried; we'll need a realistic benchmark for that.) - But for the
else
branch: awesome suggestion! 💪 I'll also backport this to the internal code. (There it seems to have a small positive impact on the benchmark, though it seems to also increase the standard deviation a bit. But perhaps that's noise due to other stuff running on my laptop.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that in the internal code candidateEdges is the result of ImmutableSortedSet#asList(), such that the contains check is O(1), while here it's O(n).
Ah, in that case, it makes more sense 😄 I will have a closer look at the benchmark. 👍
|
||
@AutoValue | ||
@SuppressWarnings("AutoValueImmutableFields" /* Type is used only during `Node` construction. */) | ||
abstract static class BuildNode<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a regular builder pattern, lets's stick to the standard naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's a mutable node, so conceptually more than a builder. I think it's nice to reflect that? Also, having a builder recursively nest other builders feels a bit awkward 🤔 (Note that this type is not part of the API.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far the only reason to have this class is to build a tree, the fact that it's mutable is not used elsewhere.
Also, having a builder recursively nest other builders feels a bit awkward 🤔
Why? 🤔 It models recursive structure, so I don't see a big deal with this. I think it should be possible to model it differently to avoid nested builders, but I'm not sure how much we will gain by this 🤷♂️
(Note that this type is not part of the API.)
Still, I believe re-naming it to Builder
and immutable()
to build()
will make it a bit clearer, primarily because it's not used as a mutable node. If the intent here is to have a mutable node, then I would expect it to have the same interface and be interchangeable with the immutable counterpart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, will rename 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @ibabiankou!
Also pushed a commit, PTAL.
Didn't see it 👀 (But I did just push something 😄 .)
|
||
@AutoValue | ||
@SuppressWarnings("AutoValueImmutableFields" /* Type is used only during `Node` construction. */) | ||
abstract static class BuildNode<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's a mutable node, so conceptually more than a builder. I think it's nice to reflect that? Also, having a builder recursively nest other builders feels a bit awkward 🤔 (Note that this type is not part of the API.)
if (children().size() < candidateEdges.size()) { | ||
for (Map.Entry<String, Node<T>> e : children().entrySet()) { | ||
if (candidateEdges.contains(e.getKey())) { | ||
e.getValue().collectReachableValues(candidateEdges, sink); | ||
} | ||
} | ||
} else { | ||
ImmutableList<String> remainingCandidateEdges = | ||
candidateEdges.subList(1, candidateEdges.size()); | ||
Node<T> child = children().get(candidateEdges.get(0)); | ||
if (child != null) { | ||
child.collectReachableValues(remainingCandidateEdges, sink); | ||
} | ||
collectReachableValues(remainingCandidateEdges, sink); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, two things 😄
- It's true that functionally we don't need both branches, but both are defined for performance reasons. (See the internal Store
ConfigurationSelector
code and associated benchmark for a case where this matters.) Basically, if there are fewer children than candidate edges, then taking the first branch can be beneficial. But: I see that in the internal codecandidateEdges
is the result ofImmutableSortedSet#asList()
, such that thecontains
check is O(1), while here it's O(n). So that's a bug to be fixed 👍. (Whether then theif
-branch is worthwhile to keep depends on the actual data indexed and queried; we'll need a realistic benchmark for that.) - But for the
else
branch: awesome suggestion! 💪 I'll also backport this to the internal code. (There it seems to have a small positive impact on the benchmark, though it seems to also increase the standard deviation a bit. But perhaps that's noise due to other stuff running on my laptop.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments and suggestions.
private static void collectRefasterRules( | ||
ImmutableCollection<CodeTransformer> transformers, Consumer<RefasterRule<?, ?>> sink) { | ||
for (CodeTransformer t : transformers) { | ||
collectRefasterRules(t, sink); | ||
} | ||
} | ||
|
||
private static void collectRefasterRules( | ||
CodeTransformer transformer, Consumer<RefasterRule<?, ?>> sink) { | ||
if (transformer instanceof RefasterRule) { | ||
sink.accept((RefasterRule<?, ?>) transformer); | ||
} else if (transformer instanceof CompositeCodeTransformer) { | ||
collectRefasterRules(((CompositeCodeTransformer) transformer).transformers(), sink); | ||
} else { | ||
throw new IllegalStateException( | ||
String.format("Can't handle `CodeTransformer` of type '%s'", transformer.getClass())); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit out of place, I think it belongs in Refaster
. WDYT?
List<Set<String>> identifierCombinations = new ArrayList<>(); | ||
identifierCombinations.add(new HashSet<>()); | ||
|
||
new TemplateIdentifierExtractor().scan(trees, identifierCombinations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, extractors seem to be stateless, so we could use a singleton here and below.
private static ImmutableSet<ImmutableSet<String>> extractTemplateIdentifiers( | ||
ImmutableList<? extends Tree> trees) { | ||
List<Set<String>> identifierCombinations = new ArrayList<>(); | ||
identifierCombinations.add(new HashSet<>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to add an empty set here manually? I think it is worth adding a comment.
* <li>Traverse the tree based on the identifiers from the {@link CompilationUnitTree}. Every node | ||
* can contain Refaster templates. Once a node is we found a candidate Refaster template and | ||
* will therefore be added to the list of candidates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* <li>Traverse the tree based on the identifiers from the {@link CompilationUnitTree}. Every node | |
* can contain Refaster templates. Once a node is we found a candidate Refaster template and | |
* will therefore be added to the list of candidates. | |
* <li>Traverse the tree based on the identifiers from the {@link CompilationUnitTree}. Every node | |
* can contain Refaster templates. Once a node is we found a candidate Refaster template that | |
* might match some code and will therefore be added to the list of candidates. |
* <p>The tree is traversed based on the identifiers in the {@link CompilationUnitTree}. When a leaf | ||
* contains a template and is reached, we can be certain that the identifiers from the {@link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* <p>The tree is traversed based on the identifiers in the {@link CompilationUnitTree}. When a leaf | |
* contains a template and is reached, we can be certain that the identifiers from the {@link | |
* <p>The tree is traversed based on the identifiers in the {@link CompilationUnitTree}. When a node | |
* containing a template is reached, we can be certain that the identifiers from the {@link |
* contains a template and is reached, we can be certain that the identifiers from the {@link | ||
* BeforeTemplate} are at least present in the {@link CompilationUnitTree}. | ||
* | ||
* <p>Since the identifiers are sorted, we can prune parts of the {@link Node tree} while we are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prune implies modification of the tree.
* <p>Since the identifiers are sorted, we can prune parts of the {@link Node tree} while we are | |
* <p>Since the identifiers are sorted, we can skip parts of the {@link Node tree} while we are |
* {@link CompilationUnitTree} we now only return a subset of the templates that at least have a | ||
* chance of matching. As a result, the performance of Refaster significantly increases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* {@link CompilationUnitTree} we now only return a subset of the templates that at least have a | |
* chance of matching. As a result, the performance of Refaster significantly increases. | |
* {@link CompilationUnitTree} we now only matching a subset of the templates that at least have a | |
* chance of matching. As a result, the performance of Refaster increases significantly. |
getMethod(UCLASS_IDENT, "getTopLevelClass"); | ||
private static final Method METHOD_UANY_OF_EXPRESSIONS = getMethod(UAnyOf.class, "expressions"); | ||
|
||
static boolean isUClassIdent(IdentifierTree tree) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the full word.
static boolean isUClassIdent(IdentifierTree tree) { | |
static boolean isUClassIdentifier(IdentifierTree tree) { |
} | ||
|
||
// Actually UClassIdent. | ||
static IdentifierTree getClassIdent(UStaticIdent tree) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static IdentifierTree getClassIdent(UStaticIdent tree) { | |
static IdentifierTree getClassIdentifier(UStaticIdent tree) { |
|
||
private static void registerOperator( | ||
ExpressionTree node, List<Set<String>> identifierCombinations) { | ||
identifierCombinations.forEach(ids -> ids.add(treeKindToString(node.getKind()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The identifier does not change.
identifierCombinations.forEach(ids -> ids.add(treeKindToString(node.getKind()))); | |
String id = treeKindToString(node.getKind()); | |
identifierCombinations.forEach(ids -> ids.add(id)); |
To be used for custom links, custom error messages, custom other stuff...
…' into sschroevers/refaster-speed-up
…' into sschroevers/refaster-speed-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged the changes of #255 into this PR. Will change the base branch.
private static void collectRefasterRules( | ||
ImmutableCollection<CodeTransformer> transformers, Consumer<RefasterRule<?, ?>> sink) { | ||
for (CodeTransformer t : transformers) { | ||
collectRefasterRules(t, sink); | ||
} | ||
} | ||
|
||
private static void collectRefasterRules( | ||
CodeTransformer transformer, Consumer<RefasterRule<?, ?>> sink) { | ||
if (transformer instanceof RefasterRule) { | ||
sink.accept((RefasterRule<?, ?>) transformer); | ||
} else if (transformer instanceof CompositeCodeTransformer) { | ||
collectRefasterRules(((CompositeCodeTransformer) transformer).transformers(), sink); | ||
} else { | ||
throw new IllegalStateException( | ||
String.format("Can't handle `CodeTransformer` of type '%s'", transformer.getClass())); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ibabiankou I agree that with the PR as-is it was an improvement to move this RefasterRule
extraction logic here, but the changes in #255 make the CodeTransformer
-> ImmutableSet<ImmutableSet<String>>
transformation less straightforward, so I'll move it back. Of course, idea on how to better do this are welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a commit. More review TBD.
pom.xml
Outdated
<threads>4</threads> | ||
<timeoutFactor>4</timeoutFactor> | ||
<timestampedReports>false</timestampedReports> | ||
<excludedClasses> | ||
<excludedClass>*.AutoValue_*</excludedClass> | ||
</excludedClasses> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to drop this now.
|
||
@AutoValue | ||
@SuppressWarnings("AutoValueImmutableFields" /* Type is used only during `Node` construction. */) | ||
abstract static class BuildNode<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, will rename 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment. Code improvements are cool 🚀 !
@@ -305,7 +300,7 @@ private static final class RefasterIntrospection { | |||
private static final Method METHOD_UANY_OF_EXPRESSIONS = getMethod(UAnyOf.class, "expressions"); | |||
|
|||
static boolean isUClassIdent(IdentifierTree tree) { | |||
return UCLASS_IDENT.equals(tree.getClass()) || UCLASS_AUTOVALUE_IDENT.equals(tree.getClass()); | |||
return UCLASS_IDENT.isAssignableFrom(tree.getClass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one @Stephan202, this is way better haha.
static IdentifierTree getClassIdent(UStaticIdent tree) { | ||
return invokeMethod(METHOD_USTATIC_IDENT_CLASS_IDENT, tree); | ||
} | ||
|
||
// XXX: Make nicer. Or rename the other params. | ||
// Arguments to this method must actually be of the package-private type `UClassIdent`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/to/of
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One passes arguments to a method. (Corresponding to the parameters of said method. But here we are talking about the provided argument; the comment is here precisely because the parameter type isn't specific enough.)
3ee2c4b
to
2837464
Compare
2837464
to
ea33a0b
Compare
Quality Gate passedThe SonarCloud Quality Gate passed, but some issues were introduced. 6 New issues |
Mutation testing report by Pitest. Review any surviving mutants by inspecting the line comments under Files changed. |
Resolved a whole bunch of conflicts. I did need to suppress some |
❗
This PR is on top of #255❗This PR is a work in progress; @rickie and I have worked on this on-and-off over the past months. Pulling this change across the finish line is a requirement for enabling the
Refaster
bug checker by default in our internal code base.Open points include general code review, resolving various comments, and increasing test coverage.
This PR will also require changes in the context of #255, as that PR introduces a new type of
CodeTransformer
. Since that PR also introducesRefasterTest
, it might be better to finalize that PR first, and apply this PR on top of the result.First proposal of a suggested commit message: