Dyno: fix query recursion bug #26459
Draft: DanilaFe wants to merge 3 commits into chapel-lang:main from DanilaFe:recursion-bug
+104 −8
Signed-off-by: Danila Fedorin <[email protected]>
DanilaFe added a commit to DanilaFe/chapel that referenced this pull request Jan 3, 2025
This incurs a penalty up front, but prevents odd ordering properties in the query system. It's a workaround for a query system bug detailed in chapel-lang#26459. Signed-off-by: Danila Fedorin <[email protected]>
1 task
Ideally, we should find a way to solve it that avoids the performance overhead here. Here are a few ideas that you might be able to investigate
DanilaFe added a commit that referenced this pull request Jan 6, 2025
This incurs a penalty up front, but prevents odd ordering properties in the query system. It's a workaround for a query system bug detailed in #26459.

This way, cycles in the standard module dependency graph are always visited in the same order (starting with `ChapelBase`). As a result, we don't create cycles over multiple generations. This workaround does not prevent the issue from occurring in other contexts, such as user projects with circular dependencies.

Reviewed by @benharsh -- thanks!

## Testing
- [x] Anna's test branch doesn't crash
riftEmber pushed a commit to riftEmber/chapel that referenced this pull request Jan 6, 2025
This incurs a penalty up front, but prevents odd ordering properties in the query system. It's a workaround for a query system bug detailed in chapel-lang#26459. Signed-off-by: Danila Fedorin <[email protected]>
This PR is not merge ready since it has a 300% performance hit on Dyno as things stand (debug and release). However, I can't think of any other way to improve performance, and the "slow" parts appear fundamental to the issue.
This fixes a bug pointed out by @riftEmber in which we get a query error similar to the one fixed in #26061. Indeed, her intuition was right: this is another bug that has to do with `isQueryRunning`, but it's much more insidious, and much harder to fix performantly.
The Bug
The issue has to do, once again, with cycles of query dependencies guarded by `isQueryRunning`. Consider a cycle of three queries: `A -> B -> C -> A`. Each query runs an `isQueryRunning` check before invoking its dependency to avoid cycles. As a result, we get:
Importantly, in this case, the query `A` does not get added as a dependency to `C`, since the `isQueryRunning` check doesn't add dependencies by itself, and a call to `A` is prevented. However, we do have edges `A -> B` and `B -> C`.
In a new generation, suppose we start with `B` instead of calling `A` at the top level as before. Suppose additionally that another dependency of `C` has changed, necessitating its recomputation. Then, we get:
At this point, `A` has been added as an edge to `C`, completing a dependency cycle in the graph. We will now get infinite recursion.
The Fix
What happened? On the surface, one issue is that we add dependency edges but never remove them. So, as different query orderings occur, we get closer and closer to forming cycles.
A deeper issue is that we have hysteresis w.r.t. `isQueryRunning`. If it returns `true` during one query execution, we implicitly assume (when checking for recomputation) that it will always return `true`. Thus, it's possible to write queries that break the query system, since they don't react to the changing structure of the graph.
My solution was to track calls to `isQueryRunning` as edges. These are a new kind of edge that is not recursively traversed; rather, they use the same mechanism as `isQueryRunning` to check if the previous result has changed, and react accordingly. This resolves the problem, but adding these new edges and making `isQueryRunning` checks slows down dyno very significantly (`testDomains` goes from 6 seconds to 23 seconds). The delays are due both to the increased size of the dependencies vector (6 -> 8 seconds) and to the checking itself (8 seconds -> 23 seconds).
I'm not sure how to proceed, since this is a clear correctness issue. Perhaps @mppf will have some advice?
Performance notes
My initial intuition was that the slowdown was caused by invalidation of queries (i.e., we used to use a cached result, but it's not sound, so we no longer do). To test this hypothesis, I executed the following script which invokes `testInteractive` on two `testDomains` programs.
`domains.1.chpl`
`domains.2.chpl`
I made a change to `testInteractive` to persist query timing information between generations.
Diff to `testInteractive.cpp`
I then executed the following script to gather timing data across generations:
I found that queries were run more often after the patch was applied, but not by a lot (28522 queries vs 25272 queries). However, the total time spent in queries was significantly larger (31476 vs 19109.8). This makes me think that my initial hypothesis was incorrect; rather, query execution simply got much slower.
In particular, my initial results suggest that the following queries got much slower: `emitMultipleDefinedSymbolErrorsQuery` and `filterCandidatesInitialGatherRejectedQuery` (a factor of at least 5x slower).
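For reference, the aggregate numbers above work out as follows: the query count grew by about 1.13x, the total time by about 1.65x, and so the mean time per query by about 1.46x. The small helper below just redoes that arithmetic. `RunStats`, `ratio`, and `meanTimePerQuery` are names invented for this sketch, and the time unit is left unspecified because the figures above do not state one.

```cpp
#include <cassert>
#include <cmath>

// Aggregate figures quoted above; the time unit is not stated in the text.
struct RunStats {
  double queries;    // number of query executions
  double totalTime;  // summed time across all query executions
};

const RunStats kBeforePatch{25272.0, 19109.8};
const RunStats kAfterPatch{28522.0, 31476.0};

// How many times larger `after` is than `before`.
double ratio(double after, double before) {
  assert(before != 0.0);
  return after / before;
}

// Average time spent per query execution.
double meanTimePerQuery(const RunStats& s) {
  return ratio(s.totalTime, s.queries);
}

// Growth in mean per-query time from before the patch to after it.
double perQuerySlowdown() {
  return ratio(meanTimePerQuery(kAfterPatch), meanTimePerQuery(kBeforePatch));
}
```

The count ratio alone accounts for only a small part of the added time, which matches the conclusion that individual query executions became slower rather than merely more numerous.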