-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #21619: Refactor NotNullInfo to record every reference which is retracted once. #21624
Conversation
I think this is the right idea but reasoning about three sets is hard and it's hard to convince oneself that the implementation is correct for all cases. I wonder if a simpler implementation would be possible with only two sets. Let's still call them asserted and retracted. We lift the invariant that asserted & retracted is empty, so it's possible for one variable to be in both. asserted is the set that we know is non-null at the very end of evaluating the expression (if no exception happens). retracted is the set that is retracted anywhere within the expression, even if it is later asserted. When we assert a variable, we add it to asserted but do not remove it from retracted. When we retract a variable, we add it to retracted and remove it from asserted. If we evaluate the expression to the very end, if a variable is in both asserted and retracted, we consider it asserted. When we're at a control flow point where the expression may not have executed to the end (because it threw an exception), we just clear the asserted set and keep only the retracted set. At control flow merges, we intersect asserted and union retracted. For a sequential composition (a1,r1) ; (a2,r2), we get ((a1-r2) | a2, r1|r2). Do you think this would work or do you see any flaw in it? I think maybe what I've described above are the current sets asserted and onceRetracted in this pull request. I guess the question is, if we have asserted and onceRetracted, can we get rid of retracted? (Then eventually we could rename onceRetracted to just retracted, to make the name shorter and simpler.) |
In the above proposal:
|
This rule looks reasonable to me. I will think about more examples to double-check this. |
TODO: some other improvements I want for flow typing but haven't been implemented yet:
|
|
||
/** The alternative path combination with another not-null info. Used to merge | ||
* the nullability info of the two branches of an if. | ||
*/ | ||
def alt(that: NotNullInfo): NotNullInfo = | ||
NotNullInfo(this.asserted.intersect(that.asserted), this.retracted.union(that.retracted)) | ||
|
||
def withRetracted(that: NotNullInfo): NotNullInfo = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it would be useful to have a comment explaining when this function is intended to be used. The name explains what the function does, but the idea is to say why one would want to do that. (Compare with the alt
function, whose name says when you want it.) It looks like the use case is similar to alt
, but in a case when one branch is known to not complete execution normally (e.g. an infinite loop), only exceptionally. We can say that in the comment and/or consider incorporating that into the name somehow.
This also raises the question of what to do when both alternatives cannot complete normally. Theoretically one would want to assert
all variables, but we don't have a way to do that in the implementation. Perhaps that case is not common enough in practice to care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we want to assert
all variables if none of the alternatives complete normally? Do you have an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the analysis proves the following condition: in every execution trace leading to a given program point x
is not null. When the code before x
at x
is not null.
The specific case of withRetracted
handles something like alt
, but where the that
branch does not complete normally. It is defined as <a1,r1>.withRetracted(<a2,r2>) = <a1, r1 union r2>
. alt
is defined as <a1,r1>.alt(<a2,r2>) = <a1 intersect a2, r1 union r2>
. If we could represent the universal set in a2
, then a1 intersect a2
would be a1
, so we could implement the equivalent of withRetracted
using alt
.
I don't know whether actual use cases in practice justify the code complexity of adding a representation of the universal set to the implementation, but it is cleaner in theory.
If we don't do that, then the comment/name of withRetracted
should suggest that it is like alt
, but for the case where that
cannot complete normally.
nnInfo = nothingCases.foldLeft(nnInfo): | ||
(nni, c) => nni.withRetracted(c.notNullInfo) | ||
if normalCases.nonEmpty then | ||
nnInfo = nnInfo.seq(normalCases.map(_.notNullInfo).reduce(_.alt(_))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems an abuse of seq
, since there is no sequencing with the nothingCases
; conceptually it is alt
.
A conceptually cleaner approach would be to handle the normalCases
first, then apply the withRetracted
for the nothingCases
to the result of the normalCases
.
I tried to implemented a Ideally with the If we ignore these Nothing` expressions, we would only lose some specific tracking, and I think it is an acceptable tradeoff. val s: String | Null
if ... then
s = null
return
else
s = ""
s // treating s as non-nullable or not? |
OK, I don't think an |
Co-authored-by: Ondřej Lhoták <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
val retractedVars = ctx.notNullInfos.flatMap(info => | ||
if info.asserted == null then Set.empty | ||
else info.asserted.filter(isRetracted) | ||
).toSet | ||
ctx.addNotNullInfo(NotNullInfo(Set(), retractedVars)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we lose knowledge about the loop being unreachable. The result always has asserted == Set()
, even if it was null
before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for the NotNullInfo in the ctx
. Suppose there is a terminated info in the ctx
, adding a new non-terminated info will not change its behaviour: still treating all symbols as non-nullable.
} | ||
|
||
private def notNullInfoFromCases(initInfo: NotNullInfo, cases: List[CaseDef])(using Context): NotNullInfo = | ||
if cases.nonEmpty then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what instances would the cases be empty? Depending on the answer, perhaps a comment would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the cases can be empty. Just want to avoid the exception when calling reduce
on empty list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it seems the empty cases can happen after inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. A question is, if empty cases does happen, what does it mean in terms of runtime semantics? If it definitely fails to match, it would be more precise to return terminatedInfo
, but initInfo
is sound and safer in case empty cases means something different in some special situations.
I think it's fine to leave it as it is, and you can merge this PR now.
This PR improves the flow typing for returning and exceptions.
The
NotNullInfo
is defined as following now:retracted
contains variable references that are ever assigned to null;asserted
is notnull
, it containsval
orvar
references that are known to be not null, after the tree finishes executing normally (non-exceptionally);asserted
isnull
, the tree is know to terminate, by throwing, returning, or calling a function withNothing
type. Hence, it acts like a universal set.alt
is defined as<a1,r1>.alt(<a2,r2>) = <a1 intersect a2, r1 union r2>
.The difficult part is the
try ... catch ... finally ...
. We don't know at which point an exception is thrown in the body, and the catch cases may be not exhaustive, we have to collect any reference that is once retracted.Fix #21619