Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #21619: Refactor NotNullInfo to record every reference which is retracted once. #21624

Merged
merged 8 commits into from
Dec 10, 2024

Conversation

noti0na1
Copy link
Member

@noti0na1 noti0na1 commented Sep 21, 2024

This PR improves the flow typing for returning and exceptions.

The NotNullInfo is defined as following now:

case class NotNullInfo(asserted: Set[TermRef] | Null, retracted: Set[TermRef]):
  • retracted contains variable references that are ever assigned to null;
  • if asserted is not null, it contains val or var references that are known to be not null, after the tree finishes executing normally (non-exceptionally);
  • if asserted is null, the tree is know to terminate, by throwing, returning, or calling a function with Nothing type. Hence, it acts like a universal set.

alt is defined as <a1,r1>.alt(<a2,r2>) = <a1 intersect a2, r1 union r2>.

The difficult part is the try ... catch ... finally .... We don't know at which point an exception is thrown in the body, and the catch cases may be not exhaustive, we have to collect any reference that is once retracted.

Fix #21619

@noti0na1 noti0na1 marked this pull request as ready for review September 21, 2024 23:48
@noti0na1 noti0na1 requested a review from olhotak September 21, 2024 23:48
@olhotak
Copy link
Contributor

olhotak commented Sep 22, 2024

I think this is the right idea but reasoning about three sets is hard and it's hard to convince oneself that the implementation is correct for all cases.

I wonder if a simpler implementation would be possible with only two sets. Let's still call them asserted and retracted. We lift the invariant that asserted & retracted is empty, so it's possible for one variable to be in both. asserted is the set that we know is non-null at the very end of evaluating the expression (if no exception happens). retracted is the set that is retracted anywhere within the expression, even if it is later asserted. When we assert a variable, we add it to asserted but do not remove it from retracted. When we retract a variable, we add it to retracted and remove it from asserted. If we evaluate the expression to the very end, if a variable is in both asserted and retracted, we consider it asserted. When we're at a control flow point where the expression may not have executed to the end (because it threw an exception), we just clear the asserted set and keep only the retracted set. At control flow merges, we intersect asserted and union retracted. For a sequential composition (a1,r1) ; (a2,r2), we get ((a1-r2) | a2, r1|r2).

Do you think this would work or do you see any flaw in it?

I think maybe what I've described above are the current sets asserted and onceRetracted in this pull request. I guess the question is, if we have asserted and onceRetracted, can we get rid of retracted? (Then eventually we could rename onceRetracted to just retracted, to make the name shorter and simpler.)

@olhotak olhotak assigned noti0na1 and unassigned olhotak Sep 22, 2024
@olhotak
Copy link
Contributor

olhotak commented Sep 22, 2024

In the above proposal:

  • retracted means that the variable is set to null somewhere, anywhere in the expression
  • asserted means that the variable is set to non-null after all program points that could set it to null (if any)

@noti0na1
Copy link
Member Author

  • retracted means that the variable is set to null somewhere, anywhere in the expression
  • asserted means that the variable is set to non-null after all program points that could set it to null (if any)

This rule looks reasonable to me. I will think about more examples to double-check this.

@noti0na1
Copy link
Member Author

noti0na1 commented Sep 22, 2024

TODO: some other improvements I want for flow typing but haven't been implemented yet:

  • For definition like var a: String | Null = "", we should add a to asserted info immediately .
  • We only considered Nothing body for if right now. We should consider this for other control expressions as well.


/** The alternative path combination with another not-null info. Used to merge
* the nullability info of the two branches of an if.
*/
def alt(that: NotNullInfo): NotNullInfo =
NotNullInfo(this.asserted.intersect(that.asserted), this.retracted.union(that.retracted))

def withRetracted(that: NotNullInfo): NotNullInfo =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it would be useful to have a comment explaining when this function is intended to be used. The name explains what the function does, but the idea is to say why one would want to do that. (Compare with the alt function, whose name says when you want it.) It looks like the use case is similar to alt, but in a case when one branch is known to not complete execution normally (e.g. an infinite loop), only exceptionally. We can say that in the comment and/or consider incorporating that into the name somehow.

This also raises the question of what to do when both alternatives cannot complete normally. Theoretically one would want to assert all variables, but we don't have a way to do that in the implementation. Perhaps that case is not common enough in practice to care about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we want to assert all variables if none of the alternatives complete normally? Do you have an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the analysis proves the following condition: in every execution trace leading to a given program point $p$, variable x is not null. When the code before $p$ does not terminate, there are no traces leading to $p$ so the condition is true for every variable. If there were an instruction derefencing x at $p$, that instruction would never execute, so it would be safe to assume that x is not null.

The specific case of withRetracted handles something like alt, but where the that branch does not complete normally. It is defined as <a1,r1>.withRetracted(<a2,r2>) = <a1, r1 union r2>. alt is defined as <a1,r1>.alt(<a2,r2>) = <a1 intersect a2, r1 union r2>. If we could represent the universal set in a2, then a1 intersect a2 would be a1, so we could implement the equivalent of withRetracted using alt.

I don't know whether actual use cases in practice justify the code complexity of adding a representation of the universal set to the implementation, but it is cleaner in theory.

If we don't do that, then the comment/name of withRetracted should suggest that it is like alt, but for the case where that cannot complete normally.

compiler/src/dotty/tools/dotc/typer/Nullables.scala Outdated Show resolved Hide resolved
compiler/src/dotty/tools/dotc/typer/Nullables.scala Outdated Show resolved Hide resolved
nnInfo = nothingCases.foldLeft(nnInfo):
(nni, c) => nni.withRetracted(c.notNullInfo)
if normalCases.nonEmpty then
nnInfo = nnInfo.seq(normalCases.map(_.notNullInfo).reduce(_.alt(_)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems an abuse of seq, since there is no sequencing with the nothingCases; conceptually it is alt.

A conceptually cleaner approach would be to handle the normalCases first, then apply the withRetracted for the nothingCases to the result of the normalCases.

compiler/src/dotty/tools/dotc/typer/Typer.scala Outdated Show resolved Hide resolved
tests/explicit-nulls/neg/i21380c.scala Outdated Show resolved Hide resolved
@olhotak olhotak assigned noti0na1 and unassigned olhotak Oct 11, 2024
@noti0na1
Copy link
Member Author

noti0na1 commented Nov 4, 2024

I tried to implemented a TerminatedInfo, whose asserted set acts like a universal set, but I don't like the result.

Ideally with the TerminatedInfo, we can analyse more cases where statements are followed by a terminating expression (type Nothing), like return, throw, and other applications with result type Nothing. However, the implementation becomes much more complicated, without significant improvement. We still cannot analyse any deeply-nested exceptions. In addition, I don't like the behaviour of treating all refs non-nullable after a termination point, even though I understand it is theoretical ok to do so.

If we ignore these Nothing` expressions, we would only lose some specific tracking, and I think it is an acceptable tradeoff.

val s: String | Null
if ... then 
  s = null
  return
else
  s = ""

s // treating s as non-nullable or not?

@olhotak
Copy link
Contributor

olhotak commented Nov 4, 2024

OK, I don't think an Info with an asserted set that acts like a universal set is necessary.

Copy link
Contributor

@olhotak olhotak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

val retractedVars = ctx.notNullInfos.flatMap(info =>
if info.asserted == null then Set.empty
else info.asserted.filter(isRetracted)
).toSet
ctx.addNotNullInfo(NotNullInfo(Set(), retractedVars))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we lose knowledge about the loop being unreachable. The result always has asserted == Set(), even if it was null before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for the NotNullInfo in the ctx. Suppose there is a terminated info in the ctx, adding a new non-terminated info will not change its behaviour: still treating all symbols as non-nullable.

}

private def notNullInfoFromCases(initInfo: NotNullInfo, cases: List[CaseDef])(using Context): NotNullInfo =
if cases.nonEmpty then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what instances would the cases be empty? Depending on the answer, perhaps a comment would be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the cases can be empty. Just want to avoid the exception when calling reduce on empty list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it seems the empty cases can happen after inlining.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. A question is, if empty cases does happen, what does it mean in terms of runtime semantics? If it definitely fails to match, it would be more precise to return terminatedInfo, but initInfo is sound and safer in case empty cases means something different in some special situations.

I think it's fine to leave it as it is, and you can merge this PR now.

@olhotak olhotak assigned noti0na1 and unassigned olhotak Dec 9, 2024
@noti0na1 noti0na1 merged commit ee0dd7a into scala:main Dec 10, 2024
29 checks passed
@noti0na1 noti0na1 deleted the fix-21619 branch December 10, 2024 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-Yexplicit-nulls exhibits false negative when combining try/match
2 participants