Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize mutex-meet-tid privatization to arbitrary digest #1278

Merged
merged 5 commits into from
Jan 11, 2024

Conversation

sim642
Copy link
Member

@sim642 sim642 commented Nov 30, 2023

The current implementation of PerMutexMeetTIDPriv for both base and relational analyses hard-codes thread IDs as the digest framework. This generalizes them over a digest module.

@sim642 sim642 added cleanup Refactoring, clean-up feature labels Nov 30, 2023
@sim642
Copy link
Member Author

sim642 commented Dec 1, 2023

Unfortunately, this is far from being general enough. I've been trying to get may-have-ever-been-locked digests working on the lock-digest branch and it doesn't work out in any simple way.

Unlike the lockset digests and thread ID digests, the observing lock operation for this abstraction requires getting all traces from all observable unlocks. Moreover, a cartesian product should be taken: one combining operation for each local state at lock and one state from the incoming traces. That essentially demands some kind of path splitting in globals (path-sensitive flow-insensitive unknowns).
Moreover, the splitting by maylocksdigest analysis should somehow split the globals of the apron analysis, but that's not possible in our current framework. Local path-sensitivity works like this like a product, but globals are a sum over analyses.

Besides that, mutex-meet-tid, both as implemented and as specified in the paper, tightly couples thread IDs in another way for the thread returns and joins. It all works out if thread IDs are both used for that and the digests, but not in general.

@michael-schwarz
Copy link
Member

michael-schwarz commented Dec 1, 2023

I have not looked into the implementation yet, so these are if you want "observations untainted by reality" 😉

requires getting all traces from all observable unlocks.

No, it suffices here to get the digests, no need to access the actual traces.

That essentially demands some kind of path splitting in globals (path-sensitive flow-insensitive unknowns).

I don't really understand what you mean by this: This is the same kind of information as are the thread ids: Things that additionally refine the values -- whether implemented as a map under the global (implementation) or as a in the paper (by having different unknowns).
Or were you talking about the implementation attempt in this PR here?

Moreover, the splitting by maylocksdigest analysis should somehow split the globals of the apron analysis

In what way is this different from the thread id based splitting we already have?

Besides that, mutex-meet-tid, both as implemented and as specified in the paper, tightly couples thread IDs in another way for the thread returns and join

Yes, this is a problem: Actually even the restricted publishing at unlock to only things I have written myself (as well as not reading from your own global invariant for unique threads) is a departure from the generic refinement framework. (See appendix G for the actual instance of $\mathcal{A}$ one needs to construct to prove these things sound).

@sim642
Copy link
Member Author

sim642 commented Dec 1, 2023

No, it suffices here to get the digests, no need to access the actual traces.

Sorry, that's what I meant: digests for all the reaching traces.

The big point is in the following:

  1. Lockset digests don't actually use that information: image (the $S'$ is unused).
  2. Thread IDs don't either because the ID of the ego thread does not change in any way due to anything it observes (we don't even give right-hand sides for it).
  3. May-ever-have-been-locked digests crucially use it: image (the $L'$ is used).

For this reason, lockset and thread ID analyses don't even need global unknowns. This crucially makes them nicely fit into Goblint.

I don't really understand what you mean by this: This is the same kind of information as are the thread ids: Things that additionally refine the values -- whether implemented as a map under the global (implementation) or as a in the paper (by having different unknowns).
Or were you talking about the implementation attempt in this PR here?

It's not just that (map domain vs separate unknowns), but follows from the above observation. Because lockset and thread ID digests don't have globals, their (unit) globals also don't split all the other globals. But for may-ever-have-been-locked digests, this is no longer the case. In the paper view, the digests act as a product with the existing global unknowns (if you have two digests, it's effectively a product digest), not as a sum.

In Goblint, each analysis providing digests (e.g. mutex or threadid) adds global unknowns instead of multiplying. In a product, the may-ever-have-been-locked digests split apron global unknowns (this also explicitly happens with the current mutex-meet-tid implementation for thread IDs), but also vice versa (!): the global unknowns for may-ever-have-been-locked should be split. Since locksets/thread IDs don't use global unknowns, there's nothing to split and no circularity arises.


Therefore, I think the ad-hoc implementation of digests into relation analysis itself doesn't cut it. Instead we'll probably have to do how the paper presents it: the digest splitting lifts any existing analysis, i.e. it's a Spec functor in Goblint. It also means digests (provided by whatever activated analyses) need to be first-class citizens in Spec and multiplied into a product by MCP (as opposed to summing like for V).
Not only that, there's a recursive dependency between the two:

  1. V needs to be lifted into a product with the combined Digest,
  2. while the global unknowns in each Digest-providing analysis are split by all other digests themselves.

But even in that case, a very practical problem arises from the definition of observing actions:
image
Formally, we define it universally over all $A_0, A_1, A'$ but that's not how we can implement it at all. In a Goblint transfer function, we're in the middle of computing it for a particular constraint unknown (i.e. for a given $A'$). Therefore, $A_0, A_1$ are existentially quantified: we have to find all combinations of them that actually yield the $A'$ whose right-hand side we're in. Multiple combinations may give the same $A'$, so the implementation needs to join over all such.

In a way, that's analogous to the non-side-effecting constraint system definition for partial context-sensitivity. And we work around the existential nature using side effects. I suspect we'll need to somehow do a similar transformation here: have additional side-effected unknowns that can be directly looked up for the joined value, but it's not so clear to me how that should work (an unlock side-effects to all lock nodes in all functions and contexts?).
This is a non-problem for existing digests precisely because they don't depend on the incoming digest, but only at the previous local one, which we know where to find (and put into ctx.local).

@michael-schwarz
Copy link
Member

Ah, yes, these are indeed problems one faces here.

I am actually defining the term of being ego-lane-derived for these kinds of digests, that can be computed from the ego lane and do not consider the values of digests incorporated at other unknowns (beyond whether the transition is defined at all or not):

ego

but also vice versa (!): the global unknowns for may-ever-have-been-locked should be split.

I don't quite understand this point, as none of the digests should have their own unknowns -- instead they ought to refine the existing ones. But maybe we can discuss this in detail at the GobCon then.

@sim642
Copy link
Member Author

sim642 commented Dec 5, 2023

This discussion can continue in #1286. Although this PR clearly does a generalization, it might not be enough.

@michael-schwarz
Copy link
Member

@sim642 does it make sense for me to review this, i.e., is this something that is sound and we should have merged even if it does not generalize as much as we had hoped for?

@sim642
Copy link
Member Author

sim642 commented Jan 9, 2024

is this something that is sound and we should have merged even if it does not generalize as much as we had hoped for?

I think we should. It better distinguishes TIDs of digest and TIDs of thread returns in these privatizations.

@sim642 sim642 added this to the v2.4.0 milestone Jan 11, 2024
@sim642 sim642 merged commit f53e4eb into master Jan 11, 2024
16 checks passed
@sim642 sim642 deleted the mutex-meet-digest branch January 11, 2024 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Refactoring, clean-up feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants