Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify intermediate code/value format to be less Haskell-runtime specific #4311

Merged
merged 25 commits into from
Sep 13, 2023

Conversation

dolio
Copy link
Contributor

@dolio dolio commented Sep 7, 2023

This PR implements various changes that make the code and value formats that are sent between Unison nodes less specific to the details of the Haskell runtime. The initial impetus for this was the observation that the Scheme runtime cannot easily tell the difference between positive values of Nat and Int. But I've included other changes that I think would be problematic in the future. Here's a basic description of the changes I've made:

  • I introduced a new type of matching expression in the intermediate code, MatchNumeric. This is a direct match on a number value which doesn't involve pretending it's a special sort of data type that doesn't exist in the Unison surface syntax (which is how MatchIntegral works). This allows case statements on Nat and Int to be directly compiled to an intermediate construct, which tracks the builtin type involved.
  • I introduced a new sort of literal expression which directly produces the boxed literal. This is sort of the opposite operation to the one above, and lets you include a literal number without pretending it's a special sort of data type wrapping an unboxed value.
  • I made two wrapped pseudo built-ins List.splitLeft and List.splitRight for compiling list pattern matching. Previously it was directly including primitive operations in the generated code, and thus unboxed literals.
  • I added several new literals to the value interchange format that correspond to the builtin types, like Char and Nat. Nat and Int aren't identified directly, but as literals for positive and negative numbers, and the runtime is a bit flexible about how it treats such numbers. Previously these were exchanged as the special sorts of data types with unboxed data.

Together with some other stuff I'll mention later, this means the interchange format doesn't need to contain arbitrary unboxed data. Instead, it's more like exchanging surface level Unison values. I didn't actually change the way the Haskell runtime represents these things, so it is still capable of loading and running things that had already been serialized using the old format. But it will no longer produce such serialized representations, and the new format is a bit more agnostic about the exact way that the values will be represented in a particular runtime.

To achieve the above, though, I did change some of the execution details to avoid getting unboxed data persistently onto the stack:

  • MCode now has some direct matching constructs for case statements. The old strategy was to dump data/number/request values to the stack and then do cases on values read from arbitrary positions on the unboxed stack. That's still possible, but isn't actually generated anymore. Instead, matching will generally dump the fields of a data/request type to the (boxed) stack, but directly handle branching on the tags. And for numeric matches, it will just directly look at the numeric value without using the stack. This means that tags and such don't end up in the continuations captured by abilities.
  • I also changed the exact packing representation used for ability requests. Previously it was packing the ability/effect information into two spots in the value, because that was the easiest way to consume it with the 'dump to the stack' implementation. But since now there is a dedicated request matching operation, all the information can be packed into the tag with less redundancy than before.
  • Direct boxed versions of the literal instructions have also been included, to match the boxed literals in the intermediate code.
  • Universal equality and comparison have been tweaked to be accept being called with mixes of Nat and Int arguments. This can happen if a positive Int gets serialized and deserialized.

I think that existing compiled .uc files could still be loaded by the new runtime, but things like the ability request details wouldn't match any newly compiled code, so it's a better idea to recompile at this level.

I've added some tests to make sure that numbers hash the same whether they're Nat or Int. Other changes are somewhat checked by existing tests, as many of these changed required some debugging before all tests passed.

Changes to make the scheme runtime compatible with these changes are included. I'll release a new version of the compiler on share shortly.

I've still got some additional testing to add, but I think this is close enough to get CI checking it.

dolio added 14 commits September 1, 2023 12:59
The goal of these changes is to remove unboxed details from the
interchange formats, and instead exchange values that somewhat resemble
surface level unison values. In some cases, the way this works is
obvious. For instance, the MatchNumeric/NMatch and BLit constructs avoid
having unboxed details in code. However, what might not be so obvious is
the request/data matching constructs. These prevent unboxed values from
occurring on the stack in normal unison functions, which means they no
longer end up in captured continuations.

The unboxed details were originally intended to support optimizations
that would turn surface unison functions into more efficient versions
that operate without any boxing. However, that didn't materialize, and
it seems unlikely that we'll implement it for the Haskell end of things
(we aren't really obliged to do this ourselves for good performance in
scheme). So, some of these new methods of doing things might actually be
more efficient than what was happening previously.
The implementation was producing Maybe, but the calling convention just
called the primop in tail position. This would leave the Just tag on the
unboxed stack afterwards.
This covers values that are stored as pseudo data types in the Haskell
runtime, which would otherwise need to be serialized using unboxed data
in the byte format.
This is necessary for checking equality of code.
The serialization is the same, but this is a bit less strange as a
representation.
Previous implementation was directly including primops into the
generated intermediate code, which involved unboxed literal arguments.
These fake builtins allow for generating boxed literals so that no
unboxed data is on the surface stack.

I believe this was the last source of unboxed data on captured stacks.
@dolio
Copy link
Contributor Author

dolio commented Sep 7, 2023

Seems like running base tests takes way too long.

@dolio dolio requested a review from pchiusano September 8, 2023 16:06
Adds versions of the tests saved in an old format, checking that they
still work, and match the new version of the same test.
@dolio
Copy link
Contributor Author

dolio commented Sep 8, 2023

I added versions of the random serialized test cases saved in old formats. The test now deserializes the old version, runs it, and checks that the output is the same as the new version of the test.

@dolio
Copy link
Contributor Author

dolio commented Sep 8, 2023

Stew pointed out that boxed arrays aren't serializable yet, so I'll add those here unless there's some objection.

Copy link
Member

@pchiusano pchiusano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Question on testing: Are you able to run all the test in base successfully with these changes?

I don't think this PR should modify the existing .ser files - for now we still want assurances that we can still read the old format in the Haskell runtime.

Instead, I'd introduce new .ser files for the new format and rejigger the tests to have the Haskell tests look at both *.v3.ser and *.v4.ser and the scheme can just look at *.v4.ser.

Comment on lines 989 to 1019
-- reqArgs :: Args -> Args
-- reqArgs = \case
-- ZArgs -> UArg1 0
-- UArg1 i -> UArg2 0 (i + 1)
-- UArg2 i j
-- | i == 0 && j == 1 -> UArgR 0 3
-- | otherwise -> UArgN (fl [0, i + 1, j + 1])
-- BArg1 i -> DArg2 0 i
-- BArg2 i j
-- | j == i + 1 -> DArgR 0 1 i 2
-- | otherwise -> DArgN (fl [0]) (fl [i, j])
-- DArg2 i j
-- | i == 0 -> DArgR 0 2 j 1
-- | otherwise -> DArgN (fl [0, i + 1]) (fl [j])
-- UArgR i l
-- | i == 0 -> UArgR 0 (l + 1)
-- | otherwise -> UArgN (fl $ [0] ++ Prelude.take l [i + 1 ..])
-- BArgR i l -> DArgR 0 1 i l
-- DArgR ui ul bi bl
-- | ui == 0 -> DArgR 0 (ul + 1) bi bl
-- | otherwise ->
-- DArgN
-- (fl $ [0] ++ Prelude.take ul [ui + 1 ..])
-- (fl $ Prelude.take bl [bi ..])
-- UArgN us -> UArgN (fl $ [0] ++ fmap (+ 1) (tl us))
-- BArgN bs -> DArgN (fl [0]) bs
-- DArgN us bs -> DArgN (fl $ [0] ++ fmap (+ 1) (tl us)) bs
-- DArgV i j -> DArgV i j
-- where
-- fl = primArrayFromList
-- tl = primArrayToList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd delete this code if it's not being used anymore.

@dolio
Copy link
Contributor Author

dolio commented Sep 11, 2023

The base tests were run in the CI job before my comment that they take too long (like, 30 minutes it seems). :)

The existing .ser/hash/out files are generated by other tests that run during CI to make sure there are no accidental changes. So they have to be changed. I added files with the v3 format in commit d6c0ae2. I could rename them, though.

@pchiusano
Copy link
Member

pchiusano commented Sep 11, 2023

@dolio is going to:

  • Rename the .ser files to include a version number, just for clarity

@pchiusano
Copy link
Member

Okay, great, merging this!

@pchiusano pchiusano merged commit 67f510e into trunk Sep 13, 2023
6 checks passed
@pchiusano pchiusano deleted the topic/interchange-format branch September 13, 2023 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants