Modify intermediate code/value format to be less Haskell-runtime specific #4311

dolio · 2023-09-07T21:28:03Z

This PR implements various changes that make the code and value formats that are sent between Unison nodes less specific to the details of the Haskell runtime. The initial impetus for this was the observation that the Scheme runtime cannot easily tell the difference between positive values of Nat and Int. But I've included other changes that I think would be problematic in the future. Here's a basic description of the changes I've made:

I introduced a new type of matching expression in the intermediate code, MatchNumeric. This is a direct match on a number value which doesn't involve pretending it's a special sort of data type that doesn't exist in the Unison surface syntax (which is how MatchIntegral works). This allows case statements on Nat and Int to be directly compiled to an intermediate construct, which tracks the builtin type involved.
I introduced a new sort of literal expression which directly produces the boxed literal. This is sort of the opposite operation to the one above, and lets you include a literal number without pretending it's a special sort of data type wrapping an unboxed value.
I made two wrapped pseudo built-ins List.splitLeft and List.splitRight for compiling list pattern matching. Previously it was directly including primitive operations in the generated code, and thus unboxed literals.
I added several new literals to the value interchange format that correspond to the builtin types, like Char and Nat. Nat and Int aren't identified directly, but as literals for positive and negative numbers, and the runtime is a bit flexible about how it treats such numbers. Previously these were exchanged as the special sorts of data types with unboxed data.

Together with some other stuff I'll mention later, this means the interchange format doesn't need to contain arbitrary unboxed data. Instead, it's more like exchanging surface level Unison values. I didn't actually change the way the Haskell runtime represents these things, so it is still capable of loading and running things that had already been serialized using the old format. But it will no longer produce such serialized representations, and the new format is a bit more agnostic about the exact way that the values will be represented in a particular runtime.

To achieve the above, though, I did change some of the execution details to avoid getting unboxed data persistently onto the stack:

MCode now has some direct matching constructs for case statements. The old strategy was to dump data/number/request values to the stack and then do cases on values read from arbitrary positions on the unboxed stack. That's still possible, but isn't actually generated anymore. Instead, matching will generally dump the fields of a data/request type to the (boxed) stack, but directly handle branching on the tags. And for numeric matches, it will just directly look at the numeric value without using the stack. This means that tags and such don't end up in the continuations captured by abilities.
I also changed the exact packing representation used for ability requests. Previously it was packing the ability/effect information into two spots in the value, because that was the easiest way to consume it with the 'dump to the stack' implementation. But since now there is a dedicated request matching operation, all the information can be packed into the tag with less redundancy than before.
Direct boxed versions of the literal instructions have also been included, to match the boxed literals in the intermediate code.
Universal equality and comparison have been tweaked to be accept being called with mixes of Nat and Int arguments. This can happen if a positive Int gets serialized and deserialized.

I think that existing compiled .uc files could still be loaded by the new runtime, but things like the ability request details wouldn't match any newly compiled code, so it's a better idea to recompile at this level.

I've added some tests to make sure that numbers hash the same whether they're Nat or Int. Other changes are somewhat checked by existing tests, as many of these changed required some debugging before all tests passed.

Changes to make the scheme runtime compatible with these changes are included. I'll release a new version of the compiler on share shortly.

I've still got some additional testing to add, but I think this is close enough to get CI checking it.

The goal of these changes is to remove unboxed details from the interchange formats, and instead exchange values that somewhat resemble surface level unison values. In some cases, the way this works is obvious. For instance, the MatchNumeric/NMatch and BLit constructs avoid having unboxed details in code. However, what might not be so obvious is the request/data matching constructs. These prevent unboxed values from occurring on the stack in normal unison functions, which means they no longer end up in captured continuations. The unboxed details were originally intended to support optimizations that would turn surface unison functions into more efficient versions that operate without any boxing. However, that didn't materialize, and it seems unlikely that we'll implement it for the Haskell end of things (we aren't really obliged to do this ourselves for good performance in scheme). So, some of these new methods of doing things might actually be more efficient than what was happening previously.

The implementation was producing Maybe, but the calling convention just called the primop in tail position. This would leave the Just tag on the unboxed stack afterwards.

This covers values that are stored as pseudo data types in the Haskell runtime, which would otherwise need to be serialized using unboxed data in the byte format.

This is necessary for checking equality of code.

The serialization is the same, but this is a bit less strange as a representation.

Previous implementation was directly including primops into the generated intermediate code, which involved unboxed literal arguments. These fake builtins allow for generating boxed literals so that no unboxed data is on the surface stack. I believe this was the last source of unboxed data on captured stacks.

dolio · 2023-09-07T23:31:20Z

Seems like running base tests takes way too long.

Adds versions of the tests saved in an old format, checking that they still work, and match the new version of the same test.

dolio · 2023-09-08T16:56:09Z

I added versions of the random serialized test cases saved in old formats. The test now deserializes the old version, runs it, and checks that the output is the same as the new version of the test.

dolio · 2023-09-08T20:15:43Z

Stew pointed out that boxed arrays aren't serializable yet, so I'll add those here unless there's some objection.

…rmat

pchiusano

Looks good.

Question on testing: Are you able to run all the test in base successfully with these changes?

I don't think this PR should modify the existing .ser files - for now we still want assurances that we can still read the old format in the Haskell runtime.

Instead, I'd introduce new .ser files for the new format and rejigger the tests to have the Haskell tests look at both *.v3.ser and *.v4.ser and the scheme can just look at *.v4.ser.

pchiusano · 2023-09-11T16:48:01Z

parser-typechecker/src/Unison/Runtime/MCode.hs

+-- reqArgs :: Args -> Args
+-- reqArgs = \case
+--   ZArgs -> UArg1 0
+--   UArg1 i -> UArg2 0 (i + 1)
+--   UArg2 i j
+--     | i == 0 && j == 1 -> UArgR 0 3
+--     | otherwise -> UArgN (fl [0, i + 1, j + 1])
+--   BArg1 i -> DArg2 0 i
+--   BArg2 i j
+--     | j == i + 1 -> DArgR 0 1 i 2
+--     | otherwise -> DArgN (fl [0]) (fl [i, j])
+--   DArg2 i j
+--     | i == 0 -> DArgR 0 2 j 1
+--     | otherwise -> DArgN (fl [0, i + 1]) (fl [j])
+--   UArgR i l
+--     | i == 0 -> UArgR 0 (l + 1)
+--     | otherwise -> UArgN (fl $ [0] ++ Prelude.take l [i + 1 ..])
+--   BArgR i l -> DArgR 0 1 i l
+--   DArgR ui ul bi bl
+--     | ui == 0 -> DArgR 0 (ul + 1) bi bl
+--     | otherwise ->
+--         DArgN
+--           (fl $ [0] ++ Prelude.take ul [ui + 1 ..])
+--           (fl $ Prelude.take bl [bi ..])
+--   UArgN us -> UArgN (fl $ [0] ++ fmap (+ 1) (tl us))
+--   BArgN bs -> DArgN (fl [0]) bs
+--   DArgN us bs -> DArgN (fl $ [0] ++ fmap (+ 1) (tl us)) bs
+--   DArgV i j -> DArgV i j
+--   where
+--     fl = primArrayFromList
+--     tl = primArrayToList


I'd delete this code if it's not being used anymore.

dolio · 2023-09-11T17:51:49Z

The base tests were run in the CI job before my comment that they take too long (like, 30 minutes it seems). :)

The existing .ser/hash/out files are generated by other tests that run during CI to make sure there are no accidental changes. So they have to be changed. I added files with the v3 format in commit d6c0ae2. I could rename them, though.

pchiusano · 2023-09-11T19:33:32Z

@dolio is going to:

Rename the .ser files to include a version number, just for clarity

Allow output files to not have a version in their name

pchiusano · 2023-09-13T20:41:30Z

Okay, great, merging this!

dolio added 14 commits September 1, 2023 12:59

stdHandle didn't match its calling convention

bd51afe

The implementation was producing Maybe, but the calling convention just called the primop in tail position. This would leave the Just tag on the unboxed stack afterwards.

Add more literals to value serialization format.

4245897

This covers values that are stored as pseudo data types in the Haskell runtime, which would otherwise need to be serialized using unboxed data in the byte format.

Add a test for int/nat hash equivalence

f450e97

Alignment for new code constructors

9c9654d

This is necessary for checking equality of code.

Store Char in interchange literals

9da57e9

The serialization is the same, but this is a bit less strange as a representation.

Fix up tests w/r/t new ANF constructors

0f52e01

Implement racket portion of new intermediate format

ccb25dc

Expected transcript changes: hashes

f9a8b6c

Increment code serialization version

dfc3814

Ormolu changes

d9e0806

Increment @unison/internal version

3139432

Try running base tests from CI

0ec34c4

dolio added 2 commits September 8, 2023 10:58

Remove base tests from interpreter builtin tests

f9e8c78

Add a note in comments on new serialization format

1384d8a

dolio requested a review from pchiusano September 8, 2023 16:06

Add to random-deserial tests

d6c0ae2

Adds versions of the tests saved in an old format, checking that they still work, and match the new version of the same test.

dolio added 3 commits September 8, 2023 20:01

Add serialization for boxed arrays

699b9d6

Increment share compiler version

3c042b8

Merge remote-tracking branch 'origin/trunk' into topic/interchange-fo…

16c5b7b

…rmat

pchiusano approved these changes Sep 11, 2023

View reviewed changes

Eliminate dead reqArgs code

95c9eb3

dolio added 2 commits September 11, 2023 16:20

More informative names for serialized test cases

a9065ec

Change transcripts for new serialized file names

9d52145

dolio added 2 commits September 11, 2023 16:53

Tweak serial case generator code

2f47a85

Allow output files to not have a version in their name

Fix JIT/interpreter builtin serial test names

9fb3496

pchiusano merged commit 67f510e into trunk Sep 13, 2023
6 checks passed

pchiusano deleted the topic/interchange-format branch September 13, 2023 20:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modify intermediate code/value format to be less Haskell-runtime specific #4311

Modify intermediate code/value format to be less Haskell-runtime specific #4311

dolio commented Sep 7, 2023

dolio commented Sep 7, 2023

dolio commented Sep 8, 2023

dolio commented Sep 8, 2023

pchiusano left a comment

pchiusano Sep 11, 2023

dolio commented Sep 11, 2023

pchiusano commented Sep 11, 2023 •

edited by dolio

Loading

pchiusano commented Sep 13, 2023

Modify intermediate code/value format to be less Haskell-runtime specific #4311

Modify intermediate code/value format to be less Haskell-runtime specific #4311

Conversation

dolio commented Sep 7, 2023

dolio commented Sep 7, 2023

dolio commented Sep 8, 2023

dolio commented Sep 8, 2023

pchiusano left a comment

Choose a reason for hiding this comment

pchiusano Sep 11, 2023

Choose a reason for hiding this comment

dolio commented Sep 11, 2023

pchiusano commented Sep 11, 2023 • edited by dolio Loading

pchiusano commented Sep 13, 2023

pchiusano commented Sep 11, 2023 •

edited by dolio

Loading