-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify intermediate code/value format to be less Haskell-runtime specific #4311
Conversation
The goal of these changes is to remove unboxed details from the interchange formats, and instead exchange values that somewhat resemble surface level unison values. In some cases, the way this works is obvious. For instance, the MatchNumeric/NMatch and BLit constructs avoid having unboxed details in code. However, what might not be so obvious is the request/data matching constructs. These prevent unboxed values from occurring on the stack in normal unison functions, which means they no longer end up in captured continuations. The unboxed details were originally intended to support optimizations that would turn surface unison functions into more efficient versions that operate without any boxing. However, that didn't materialize, and it seems unlikely that we'll implement it for the Haskell end of things (we aren't really obliged to do this ourselves for good performance in scheme). So, some of these new methods of doing things might actually be more efficient than what was happening previously.
The implementation was producing Maybe, but the calling convention just called the primop in tail position. This would leave the Just tag on the unboxed stack afterwards.
This covers values that are stored as pseudo data types in the Haskell runtime, which would otherwise need to be serialized using unboxed data in the byte format.
This is necessary for checking equality of code.
The serialization is the same, but this is a bit less strange as a representation.
Previous implementation was directly including primops into the generated intermediate code, which involved unboxed literal arguments. These fake builtins allow for generating boxed literals so that no unboxed data is on the surface stack. I believe this was the last source of unboxed data on captured stacks.
Seems like running base tests takes way too long. |
Adds versions of the tests saved in an old format, checking that they still work, and match the new version of the same test.
I added versions of the random serialized test cases saved in old formats. The test now deserializes the old version, runs it, and checks that the output is the same as the new version of the test. |
Stew pointed out that boxed arrays aren't serializable yet, so I'll add those here unless there's some objection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Question on testing: Are you able to run all the test in base successfully with these changes?
I don't think this PR should modify the existing .ser files - for now we still want assurances that we can still read the old format in the Haskell runtime.
Instead, I'd introduce new .ser files for the new format and rejigger the tests to have the Haskell tests look at both *.v3.ser
and *.v4.ser
and the scheme can just look at *.v4.ser
.
-- reqArgs :: Args -> Args | ||
-- reqArgs = \case | ||
-- ZArgs -> UArg1 0 | ||
-- UArg1 i -> UArg2 0 (i + 1) | ||
-- UArg2 i j | ||
-- | i == 0 && j == 1 -> UArgR 0 3 | ||
-- | otherwise -> UArgN (fl [0, i + 1, j + 1]) | ||
-- BArg1 i -> DArg2 0 i | ||
-- BArg2 i j | ||
-- | j == i + 1 -> DArgR 0 1 i 2 | ||
-- | otherwise -> DArgN (fl [0]) (fl [i, j]) | ||
-- DArg2 i j | ||
-- | i == 0 -> DArgR 0 2 j 1 | ||
-- | otherwise -> DArgN (fl [0, i + 1]) (fl [j]) | ||
-- UArgR i l | ||
-- | i == 0 -> UArgR 0 (l + 1) | ||
-- | otherwise -> UArgN (fl $ [0] ++ Prelude.take l [i + 1 ..]) | ||
-- BArgR i l -> DArgR 0 1 i l | ||
-- DArgR ui ul bi bl | ||
-- | ui == 0 -> DArgR 0 (ul + 1) bi bl | ||
-- | otherwise -> | ||
-- DArgN | ||
-- (fl $ [0] ++ Prelude.take ul [ui + 1 ..]) | ||
-- (fl $ Prelude.take bl [bi ..]) | ||
-- UArgN us -> UArgN (fl $ [0] ++ fmap (+ 1) (tl us)) | ||
-- BArgN bs -> DArgN (fl [0]) bs | ||
-- DArgN us bs -> DArgN (fl $ [0] ++ fmap (+ 1) (tl us)) bs | ||
-- DArgV i j -> DArgV i j | ||
-- where | ||
-- fl = primArrayFromList | ||
-- tl = primArrayToList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd delete this code if it's not being used anymore.
The base tests were run in the CI job before my comment that they take too long (like, 30 minutes it seems). :) The existing .ser/hash/out files are generated by other tests that run during CI to make sure there are no accidental changes. So they have to be changed. I added files with the v3 format in commit d6c0ae2. I could rename them, though. |
@dolio is going to:
|
Allow output files to not have a version in their name
Okay, great, merging this! |
This PR implements various changes that make the code and value formats that are sent between Unison nodes less specific to the details of the Haskell runtime. The initial impetus for this was the observation that the Scheme runtime cannot easily tell the difference between positive values of
Nat
andInt
. But I've included other changes that I think would be problematic in the future. Here's a basic description of the changes I've made:MatchNumeric
. This is a direct match on a number value which doesn't involve pretending it's a special sort of data type that doesn't exist in the Unison surface syntax (which is howMatchIntegral
works). This allows case statements onNat
andInt
to be directly compiled to an intermediate construct, which tracks the builtin type involved.List.splitLeft
andList.splitRight
for compiling list pattern matching. Previously it was directly including primitive operations in the generated code, and thus unboxed literals.Char
andNat
.Nat
andInt
aren't identified directly, but as literals for positive and negative numbers, and the runtime is a bit flexible about how it treats such numbers. Previously these were exchanged as the special sorts of data types with unboxed data.Together with some other stuff I'll mention later, this means the interchange format doesn't need to contain arbitrary unboxed data. Instead, it's more like exchanging surface level Unison values. I didn't actually change the way the Haskell runtime represents these things, so it is still capable of loading and running things that had already been serialized using the old format. But it will no longer produce such serialized representations, and the new format is a bit more agnostic about the exact way that the values will be represented in a particular runtime.
To achieve the above, though, I did change some of the execution details to avoid getting unboxed data persistently onto the stack:
MCode
now has some direct matching constructs for case statements. The old strategy was to dump data/number/request values to the stack and then do cases on values read from arbitrary positions on the unboxed stack. That's still possible, but isn't actually generated anymore. Instead, matching will generally dump the fields of a data/request type to the (boxed) stack, but directly handle branching on the tags. And for numeric matches, it will just directly look at the numeric value without using the stack. This means that tags and such don't end up in the continuations captured by abilities.Nat
andInt
arguments. This can happen if a positiveInt
gets serialized and deserialized.I think that existing compiled
.uc
files could still be loaded by the new runtime, but things like the ability request details wouldn't match any newly compiled code, so it's a better idea to recompile at this level.I've added some tests to make sure that numbers hash the same whether they're
Nat
orInt
. Other changes are somewhat checked by existing tests, as many of these changed required some debugging before all tests passed.Changes to make the scheme runtime compatible with these changes are included. I'll release a new version of the compiler on share shortly.
I've still got some additional testing to add, but I think this is close enough to get CI checking it.