-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shift32 chip: output constraint(s) #124
Comments
I had thought that this issue would be easy, because I thought that to complete the argument we could just say I found two approaches to take inspiration from: Succinct SP1's approach, and TinyRAM's approach. I tried to look at how RISC Zero solves the problem, but I didn't end up finding their code that solves this problem. In summary, the approaches:
TinyRAM's approach works easily enough in a large enough field to perform the multiplication without overflow. This is the setting of TinyRAM. TinyRAM's approach can't be directly adapted to Valida. Prima facie, this suggests in favor of emulating SP1's approach. However, I'm thinking it may make more sense to take inspiration from TinyRAM's approach, even though it can't be directly adapted. The basic premise behind TinyRAM's approach is that bit shifts can be checked using multiplication or division by a power of two, provided a large enough algebraic structure to do that arithmetic in without overflowing. No such algebraic structure currently exists in Valida. I propose to add a 64-bit unsigned multiplication chip to Valida, which would be used to check 32-bit signed and unsigned division, and 32-bit signed and unsigned bit shifts. I am not proposing to add a 64-bit unsigned multiplication instruction. A 64-bit unsigned multiplication instruction would exceed the data throughput of Valida's memory channel. Valida has three 32-bit memory channels, for a total memory bus throughput of 96 bits per clock cycle. A 64-bit unsigned multiplication instruction, let's say with two 32-bit inputs and one 64-bit output, would require a memory bus throughput of at least 128 bits per clock cycle. However, the data throughput (i.e., the number of bits of data per row) of some of the cross-chip interactions is substantially greater than the throughput of the memory buses, so the data throughput required by a 64-bit unsigned multiplication chip, which is only used on the back end by other chips, seems like a viable idea without substantially changing the performance characteristics of the permutation argument which checks the cross-chip interactions. A 64-bit unsigned multiplication chip's trace could be checked in a similar fashion to the 32-bit unsigned multiplication chip. The 32-bit unsigned multiplication chip decomposes the inputs into bytes, multiplies these separately, and sums together the results. What would mainly be different about a 64-bit unsigned multiplication chip with 32-bit inputs, compared to the existing 32-bit chip, is that the 64-bit chip's output would be bigger, and there would be no truncating of the answer in the case of the 64-bit chip. Given a 64-bit unsigned multiplication chip, a 32-bit unsigned division Given 32-bit unsigned division, 32-bit signed division can be checked by factoring out and multiplying together the sign bits, which reduces the problem to 32-bit unsigned division. A 64-bit unsigned multiplication chip would also help for implementing Finally, given 64-bit unsigned multiplication and 32-bit signed and unsigned division, shifts left, logical shifts right, and arithmetic shifts right can all be implemented as special cases. This suggests to me that a 64-bit unsigned multiplication chip would help to kill several birds with one stone, as several unsolved problems in the Valida STARK can be straightforwardly reduced to it. As such, I'd like to try solving this issue by reducing it to 64-bit unsigned multiplication and implementing a 64-bit unsigned multiplication chip. I'm not sure if I want to do this by way of the 32-bit division chips, or without those layers of indirection; currently leaning towards the latter. |
See: #137 |
Currently the Shift32 chip STARK does not put any constraint on its output. The output should be constrained to be equal to
input_1 * power_of_two
, truncated to a 32-bit word.The text was updated successfully, but these errors were encountered: