Further optimize intermediates_to_table_indices
#1457
Labels: performance (This affects protocol performance)
`intermediates_to_table_indices` works as follows: it calls `bits_to_table_indices`, which takes three `u128`s, each containing the value of one of three intermediates for 128 multiplications, and returns four `u128`s containing a table index in each nibble. It then rearranges the nibbles.

It appears that `bits_to_table_indices` compiles to <200 instructions (fully unrolled, with no loops or branches), while the rearranging of nibbles compiles to >1000 instructions (again, fully unrolled, with no loops or branches). Implementing a single transpose-like operation covering both steps would probably be more efficient.
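
To make the idea concrete, here is a minimal sketch of a branch-free bit-to-nibble spread of the kind a combined transpose-like step could build on. It is not the existing implementation. The function names are hypothetical, and the layout is an assumption: multiplication `i` maps to nibble `i % 32` of output word `i / 32`, with the first intermediate in the lowest bit of the index. The nibble order that `intermediates_to_table_indices` actually needs may differ, in which case the shifts and masks would have to change accordingly.

```rust
/// Spread the 32 bits of `x` so that bit `i` lands at bit `4 * i` of a u128.
/// Uses five shift-or-mask rounds, with no loops or branches.
fn spread_to_nibbles(x: u32) -> u128 {
    let mut v = x as u128;
    v = (v | (v << 48)) & 0x0000_0000_0000_FFFF_0000_0000_0000_FFFF;
    v = (v | (v << 24)) & 0x0000_00FF_0000_00FF_0000_00FF_0000_00FF;
    v = (v | (v << 12)) & 0x000F_000F_000F_000F_000F_000F_000F_000F;
    v = (v | (v << 6)) & 0x0303_0303_0303_0303_0303_0303_0303_0303;
    v = (v | (v << 3)) & 0x1111_1111_1111_1111_1111_1111_1111_1111;
    v
}

/// Hypothetical combined step: build a 3-bit table index per nibble directly
/// from the three intermediates. Assumed layout: index `i` goes to nibble
/// `i % 32` of output word `i / 32`.
fn table_indices_sketch(a: u128, b: u128, c: u128) -> [u128; 4] {
    let mut out = [0u128; 4];
    for (k, word) in out.iter_mut().enumerate() {
        let shift = 32 * k;
        // Truncating casts select the k-th 32-bit chunk of each intermediate.
        let a_k = (a >> shift) as u32;
        let b_k = (b >> shift) as u32;
        let c_k = (c >> shift) as u32;
        *word = spread_to_nibbles(a_k)
            | (spread_to_nibbles(b_k) << 1)
            | (spread_to_nibbles(c_k) << 2);
    }
    out
}

/// Bit-at-a-time reference for the same assumed layout, for checking only.
fn table_indices_reference(a: u128, b: u128, c: u128) -> [u128; 4] {
    let mut out = [0u128; 4];
    for i in 0..128usize {
        let idx = ((a >> i) & 1) | (((b >> i) & 1) << 1) | (((c >> i) & 1) << 2);
        out[i / 32] |= idx << (4 * (i % 32));
    }
    out
}
```

Under these assumptions each output word costs three spreads of five rounds each, so the whole conversion stays in the low hundreds of simple operations. Whether that beats the current two-step code would need to be confirmed by inspecting the generated assembly and benchmarking.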