Further optimize intermediates_to_table_indices #1457

Open
andyleiserson opened this issue Nov 26, 2024 · 0 comments
Labels
performance This affects protocol performance

Comments

@andyleiserson
Collaborator

intermediates_to_table_indices works as follows:

  • It calls bits_to_table_indices, which takes three u128s each containing the value of one of three intermediates for 128 multiplications, and returns four u128s containing a table index in each nibble.
  • It then reorders those nibbles into bytes as its output. (Originally, the table lookup was done here, but additional optimization moved the table lookup elsewhere.)
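
The reordering step above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the name `nibbles_to_bytes`, the `[u8; 128]` output type, and the output ordering (nibble `j` of word `i` lands in byte `32 * i + j`) are all assumptions, since the issue does not specify them.

```rust
/// Hypothetical stand-in for the reordering step: spreads the 32 nibbles of
/// each of four u128 words into one byte apiece.
///
/// ASSUMPTION: nibble `j` (least significant first) of word `i` becomes
/// output byte `32 * i + j`. The real function may use a different order.
pub fn nibbles_to_bytes(words: [u128; 4]) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (i, word) in words.iter().enumerate() {
        for j in 0..32 {
            // Shift nibble j down to the low four bits and mask it off.
            out[32 * i + j] = ((*word >> (4 * j)) & 0xF) as u8;
        }
    }
    out
}
```

Written as a loop, this is 128 shift-and-mask operations. Once fully unrolled, each output byte needs its own shift, mask, and store, which is consistent with the step being much larger than the one that produces the nibbles.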

bits_to_table_indices appears to compile to fewer than 200 instructions (fully unrolled, with no loops or branches), while the nibble rearrangement compiles to more than 1000 instructions (again fully unrolled, with no loops or branches). A single transpose-like operation covering both steps would probably be more efficient.
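
To make the proposal concrete, here is a naive single-pass reference that goes straight from the three intermediates to one index per byte, with no nibble-packed stage in between. It is a sketch under stated assumptions: `fused_indices_reference` is a hypothetical name, and the index formula (bit `k` of each intermediate concatenated into a 3-bit value) is a guess, because the issue does not say how bits_to_table_indices derives an index. An optimized transpose-like implementation could be checked against a reference of this shape.

```rust
/// Hypothetical reference for a fused version of both steps.
///
/// ASSUMPTION: the table index for multiplication `k` is formed by
/// concatenating bit `k` of the three intermediates, with `z[0]` as the
/// least significant bit. The real index computation may differ.
pub fn fused_indices_reference(z: [u128; 3]) -> [u8; 128] {
    let mut out = [0u8; 128];
    for k in 0..128 {
        // Extract bit k of each intermediate.
        let b0 = ((z[0] >> k) & 1) as u8;
        let b1 = ((z[1] >> k) & 1) as u8;
        let b2 = ((z[2] >> k) & 1) as u8;
        // Write the index directly into its own output byte.
        out[k] = b0 | (b1 << 1) | (b2 << 2);
    }
    out
}
```

This bit-at-a-time loop is not itself fast. It only fixes the input/output contract that a word-level transpose would need to match.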

andyleiserson added the performance label on Nov 26, 2024