Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Vector Improvements #113

Draft
wants to merge 8 commits into
base: partial_eval
Choose a base branch
from
Draft

WIP: Vector Improvements #113

wants to merge 8 commits into from

Conversation

ncough
Copy link
Collaborator

@ncough ncough commented Nov 27, 2024

A bunch of changes I have been meaning to push up, cleaning up the vectoriser to deal with almost everything under aarch64_vectors*. Placeholder for me, as it really needs to be broken up into a series of PRs, when I get around to it.

General bug fixes:

  • Fix take and drop list utilities.
  • Fix CSE to not consider expressions over mutable variables as candidates for elimination. The introduction of the CSE variable may not be correctly placed with respect to any mutation of these variables.
  • Fix dead code removal to not remove impure function calls, as they may mutate global state.
  • Improve type checking for post partial evaluation to handle arrays and global variables.
  • At some point sdiv_int worked its way into the code base, but it doesn't exist.

Improved coverage testing of vectors:

  • Initialise the _Z array with random values to get some results out of coverage testing for vectors instructions.
  • New fp operations FPRSqrtEstimate and UnsignedRSqrtEstimate are now reachable, add them to the inlining config.
  • Extend the interpreter to support an integer bit slice update. Haven't added this to the partial evaluator.
  • Add an implementation of sqrt_real, not sure if it's accurate to hardware though.

Change our primitive shift operations to line up with SMTLIB:

  • Shifts now take an unsigned bitvector to represent the shift amount, and both the shift and value are of the same width.
  • Support a primitive for rotate operations, as requested by Improving Rotation Translation #83
  • This changes the signature for shifts, so downstream uses have to be updated.

Changes to simplify and improve the vectoriser:

  • Move all vectoriser related logic to a new file, libASL/loops.ml.
  • Break out the induction variable substitution into a pre-pass that is applied before IntToBits, so that it can reason over trivial integer operations.
  • Handle vectorisation with a series of primitive classes and common logic to transform each class.
  • Move vector expression cleanup to a post-phase.
  • Keep select_vec and update_vec operations in terms of integer positional arguments until the last possible phase, to simplify reasoning.
  • Building on these simplifications, add a series of vectorised floating point operations and new reductions.
  • This changes the signature for some vector operations, so downstream uses have to be updated.

Remaining:

  • Init the entire _Z array with random values, rather than just the low 64-bits.
  • Find out what sqrt_real should be.
  • Add integer bit slice to partial evaluator (Maybe? Doesn't seem to come up outside of fp prims).
  • Add missing floating point operations to override.asl.

Reduce flag setting pattern in CCMP instruction, from:

if cond then
  X1 := E1
  ...
  XN := EN
else
  X1 := F1
  ...
  XN := FN

to:

const ITESimp_0 := cond
X1 := (ITESimp_0 & E1) | (!ITESimp_0 & F1)
...
XN := (ITESimp_0 & EN) | (!ITESimp_0 & FN)
* Extend coverage testing to init vector state
* Add two new fp ops
* Support slice write to integer
* Don't consider an impure operation dead
* Don't CSE mutable variables, as location of CSE var is fixed to top
* Fix defs for take and drop
* Extend to perform expression evaluation
* Support types on globals
* Support types on arrays
* Change shifts to take unsigned shift amounts
* Change shifts to take two args of the same width
* Add ROR operation, ror_bits
* Remove mistaken use of 'sdiv_int'
* Add x * 0 = 0 reduction
* Pull inductive variable reasoning out of vectorizer,
  into pre-pass instead
* Pull vector expression cleanup and lowering into sub-passes
* Simplify vectorizer to focus on correctness and simple
  pattern matching
* Add more fp prims
* Support general append patterns
* Support popcount
* Fix unsigned min/max
* Support bit set pattern
Original evaluator did not implement real sqrt.
Not entirely clear how this should be done, but
an implementation is needed for coverage to work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant