Add support for `i8` dtype, add `--raw_accumulators` flag, add `--target=host_cpu` for easy local testing. (#22)

A few unrelated things are mixed in this PR, but they are separate commits if you'd prefer me to slice it into 3 PRs.

1. Add a `--raw_accumulators` flag that drops the truncation of the results (default False). This leads to lower arithmetic intensity (because the result values are larger) and either higher or lower performance. This is less representative of real workloads, but is sometimes easier to reason about as a microbenchmark.
2. Add support for `i8` dtype accumulating into `i32`. For now only added to the `square` problem set. Also added `bf16` to that set.
3. Add a special value for the existing `--target` flag: `"host_cpu"`, for testing on a CPU target configured for the host. This was mostly for my own use, to be able to develop these changes locally without a GPU.

---------

Signed-off-by: Benoit Jacob <[email protected]>
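To illustrate the arithmetic-intensity point in item 1, here is a rough sketch that counts operations per byte moved for a square `i8` matmul, with and without truncation of the `i32` accumulators. The helper name, the byte-size table, the problem size, and the assumption that truncated results are stored as `i8` are all illustrative and not taken from the benchmark code.

```python
# Hypothetical helper for illustration only; not part of the benchmark suite.
# Assumed element sizes in bytes for the dtypes mentioned in this commit.
BYTES = {"i8": 1, "i32": 4, "bf16": 2}


def arithmetic_intensity(m, n, k, in_dtype, out_dtype):
    """Ops per byte moved for C[m,n] = A[m,k] @ B[k,n], ignoring caching."""
    ops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = (m * k + k * n) * BYTES[in_dtype] + m * n * BYTES[out_dtype]
    return ops / bytes_moved


size = 1024  # arbitrary square problem size
# Assumption: with truncation, the i32 accumulators are narrowed back to i8.
truncated = arithmetic_intensity(size, size, size, "i8", "i8")
# With raw accumulators, the result is stored as i32.
raw = arithmetic_intensity(size, size, size, "i8", "i32")

print(f"truncated to i8:  {truncated:.1f} ops/byte")
print(f"raw i32 results:  {raw:.1f} ops/byte")
```

Under these assumptions the raw-accumulator variant writes four times as many output bytes for the same number of operations, so its arithmetic intensity is lower. Whether that translates into higher or lower measured performance depends on the target.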
Showing 3 changed files with 277 additions and 211 deletions.