WIP: SIMD and other goodies #534

Draft: wants to merge 89 commits into master

Commits:
- e6cd2eb Vector indexing operations and empty vector constructor (HugoPeters1024, Oct 19, 2021)
- 0c80d44 created empty vector lifted in Exp (HugoPeters1024, Oct 20, 2021)
- 2c90dd5 Add implementation of empty vector and indexing (HugoPeters1024, Oct 26, 2021)
- 74feaec Add bounds check on vector index (HugoPeters1024, Oct 27, 2021)
- 977669b Fix vector creation (todo delete the prim const) (HugoPeters1024, Oct 28, 2021)
- dc7d849 implement interpreter and fix bugs (HugoPeters1024, Nov 2, 2021)
- 3fe1e80 Remove vector create constant (HugoPeters1024, Nov 3, 2021)
- faa139b add missing pattern match and module in cabal file (HugoPeters1024, Nov 4, 2021)
- 0e250b8 Move vec operations to correct AST (HugoPeters1024, Dec 2, 2021)
- 21d6dab fix off by one errors (HugoPeters1024, Dec 8, 2021)
- ad1f995 style changes (HugoPeters1024, Dec 13, 2021)
- f9556e3 prevent memcpy using unsafe mutable coercion (HugoPeters1024, Jan 19, 2022)
- d957227 Merge branch 'vector-operations' of https://github.com/HugoPeters1024… (tmcdonell, Mar 10, 2022)
- 1eaa378 stack: update resolver (tmcdonell, Mar 10, 2022)
- 837c20c add operations for 128-bit floating point numbers (tmcdonell, Jun 7, 2022)
- dff0ba7 add basic types for single bits, bit vectors (tmcdonell, Jun 7, 2022)
- 69c36a0 fix to/fromList for BitMask (tmcdonell, Jun 10, 2022)
- ac1aa94 add support for computation on SIMD types (tmcdonell, Jun 11, 2022)
- 30fa47e stack/9.2: drop non-default flags (tmcdonell, Jun 13, 2022)
- 361582f stack/8.10: update resolver (tmcdonell, Jun 13, 2022)
- cdbeb0a stack/9.0: update resolver (tmcdonell, Jun 13, 2022)
- 9c892b1 build fixes (tmcdonell, Jun 13, 2022)
- 0b769c6 doctest fixes (tmcdonell, Jun 13, 2022)
- eacdced add pattern synonyms for shape constructors (tmcdonell, Jun 13, 2022)
- 9d61a01 update CHANGELOG.md (tmcdonell, Jun 13, 2022)
- 007e3cd test fixes (tmcdonell, Jun 14, 2022)
- 4b2c1d7 add COMPLETE pragmas (tmcdonell, Jun 14, 2022)
- f390343 fix doctests (tmcdonell, Jun 14, 2022)
- 11829ff drop ghc-8.6 as it crashes the compiler (tmcdonell, Jun 15, 2022)
- 7b614fc fix pattern matching for bit tags (tmcdonell, Jun 18, 2022)
- 6a3234b clean up constructor/enum tags (tmcdonell, Jun 20, 2022)
- 0b10e22 add vectorised min, max (tmcdonell, Jun 26, 2022)
- ba107c8 value/expression polymorphic pattern synonyms for Vec (tmcdonell, Jun 27, 2022)
- 0c2b6c2 value/expression polymorphic pattern synonyms for tuples (tmcdonell, Jun 27, 2022)
- eb62a0d improve type checking for tuple patterns (tmcdonell, Jun 28, 2022)
- 68f7de7 more polymorphic pattern synonyms (tmcdonell, Jun 28, 2022)
- 7d4b276 fix doctests (tmcdonell, Jun 28, 2022)
- d90ee51 build fixes (tmcdonell, Jun 28, 2022)
- 53019c8 build fix (tmcdonell, Jun 30, 2022)
- 1d16894 warning police (tmcdonell, Jun 30, 2022)
- c1a32fe warning police (tmcdonell, Jun 30, 2022)
- f5482d8 build fix (tmcdonell, Jun 30, 2022)
- 63a0bb7 be a bit smarter (tmcdonell, Aug 31, 2022)
- 13c835c add Integral instances for vector types (tmcdonell, Aug 31, 2022)
- 8ea3e51 copy-pasta error (tmcdonell, Aug 31, 2022)
- 0b51562 drop unused file (tmcdonell, Aug 31, 2022)
- fdac92f vectorise type of smart constructors for logical operations (tmcdonell, Aug 31, 2022)
- 31b8fba NOTE (tmcdonell, Aug 31, 2022)
- 1d1eacf vectorised RealFloat (tmcdonell, Sep 20, 2022)
- 6f3376e updates for vectorised RealFloat (tmcdonell, Sep 21, 2022)
- 00819a2 pack BitMask densely rather than being byte-aligned (tmcdonell, Sep 29, 2022)
- 4d3bd0c nofib build fix (tmcdonell, Sep 29, 2022)
- 9700467 export strict (&&!) and (||!) (tmcdonell, Oct 1, 2022)
- 86fd7fa export 128-bit types (tmcdonell, Oct 1, 2022)
- 9fe3290 export operators on SIMD vectors (tmcdonell, Oct 1, 2022)
- 2591910 add conversion between bool and integral types (tmcdonell, Oct 3, 2022)
- 6e8897e export vector splat (tmcdonell, Oct 3, 2022)
- 3ddaed9 fix FromBool (tmcdonell, Oct 3, 2022)
- 8a09b11 add vand, vor operators (tmcdonell, Oct 3, 2022)
- eb77713 export vnot, &&*, ||* (tmcdonell, Oct 3, 2022)
- fe8a0c8 fix Ord instance for Vec (tmcdonell, Oct 3, 2022)
- 4bebdae arbitrary width signed and unsigned integers (tmcdonell, May 16, 2023)
- b70d5cc actually unsafe coerce (tmcdonell, May 16, 2023)
- d50e6de rename Coerce to Bitcast (tmcdonell, May 16, 2023)
- 6830362 use associated data family (tmcdonell, May 16, 2023)
- 72ef2df unused imports (tmcdonell, May 16, 2023)
- 4db6544 fix vand & vor for non-power-of-two vecs (tmcdonell, May 16, 2023)
- c3fd071 Merge branch 'master' into wip/type-hierarchy (tmcdonell, May 16, 2023)
- c4d95c6 Merge branch 'master' into wip/type-hierarchy (tmcdonell, Jul 24, 2023)
- 34f05d4 ci: don't run haddock on windows (tmcdonell, Aug 14, 2023)
- ab6b6fa add cc-options -std=c11 (tmcdonell, Aug 14, 2023)
- 0dcd2e3 updates for ghc-9.6 (tmcdonell, Aug 14, 2023)
- 789d744 Haddock documentation not handled by Template Haskell until GHC-9 (tmcdonell, Aug 14, 2023)
- 8fa7494 OPTIONS_HADDOCK hide (tmcdonell, Aug 14, 2023)
- 165f317 warning police (tmcdonell, Aug 18, 2023)
- 1e07380 cleaning up smart constructor cruft (tmcdonell, Aug 21, 2023)
- d11f98c add more [SIMD vector] primops (tmcdonell, Aug 21, 2023)
- 8fbf661 copy pasta error (tmcdonell, Aug 28, 2023)
- 673cc77 show instance for TupR (tmcdonell, Aug 28, 2023)
- a9f062e coerce arrays between different types (tmcdonell, Aug 28, 2023)
- ac5ff61 minor cleanup (tmcdonell, Aug 29, 2023)
- 2c4008f build fix for ghc < 9.4 (tmcdonell, Aug 30, 2023)
- 73c2e98 export acoerceOp (tmcdonell, Sep 9, 2023)
- 671a782 what was I thinking? (tmcdonell, Sep 12, 2023)
- 03a0b60 update acoerceOp (tmcdonell, Sep 28, 2023)
- 934a12f fix undef size computation (tmcdonell, Sep 28, 2023)
- cefdcec export pack, unpack (tmcdonell, Sep 28, 2023)
- 308fed4 embedding polymorphic Complex constructor (tmcdonell, Sep 28, 2023)
- 2784ec6 embedding polymorphic containers from Monoid & Semigroup (tmcdonell, Oct 3, 2023)

.github/workflows/ci.yml (1 addition, 1 deletion)
@@ -114,7 +114,7 @@ jobs:
- name: Haddock
# Behaviour of cabal haddock has changed for the worse: https://github.com/haskell/cabal/issues/8725
run: cabal haddock --disable-documentation
if: matrix.mode == 'release'
if: matrix.os != 'windows-latest' && matrix.mode == 'release'

- name: Test doctest
run: cabal test doctest

.gitignore (2 additions, 0 deletions)
@@ -16,3 +16,5 @@
/docs/_build
*.hi
*.o

hie.yaml

CHANGELOG.md (17 additions, 0 deletions)
@@ -9,12 +9,29 @@ Policy (PVP)](https://pvp.haskell.org)
## [next]
### Added
* Added debugging functions in module `Data.Array.Accelerate.Debug.Trace` ([#485](https://github.com/AccelerateHS/accelerate/pull/485))
* Support for SIMD data types in expressions. Support for storing a type `a`
in a SIMD vector can be added by deriving an instance for the class `SIMD`.
Pattern synonyms `V2`, `V3`, `V4`, `V8` and `V16` are provided to work with
these at both the Haskell value and embedded expression level.
* Instances for SIMD types in basic numeric classes (e.g. `Num` for `<4 x Float>`)
* Support for 128-bit integers (signed and unsigned)
* Support for 128-bit floating point types (build with cabal flag `float128`)

### Changed
* Removed dependency on lens ([#493](https://github.com/AccelerateHS/accelerate/pull/493))
* The shape constructors (e.g. `Z` and `(:.)`) are now pattern synonyms that
work on both Haskell values and embedded expressions. The same applies to the
constructors of `Maybe`, `Either`, `Bool`, and `Ordering`.

### Fixed
* Graphviz graph generation of `-ddump-dot` and `-ddump-simpl-dot` ([#384](https://github.com/AccelerateHS/accelerate/issues/384))
* Bug in `Semigroup` instance for `Maybe` ([#517](https://github.com/AccelerateHS/accelerate/issues/517))
* Bug in `Ord` instances for tuple types

### Removed
* Pattern synonyms `Z_`, `(::.)`, `Any_`, `All_`, which are no longer required
* Pattern synonyms `Just_`, `Nothing_`, etc., which have been renamed to drop
the trailing underscore.

### Contributors

accelerate.cabal (42 additions, 6 deletions)
@@ -207,6 +207,15 @@ custom-setup
, directory >= 1.0
, filepath >= 1.0

flag float128
manual: True
default: False
description:
Enable support for 128-bit floating point numbers
.
This requires the library 'quadmath' to be installed. Note that not all
targets support 128-bit floating-point numbers.

flag debug
manual: True
default: False
@@ -364,6 +373,11 @@ library
, unique
, unordered-containers >= 0.2
, vector >= 0.10
, wide-word >= 0.1

if impl(ghc < 9.0)
build-depends:
integer-gmp

exposed-modules:
-- The core language and reference implementation
@@ -392,14 +406,15 @@ library
Data.Array.Accelerate.Analysis.Hash
Data.Array.Accelerate.Analysis.Match
Data.Array.Accelerate.Array.Data
Data.Array.Accelerate.Array.Remote
Data.Array.Accelerate.Array.Remote.Class
Data.Array.Accelerate.Array.Remote.LRU
Data.Array.Accelerate.Array.Remote.Table
-- Data.Array.Accelerate.Array.Remote
-- Data.Array.Accelerate.Array.Remote.Class
-- Data.Array.Accelerate.Array.Remote.LRU
-- Data.Array.Accelerate.Array.Remote.Table
Data.Array.Accelerate.Array.Unique
Data.Array.Accelerate.Async
Data.Array.Accelerate.Error
Data.Array.Accelerate.Debug.Internal
Data.Array.Accelerate.Error
Data.Array.Accelerate.Interpreter.Arithmetic
Data.Array.Accelerate.Lifetime
Data.Array.Accelerate.Pretty
Data.Array.Accelerate.Representation.Array
@@ -433,9 +448,11 @@ library
Data.Array.Accelerate.Test.Similar

-- Other
Crypto.Hash.XKCP
Data.BitSet
Data.Primitive.Bit
Data.Primitive.Vec
Crypto.Hash.XKCP
Data.Numeric.Float128

other-modules:
Data.Array.Accelerate.Analysis.Hash.TH
@@ -445,6 +462,7 @@ library
Data.Array.Accelerate.Classes.Eq
Data.Array.Accelerate.Classes.Floating
Data.Array.Accelerate.Classes.Fractional
Data.Array.Accelerate.Classes.FromBool
Data.Array.Accelerate.Classes.FromIntegral
Data.Array.Accelerate.Classes.Integral
Data.Array.Accelerate.Classes.Num
@@ -454,6 +472,9 @@ library
Data.Array.Accelerate.Classes.RealFloat
Data.Array.Accelerate.Classes.RealFrac
Data.Array.Accelerate.Classes.ToFloating
Data.Array.Accelerate.Classes.VEq
Data.Array.Accelerate.Classes.VNum
Data.Array.Accelerate.Classes.VOrd
Data.Array.Accelerate.Debug.Internal.Clock
Data.Array.Accelerate.Debug.Internal.Flags
Data.Array.Accelerate.Debug.Internal.Graph
@@ -470,7 +491,10 @@ library
Data.Array.Accelerate.Pattern.Either
Data.Array.Accelerate.Pattern.Maybe
Data.Array.Accelerate.Pattern.Ordering
Data.Array.Accelerate.Pattern.SIMD
Data.Array.Accelerate.Pattern.Shape
Data.Array.Accelerate.Pattern.TH
Data.Array.Accelerate.Pattern.Tuple
Data.Array.Accelerate.Prelude
Data.Array.Accelerate.Pretty.Graphviz
Data.Array.Accelerate.Pretty.Graphviz.Monad
@@ -489,6 +513,7 @@ library
Data.Array.Accelerate.Test.NoFib.Config

Language.Haskell.TH.Extra
GHC.TypeLits.Extra

if flag(nofib)
build-depends:
@@ -562,6 +587,7 @@ library
cc-options:
-O3
-Wall
-std=c11

cxx-options:
-O3
@@ -590,6 +616,16 @@ library
-caf-all
-auto-all

if flag(float128)
cc-options:
-DFLOAT128_ENABLE

cpp-options:
-DFLOAT128_ENABLE

extra-libraries:
quadmath

if flag(debug)
cc-options:
-DACCELERATE_DEBUG
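The `float128` flag adds `-DFLOAT128_ENABLE` to both `cc-options` and `cpp-options` and links against `quadmath`. The visible part of `cbits/float128.c` does not test that macro, so how the C source is kept out of builds with the flag disabled is not shown in this diff. Below is a minimal sketch of C-side gating on the macro, assuming GCC or Clang, which predefine `__SIZEOF_FLOAT128__` on targets where `__float128` is available. The names `ACC_HAVE_FLOAT128` and `acc_float128_enabled` are hypothetical and not part of accelerate.

```c
/* Sketch only: compile-time gating on FLOAT128_ENABLE, the macro that the
 * cabal flag `float128` adds to cc-options. ACC_HAVE_FLOAT128 and
 * acc_float128_enabled are hypothetical names, not accelerate APIs. */
#if defined(FLOAT128_ENABLE)
#  if !defined(__SIZEOF_FLOAT128__)
     /* GCC and Clang predefine __SIZEOF_FLOAT128__ only where __float128 exists */
#    error "FLOAT128_ENABLE is set, but this target does not provide __float128"
#  endif
#  include <quadmath.h>
#  define ACC_HAVE_FLOAT128 1
#else
#  define ACC_HAVE_FLOAT128 0
#endif

/* Lets callers ask at run time whether quad-precision support was built in. */
int acc_float128_enabled(void)
{
    return ACC_HAVE_FLOAT128;
}

#if ACC_HAVE_FLOAT128
/* Quad-precision wrappers (such as _addq in cbits/float128.c) would go here. */
#endif
```

Compiled without `-DFLOAT128_ENABLE`, `acc_float128_enabled()` returns 0 and nothing from `quadmath.h` is referenced. With the macro set on a target lacking `__float128`, the build stops at the `#error` line with a readable message instead of failing later.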

cbits/float128.c (119 additions, 0 deletions)
@@ -0,0 +1,119 @@

#include <quadmath.h>
#include <stdint.h>
#include <stdio.h>

typedef _Float128 f128;

union ieee754_quad {
    f128 as_float128;
    struct {
#if WORDS_BIGENDIAN
        uint64_t negative:1;
        uint64_t exponent:15;
        uint64_t mantissa0:48;
        uint64_t mantissa1;
#else
        uint64_t mantissa1;
        uint64_t mantissa0:48;
        uint64_t exponent:15;
        uint64_t negative:1;
#endif
    } as_uint128;
};

/* Operations from Read and Show
*/
void _readq(f128* r, const char* str) { *r = strtoflt128(str, NULL); }
void _showq(char* buf, size_t n, f128 *a) { quadmath_snprintf(buf, n, "%Qf", *a); }

/* Operations from Num
*/
void _addq(f128* r, const f128* a, const f128* b) { *r = *a + *b; }
void _subq(f128* r, const f128* a, const f128* b) { *r = *a - *b; }
void _mulq(f128* r, const f128* a, const f128* b) { *r = *a * *b; }
void _negateq(f128* r, const f128* a) { *r = - *a; }
void _absq(f128* r, const f128* a) { *r = fabsq(*a); }
void _signumq(f128* r, const f128* a) { *r = *a > 0.0q ? 1.0q : (*a < 0.0q ? -1.0q : *a); } /* returns *a for +-0 and NaN, as signum does for Double */

/* Operations from Fractional
*/
void _divq(f128* r, const f128* a, const f128* b) { *r = *a / *b; }
void _recipq(f128* r, const f128* a) { *r = 1.0q / *a; }

/* Operations from Floating
*/
void _piq(f128* r) { *r = M_PIq; }
void _expq(f128* r, const f128* a) { *r = expq(*a); }
void _logq(f128* r, const f128* a) { *r = logq(*a); }
void _sqrtq(f128* r, const f128* a) { *r = sqrtq(*a); }
void _powq(f128* r, const f128* a, const f128* b) { *r = powq(*a, *b); }
void _sinq(f128* r, const f128* a) { *r = sinq(*a); }
void _cosq(f128* r, const f128* a) { *r = cosq(*a); }
void _tanq(f128* r, const f128* a) { *r = tanq(*a); }
void _asinq(f128* r, const f128* a) { *r = asinq(*a); }
void _acosq(f128* r, const f128* a) { *r = acosq(*a); }
void _atanq(f128* r, const f128* a) { *r = atanq(*a); }
void _sinhq(f128* r, const f128* a) { *r = sinhq(*a); }
void _coshq(f128* r, const f128* a) { *r = coshq(*a); }
void _tanhq(f128* r, const f128* a) { *r = tanhq(*a); }
void _asinhq(f128* r, const f128* a) { *r = asinhq(*a); }
void _acoshq(f128* r, const f128* a) { *r = acoshq(*a); }
void _atanhq(f128* r, const f128* a) { *r = atanhq(*a); }
void _log1pq(f128* r, const f128* a) { *r = log1pq(*a); }
void _expm1q(f128* r, const f128* a) { *r = expm1q(*a); }

/* Operations from RealFrac
*/
void _roundq(f128* r, const f128* a) { *r = roundq(*a); }
void _truncq(f128* r, const f128* a) { *r = truncq(*a); }
void _floorq(f128* r, const f128* a) { *r = floorq(*a); }
void _ceilq(f128* r, const f128* a) { *r = ceilq(*a); }

/* Operations from RealFloat
*/
uint32_t _isnanq(const f128* a) { return isnanq(*a); }
uint32_t _isinfq(const f128* a) { return isinfq(*a); }
void _frexpq(f128* r, const f128* a, int32_t* b) { *r = frexpq(*a, b); }
void _ldexpq(f128* r, const f128* a, int32_t b) { *r = ldexpq(*a, b); }
void _atan2q(f128* r, const f128* a, const f128* b) { *r = atan2q(*a, *b); }

/* A (single/double/quad) precision floating point number is denormalized iff:
* - exponent is zero
* - mantissa is non-zero
* - (don't care about the sign bit)
*/
uint32_t _isdenormq(const f128* a)
{
    union ieee754_quad u;
    u.as_float128 = *a;

    return (u.as_uint128.exponent == 0
        && (u.as_uint128.mantissa0 != 0 || u.as_uint128.mantissa1 != 0));
}

/* A (single/double/quad) precision floating point number is negative zero iff:
* - sign bit is set
* - all other bits are zero
*/
uint32_t _isnegzeroq(const f128* a)
{
    union ieee754_quad u;
    u.as_float128 = *a;

    return (
        u.as_uint128.negative &&
        u.as_uint128.exponent == 0 &&
        u.as_uint128.mantissa0 == 0 &&
        u.as_uint128.mantissa1 == 0
    );
}

/* Operations from Ord
*/
uint32_t _ltq(const f128* a, const f128* b) { return *a < *b; }
uint32_t _leq(const f128* a, const f128* b) { return *a <= *b; }
uint32_t _gtq(const f128* a, const f128* b) { return *a > *b; }
uint32_t _geq(const f128* a, const f128* b) { return *a >= *b; }
void _fminq(f128* r, const f128* a, const f128* b) { *r = fminq(*a, *b); }
void _fmaxq(f128* r, const f128* a, const f128* b) { *r = fmaxq(*a, *b); }
