intrinsics module with alternative implementations #915

jalvesz · 2025-01-03T20:54:23Z

Add an intrinsics module containing replacements for intrinsic functions where an alternative offers some benefit: a faster implementation, better accuracy, or both.

This PR follows the discussion on Discourse https://fortran-lang.discourse.group/t/lfortran-now-supports-all-intrinsic-functions/8844/41 and is based on https://github.com/jalvesz/fast_math

sum: 2 options (stdlib_sum and stdlib_sum_kahan)
dot_product: 2 options (stdlib_dot_product and stdlib_dot_product_kahan)

cc: @fortran-lang/stdlib @perazz @certik @jvdp1


jalvesz · 2025-01-07T19:35:12Z

One philosophical question: should the fsum interface be renamed to sum to enable direct replacement of the intrinsic? Should it keep this name? Or should it become something like stdlib_sum? (Same for fprod -> dot_product.)

Regarding the kahan versions, given that the accuracy gains are close between the pure chunked version and the kahan one, I'm wondering which level of support should be enabled to switch between them?


perazz · 2025-01-30T17:04:39Z

IMHO shorter names are better, and I don't see a problem if they overlap with the intrinsics. First, because one can always pick the right version:

use stdlib_intrinsics, only: dot_product

vs.

! Force using intrinsic
intrinsic :: dot_product

Second, because they can be augmented with more or different arguments:

c = dot_product(a,b) ! intrinsic
c = dot_product(a,b,mode='kahan') ! stdlib
c = dot_product(a,b,mode='blocked') ! stdlib
...

I find this more elegant and definitely not confusing.
This PR also reminds me that it would be worthwhile to also augment the matmul intrinsic via calls to the gemm backend
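
For illustration, the optional `mode` argument proposed above could be sketched as follows. This is a minimal, self-contained sketch and not code from this PR: the module name `dot_mode_sketch`, the function name `dot_with_mode`, and the fallback behaviour for unknown modes are all assumptions. A distinct name is used on purpose so that the sketch does not depend on how a compiler resolves a generic name that overlaps with an intrinsic.

```fortran
module dot_mode_sketch
    implicit none
    private
    public :: dot_with_mode
    integer, parameter :: dp = kind(1.0d0)
contains
    !> Hypothetical wrapper: selects the algorithm from an optional `mode` string.
    pure function dot_with_mode(a, b, mode) result(c)
        real(dp), intent(in) :: a(:), b(:)
        character(len=*), intent(in), optional :: mode
        real(dp) :: c
        real(dp) :: comp, y, t
        integer :: i

        ! Test present() first: an absent optional argument must not be referenced.
        if (present(mode)) then
            if (mode == 'kahan') then
                c = 0.0_dp
                comp = 0.0_dp            ! running compensation term
                do i = 1, min(size(a), size(b))
                    y = a(i)*b(i) - comp ! apply the correction from the previous step
                    t = c + y
                    comp = (t - c) - y   ! rounding error of the addition c + y
                    c = t
                end do
                return
            end if
        end if
        ! No mode, or a mode this sketch does not know: use the intrinsic (assumed fallback).
        c = dot_product(a, b)
    end function dot_with_mode
end module dot_mode_sketch

program dot_mode_demo
    use dot_mode_sketch, only: dot_with_mode
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp) :: a(4), b(4)

    a = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
    b = 1.0_dp
    ! Small integer-valued data is summed exactly by both code paths.
    if (dot_with_mode(a, b) /= 10.0_dp) error stop "default path mismatch"
    if (dot_with_mode(a, b, mode='kahan') /= 10.0_dp) error stop "kahan path mismatch"
    print *, dot_with_mode(a, b), dot_with_mode(a, b, mode='kahan')
end program dot_mode_demo
```

Note that aggressive floating-point optimization flags (for example `-ffast-math` with GFortran) may let the compiler simplify the compensation term away, so such a kernel is usually compiled without them.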

jvdp1

Thank you @jalvesz. LGTM. It seems to be close to being ready for merging.

jvdp1 · 2025-02-02T14:25:26Z

doc/specs/stdlib_intrinsics.md

+
+#### Description
+
+The `stdlib_sum` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential as well as reducing the round-off error. This procedure is recommended when summing large arrays, for repetitive summation of smaller arrays consider the classical `sum`.


Why is it not available for integers?

No specific reason. When implementing the first version, my first need was for reals, so that is what I proposed here. I can test whether it also brings benefits for integers and extend the template.
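
The chunked approach described in the spec excerpt above can be sketched roughly as below. This is an illustrative sketch only and not the implementation in this PR: the module name `chunked_sum_sketch`, the function name `chunked_sum`, and the chunk length of 64 are assumptions. The idea is to keep a small array of independent partial sums, which gives the compiler a loop that is easy to vectorize and shortens each accumulation chain, which tends to reduce round-off growth.

```fortran
module chunked_sum_sketch
    implicit none
    private
    public :: chunked_sum
    integer, parameter :: sp = kind(1.0)
    integer, parameter :: chunk = 64     ! assumed chunk length, chosen for illustration
contains
    !> Sum a 1-D real array using `chunk` independent partial sums.
    pure function chunked_sum(a) result(s)
        real(sp), intent(in) :: a(:)
        real(sp) :: s
        real(sp) :: acc(chunk)
        integer :: i, n, r

        n = size(a)
        r = mod(n, chunk)                ! leading elements that do not fill a whole chunk
        acc = 0.0_sp
        acc(1:r) = a(1:r)
        ! The remaining n - r elements are a whole number of chunks.
        do i = r + 1, n, chunk
            acc = acc + a(i:i+chunk-1)   ! element-wise update of all partial sums
        end do
        s = sum(acc)                     ! final reduction over the partial sums
    end function chunked_sum
end module chunked_sum_sketch

program chunked_sum_demo
    use chunked_sum_sketch, only: chunked_sum
    implicit none
    integer :: i
    real, allocatable :: x(:)

    ! Integer-valued data below 2**24 is summed exactly in single precision.
    x = [(real(i), i = 1, 1000)]
    if (chunked_sum(x) /= 500500.0) error stop "chunked_sum mismatch"
    print *, chunked_sum(x)
end program chunked_sum_demo
```

For short arrays the setup of the partial-sum array can outweigh any gain, which is consistent with the spec's advice to prefer the classical `sum` for repeated summation of small arrays.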

jvdp1 · 2025-02-02T20:28:10Z

doc/specs/stdlib_intrinsics.md

+
+#### Description
+
+The `stdlib_dot_product_kahan` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential , complemented by the same `elemental` kernel based on the [kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) used for `stdlib_sum` to reduce the round-off error.


Is the license of Wikipedia compatible with the MIT license of stdlib?

I did not take content directly from Wikipedia; I just cited the wiki page that summarizes the Kahan summation algorithm. Would such a citation be problematic?

Ok. Most content of Wikipedia is under CC BY-SA 4.0, which states: "Share Alike—If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original."
To avoid any potential confusion or issues, could you reference the original paper instead: https://doi.org/10.1145%2F363707.363723 ?
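
As a rough illustration of the description under review, a chunked dot product built on an `elemental` Kahan kernel might look like the sketch below. It is not the code from this PR: the names `kahan_dot_sketch`, `kahan_step` and `kahan_dot`, the chunk length of 64, and the final reduction strategy are assumptions made for the example.

```fortran
module kahan_dot_sketch
    implicit none
    private
    public :: kahan_dot
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: chunk = 64     ! assumed chunk length, chosen for illustration
contains
    !> One compensated update: add `x` to the running sum `s`,
    !> keeping the rounding error of that addition in `c`.
    elemental subroutine kahan_step(x, s, c)
        real(dp), intent(in) :: x
        real(dp), intent(inout) :: s, c
        real(dp) :: y, t
        y = x - c
        t = s + y
        c = (t - s) - y
        s = t
    end subroutine kahan_step

    !> Dot product with `chunk` compensated partial sums.
    pure function kahan_dot(a, b) result(p)
        real(dp), intent(in) :: a(:), b(:)
        real(dp) :: p
        real(dp) :: s(chunk), c(chunk), cp
        integer :: i, n, r

        n = min(size(a), size(b))
        r = mod(n, chunk)
        s = 0.0_dp
        c = 0.0_dp
        ! Leading remainder, then whole chunks; the kernel is applied element-wise.
        call kahan_step(a(1:r)*b(1:r), s(1:r), c(1:r))
        do i = r + 1, n, chunk
            call kahan_step(a(i:i+chunk-1)*b(i:i+chunk-1), s, c)
        end do
        ! Reduce the partial sums with the same compensated kernel.
        p = 0.0_dp
        cp = 0.0_dp
        do i = 1, chunk
            call kahan_step(s(i), p, cp)
        end do
    end function kahan_dot
end module kahan_dot_sketch

program kahan_dot_demo
    use kahan_dot_sketch, only: kahan_dot
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer :: i
    real(dp), allocatable :: x(:), w(:)

    ! Integer-valued data is handled exactly, which makes a simple sanity check.
    x = [(real(i, dp), i = 1, 1000)]
    allocate(w(1000), source=1.0_dp)
    if (kahan_dot(x, w) /= 500500.0_dp) error stop "kahan_dot mismatch"
    print *, kahan_dot(x, w)
end program kahan_dot_demo
```

Kahan summation reduces the accumulated round-off error but does not eliminate it; inputs with large cancellations can still lose low-order terms.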

doc/specs/stdlib_intrinsics.md

src/stdlib_intrinsics.fypp

jvdp1 · 2025-02-03T20:46:36Z

src/stdlib_intrinsics.fypp

+        !! This interface provides standard conforming call for sum of elements of any rank.
+        !! The 1-D base implementation follows a chunked approach for optimizing performance and increasing accuracy.
+        !! The `N-D` interfaces calls upon the `(N-1)-D` implementation. 
+        !! Supported data types include `real` and `complex`.


Why are integers not supported?

test/intrinsics/test_intrinsics.fypp

jvdp1 · 2025-02-03T21:00:19Z

> IMHO shorter names are better, and I don't see a problem if they overlap with the intrinsics. First, because one can always pick the right version:
> use stdlib_intrinsics, only: dot_product
> vs.
> ! Force using intrinsic
> intrinsic :: dot_product

I prefer to keep stdlib_sum and stdlib_dot_product as they currently are in this PR. This allows the user to use both implementations, and to opt into the stdlib implementation if desired, as follows:

use stdlib_intrinsics, only: dot_product => stdlib_dot_product

With this approach, the user will not inadvertently use the stdlib implementation.

> And then because they can be augmented by more/different arguments
> c = dot_product(a,b) ! intrinsic
> c = dot_product(a,b,mode='kahan') ! stdlib
> c = dot_product(a,b,mode='blocked') ! stdlib
> ...

This approach would break backward compatibility with the intrinsics. I prefer the previous approach (either an overlap, or a name with a prefix stdlib_).
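
The rename-on-use pattern suggested in this comment can be tried in isolation with a stand-in module. In the sketch below, `standin_intrinsics` is a hypothetical placeholder for `stdlib_intrinsics`, and its `stdlib_dot_product` is a plain loop written only for the example, not the stdlib implementation.

```fortran
module standin_intrinsics            ! hypothetical stand-in for stdlib_intrinsics
    implicit none
    private
    public :: stdlib_dot_product
contains
    !> Placeholder implementation: a plain loop, used only to show name resolution.
    pure function stdlib_dot_product(a, b) result(p)
        real, intent(in) :: a(:), b(:)
        real :: p
        integer :: i
        p = 0.0
        do i = 1, min(size(a), size(b))
            p = p + a(i)*b(i)
        end do
    end function stdlib_dot_product
end module standin_intrinsics

program rename_demo
    ! Explicit opt-in: within this scope, dot_product names the module procedure.
    use standin_intrinsics, only: dot_product => stdlib_dot_product
    implicit none
    real :: a(3), b(3)

    a = [1.0, 2.0, 3.0]
    b = [4.0, 5.0, 6.0]
    ! 1*4 + 2*5 + 3*6 = 32, which is exact in single precision.
    if (dot_product(a, b) /= 32.0) error stop "unexpected result"
    print *, dot_product(a, b)
end program rename_demo
```

Scopes that do not contain the renaming `use` statement keep resolving `dot_product` to the intrinsic, so the replacement only happens where the user asks for it.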

perazz · 2025-02-04T07:11:24Z

> I prefer to keep stdlib_sum and stdlib_dot_product

LGTM @jvdp1 @jalvesz!

Co-authored-by: Jeremie Vandenplas <[email protected]>

jalvesz and others added 17 commits December 24, 2024 13:12

intrinsics module with fast sums

08ec0aa

Merge branch 'fortran-lang:master' into intrinsics

c36251e

Merge branch 'fortran-lang:master' into intrinsics

2207f41

add fast dot_product and start tests

2bc7af9

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

4625205

…ntrinsics

add complex sum test

243ea6f

test masked sum

c38dcd6

add dot_product tests

bf1ce2f

start specs

cc9df61

Merge branch 'fortran-lang:master' into intrinsics

671fd61

split into submodules

75945f1

specs and examples

d05903f

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

c0d96e5

…ntrinsics

Merge branch 'fortran-lang:master' into intrinsics

4abd8d3

fix specs

7c6e8a4

fix test: complex initialization

7cea1fd

fix test: complex assignment caused accuracy loss

eaffa4a

jalvesz changed the title ~~feate: intrinsics module with alternative implementations~~ intrinsics module with alternative implementations Jan 4, 2025

jalvesz and others added 6 commits January 5, 2025 16:56

Merge branch 'fortran-lang:master' into intrinsics

ad64162

extend fsum support for ndarrays

a3d24e4

remove unnecessary definition

5a1fdcb

update specs, change name of kahan kernel

47396ac

small reorganization

ecb7050

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

87ef502

…ntrinsics

jalvesz added 2 commits January 11, 2025 23:53

change names to stdlib_*

14be974

add comments

aaa68bc

jalvesz marked this pull request as ready for review January 12, 2025 10:32

jalvesz and others added 2 commits January 17, 2025 17:26

Merge branch 'fortran-lang:master' into intrinsics

cc232e1

extend kahan sum for rank N arrays

6e36b6f

jalvesz and others added 3 commits January 17, 2025 19:56

Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…

65175d7

…ntrinsics

Merge branch 'fortran-lang:master' into intrinsics

8a35f38

Merge branch 'fortran-lang:master' into intrinsics

16a0e96

jvdp1 reviewed Feb 3, 2025


jalvesz and others added 6 commits February 7, 2025 19:35

Update src/stdlib_intrinsics.fypp

f0ed271

Co-authored-by: Jeremie Vandenplas <[email protected]>

Update test/intrinsics/test_intrinsics.fypp

316269b

Co-authored-by: Jeremie Vandenplas <[email protected]>

Update test/intrinsics/test_intrinsics.fypp

52aab02

Co-authored-by: Jeremie Vandenplas <[email protected]>

fix test allocation

a6be0a0

nmask allocation

3e171f7

revert nmask allocation

332b748
