diff --git a/doc/specs/stdlib_intrinsics.md b/doc/specs/stdlib_intrinsics.md new file mode 100644 index 000000000..bba436d7b --- /dev/null +++ b/doc/specs/stdlib_intrinsics.md @@ -0,0 +1,158 @@ +--- +title: intrinsics +--- + +# The `stdlib_intrinsics` module + +[TOC] + +## Introduction + +The `stdlib_intrinsics` module provides replacements for some of the well known intrinsic functions found in Fortran compilers for which either a faster and/or more accurate implementation is found which has also proven of interest to the Fortran community. + + +### `stdlib_sum` function + +#### Description + +The `stdlib_sum` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential as well as reducing the round-off error. This procedure is recommended when summing large arrays, for repetitive summation of smaller arrays consider the classical `sum`. + +#### Syntax + +`res = ` [[stdlib_intrinsics(module):stdlib_sum(interface)]] ` (x [,mask] )` + +`res = ` [[stdlib_intrinsics(module):stdlib_sum(interface)]] ` (x, dim [,mask] )` + +#### Status + +Experimental + +#### Class + +Pure function. + +#### Argument(s) + +`x`: N-D array of either `real` or `complex` type. This argument is `intent(in)`. + +`dim` (optional): scalar of type `integer` with a value in the range from 1 to n, where n equals the rank of `x`. + +`mask` (optional): N-D array of `logical` values, with the same shape as `x`. This argument is `intent(in)`. + +#### Output value or Result value + +If `dim` is absent, the output is a scalar of the same `type` and `kind` as to that of `x`. Otherwise, an array of rank n-1, where n equals the rank of `x`, and a shape similar to that of `x` with dimension `dim` dropped is returned. + + +### `stdlib_sum_kahan` function + +#### Description + +The `stdlib_sum_kahan` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential complemented by an `elemental` kernel based on the [kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) strategy to reduce the round-off error: + +```fortran +elemental subroutine kahan_kernel_(a,s,c) + type(), intent(in) :: a + type(), intent(inout) :: s + type(), intent(inout) :: c + type() :: t, y + y = a - c + t = s + y + c = (t - s) - y + s = t +end subroutine +``` + +#### Syntax + +`res = ` [[stdlib_intrinsics(module):stdlib_sum_kahan(interface)]] ` (x [,mask] )` + +`res = ` [[stdlib_intrinsics(module):stdlib_sum_kahan(interface)]] ` (x, dim [,mask] )` + +#### Status + +Experimental + +#### Class + +Pure function. + +#### Argument(s) + +`x`: 1D array of either `real` or `complex` type. This argument is `intent(in)`. + +`dim` (optional): scalar of type `integer` with a value in the range from 1 to n, where n equals the rank of `x`. + +`mask` (optional): N-D array of `logical` values, with the same shape as `x`. This argument is `intent(in)`. + +#### Output value or Result value + +If `dim` is absent, the output is a scalar of the same `type` and `kind` as to that of `x`. Otherwise, an array of rank n-1, where n equals the rank of `x`, and a shape similar to that of `x` with dimension `dim` dropped is returned. + +#### Example + +```fortran +{!example/intrinsics/example_sum.f90!} +``` + + +### `stdlib_dot_product` function + +#### Description + +The `stdlib_dot_product` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential as well as reducing the round-off error. This procedure is recommended when crunching large arrays, for repetitive products of smaller arrays consider the classical `dot_product`. + +#### Syntax + +`res = ` [[stdlib_intrinsics(module):stdlib_dot_product(interface)]] ` (x, y)` + +#### Status + +Experimental + +#### Class + +Pure function. + +#### Argument(s) + +`x`: 1D array of either `real` or `complex` type. This argument is `intent(in)`. + +`y`: 1D array of the same type and kind as `x`. This argument is `intent(in)`. + +#### Output value or Result value + +The output is a scalar of `type` and `kind` same as to that of `x` and `y`. + + +### `stdlib_dot_product_kahan` function + +#### Description + +The `stdlib_dot_product_kahan` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation which maximizes vectorization potential , complemented by the same `elemental` kernel based on the [kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) used for `stdlib_sum` to reduce the round-off error. + +#### Syntax + +`res = ` [[stdlib_intrinsics(module):stdlib_dot_product_kahan(interface)]] ` (x, y)` + +#### Status + +Experimental + +#### Class + +Pure function. + +#### Argument(s) + +`x`: 1D array of either `real` or `complex` type. This argument is `intent(in)`. + +`y`: 1D array of the same type and kind as `x`. This argument is `intent(in)`. + +#### Output value or Result value + +The output is a scalar of `type` and `kind` same as to that of `x` and `y`. + +```fortran +{!example/intrinsics/example_dot_product.f90!} +``` \ No newline at end of file diff --git a/example/CMakeLists.txt b/example/CMakeLists.txt index cbef7f075..968a048b0 100644 --- a/example/CMakeLists.txt +++ b/example/CMakeLists.txt @@ -13,6 +13,7 @@ add_subdirectory(constants) add_subdirectory(error) add_subdirectory(hashmaps) add_subdirectory(hash_procedures) +add_subdirectory(intrinsics) add_subdirectory(io) add_subdirectory(linalg) add_subdirectory(logger) diff --git a/example/intrinsics/CMakeLists.txt b/example/intrinsics/CMakeLists.txt new file mode 100644 index 000000000..1645ba8a1 --- /dev/null +++ b/example/intrinsics/CMakeLists.txt @@ -0,0 +1,2 @@ +ADD_EXAMPLE(sum) +ADD_EXAMPLE(dot_product) \ No newline at end of file diff --git a/example/intrinsics/example_dot_product.f90 b/example/intrinsics/example_dot_product.f90 new file mode 100644 index 000000000..fb6e40610 --- /dev/null +++ b/example/intrinsics/example_dot_product.f90 @@ -0,0 +1,18 @@ +program example_dot_product + use stdlib_kinds, only: sp + use stdlib_intrinsics, only: stdlib_dot_product, stdlib_dot_product_kahan + implicit none + + real(sp), allocatable :: x(:), y(:) + real(sp) :: total_prod(3) + + allocate( x(1000), y(1000) ) + call random_number(x) + call random_number(y) + + total_prod(1) = dot_product(x,y) !> compiler intrinsic + total_prod(2) = stdlib_dot_product(x,y) !> chunked summation over inner product + total_prod(3) = stdlib_dot_product_kahan(x,y) !> chunked kahan summation over inner product + print *, total_prod(1:3) + +end program example_dot_product \ No newline at end of file diff --git a/example/intrinsics/example_sum.f90 b/example/intrinsics/example_sum.f90 new file mode 100644 index 000000000..0aec4835c --- /dev/null +++ b/example/intrinsics/example_sum.f90 @@ -0,0 +1,17 @@ +program example_sum + use stdlib_kinds, only: sp + use stdlib_intrinsics, only: stdlib_sum, stdlib_sum_kahan + implicit none + + real(sp), allocatable :: x(:) + real(sp) :: total_sum(3) + + allocate( x(1000) ) + call random_number(x) + + total_sum(1) = sum(x) !> compiler intrinsic + total_sum(2) = stdlib_sum(x) !> chunked summation + total_sum(3) = stdlib_sum_kahan(x)!> chunked kahan summation + print *, total_sum(1:3) + +end program example_sum \ No newline at end of file diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 75989dbc9..2c03323d0 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -16,6 +16,9 @@ set(fppFiles stdlib_hash_64bit_fnv.fypp stdlib_hash_64bit_pengy.fypp stdlib_hash_64bit_spookyv2.fypp + stdlib_intrinsics_dot_product.fypp + stdlib_intrinsics_sum.fypp + stdlib_intrinsics.fypp stdlib_io.fypp stdlib_io_npy.fypp stdlib_io_npy_load.fypp diff --git a/src/stdlib_constants.fypp b/src/stdlib_constants.fypp index e169dbb3b..61c7d6c90 100644 --- a/src/stdlib_constants.fypp +++ b/src/stdlib_constants.fypp @@ -1,5 +1,8 @@ #:include "common.fypp" #:set KINDS = REAL_KINDS +#:set R_KINDS_TYPES = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX)) +#:set C_KINDS_TYPES = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX)) + module stdlib_constants !! Constants !! ([Specification](../page/specs/stdlib_constants.html)) @@ -60,5 +63,13 @@ module stdlib_constants real(dp), parameter, public :: u = ATOMIC_MASS_CONSTANT%value !! Atomic mass constant ! Additional constants if needed + #:for k, t, s in R_KINDS_TYPES + ${t}$, parameter, public :: zero_${s}$ = 0._${k}$ + ${t}$, parameter, public :: one_${s}$ = 1._${k}$ + #:endfor + #:for k, t, s in C_KINDS_TYPES + ${t}$, parameter, public :: zero_${s}$ = (0._${k}$,0._${k}$) + ${t}$, parameter, public :: one_${s}$ = (1._${k}$,0._${k}$) + #:endfor end module stdlib_constants diff --git a/src/stdlib_intrinsics.fypp b/src/stdlib_intrinsics.fypp new file mode 100644 index 000000000..1910dd0ce --- /dev/null +++ b/src/stdlib_intrinsics.fypp @@ -0,0 +1,176 @@ +#:include "common.fypp" +#:set R_KINDS_TYPES = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX)) +#:set C_KINDS_TYPES = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX)) +#:set RC_KINDS_TYPES = R_KINDS_TYPES + C_KINDS_TYPES +#:set RANKS = range(2, MAXRANK + 1) + +module stdlib_intrinsics + !!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations. + !! ([Specification](../page/specs/stdlib_intrinsics.html)) + use stdlib_kinds + implicit none + private + + interface stdlib_sum + !! version: experimental + !! + !!### Summary + !! Sum elements of rank N arrays. + !! ([Specification](../page/specs/stdlib_intrinsics.html#stdlib_sum)) + !! + !!### Description + !! + !! This interface provides standard conforming call for sum of elements of any rank. + !! The 1-D base implementation follows a chunked approach for optimizing performance and increasing accuracy. + !! The `N-D` interfaces calls upon the `(N-1)-D` implementation. + !! Supported data types include `real` and `complex`. + !! + #:for rk, rt, rs in RC_KINDS_TYPES + pure module function stdlib_sum_1d_${rs}$(a) result(s) + ${rt}$, intent(in) :: a(:) + ${rt}$ :: s + end function + pure module function stdlib_sum_1d_${rs}$_mask(a,mask) result(s) + ${rt}$, intent(in) :: a(:) + logical, intent(in) :: mask(:) + ${rt}$ :: s + end function + #:for rank in RANKS + pure module function stdlib_sum_${rank}$d_${rs}$( x, mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s + end function + pure module function stdlib_sum_${rank}$d_dim_${rs}$( x , dim, mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + integer, intent(in):: dim + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$ + end function + #:endfor + #:endfor + end interface + public :: stdlib_sum + + interface stdlib_sum_kahan + !! version: experimental + !! + !!### Summary + !! Sum elements of rank N arrays. + !! ([Specification](../page/specs/stdlib_intrinsics.html#stdlib_sum_kahan)) + !! + !!### Description + !! + !! This interface provides standard conforming call for sum of elements of any rank. + !! The 1-D base implementation follows a chunked approach combined with a kahan kernel for optimizing performance and increasing accuracy. + !! The `N-D` interfaces calls upon the `(N-1)-D` implementation. + !! Supported data types include `real` and `complex`. + !! + #:for rk, rt, rs in RC_KINDS_TYPES + pure module function stdlib_sum_kahan_1d_${rs}$(a) result(s) + ${rt}$, intent(in) :: a(:) + ${rt}$ :: s + end function + pure module function stdlib_sum_kahan_1d_${rs}$_mask(a,mask) result(s) + ${rt}$, intent(in) :: a(:) + logical, intent(in) :: mask(:) + ${rt}$ :: s + end function + #:for rank in RANKS + pure module function stdlib_sum_kahan_${rank}$d_${rs}$( x, mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s + end function + pure module function stdlib_sum_kahan_${rank}$d_dim_${rs}$( x , dim, mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + integer, intent(in):: dim + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$ + end function + #:endfor + #:endfor + end interface + public :: stdlib_sum_kahan + + interface stdlib_dot_product + !! version: experimental + !! + !!### Summary + !! dot_product of rank 1 arrays. + !! ([Specification](../page/specs/stdlib_intrinsics.html#stdlib_dot_product)) + !! + !!### Description + !! + !! compute the dot_product of rank 1 arrays. + !! The 1-D base implementation follows a chunked approach for optimizing performance and increasing accuracy. + !! Supported data types include `real` and `complex`. + !! + #:for rk, rt, rs in RC_KINDS_TYPES + pure module function stdlib_dot_product_${rs}$(a,b) result(p) + ${rt}$, intent(in) :: a(:) + ${rt}$, intent(in) :: b(:) + ${rt}$ :: p + end function + #:endfor + end interface + public :: stdlib_dot_product + + interface stdlib_dot_product_kahan + !! version: experimental + !! + !!### Summary + !! dot_product of rank 1 arrays. + !! ([Specification](../page/specs/stdlib_intrinsics.html#stdlib_dot_product_kahan)) + !! + !!### Description + !! + !! compute the dot_product of rank 1 arrays. + !! The implementation follows a chunked approach combined with a kahan kernel for optimizing performance and increasing accuracy. + !! Supported data types include `real` and `complex`. + !! + #:for rk, rt, rs in RC_KINDS_TYPES + pure module function stdlib_dot_product_kahan_${rs}$(a,b) result(p) + ${rt}$, intent(in) :: a(:) + ${rt}$, intent(in) :: b(:) + ${rt}$ :: p + end function + #:endfor + end interface + public :: stdlib_dot_product_kahan + + interface kahan_kernel + #:for rk, rt, rs in RC_KINDS_TYPES + module procedure :: kahan_kernel_${rs}$ + module procedure :: kahan_kernel_m_${rs}$ + #:endfor + end interface + public :: kahan_kernel + +contains + +#:for rk, rt, rs in RC_KINDS_TYPES +elemental subroutine kahan_kernel_${rs}$(a,s,c) + ${rt}$, intent(in) :: a + ${rt}$, intent(inout) :: s + ${rt}$, intent(inout) :: c + ${rt}$ :: t, y + y = a - c + t = s + y + c = (t - s) - y + s = t +end subroutine +elemental subroutine kahan_kernel_m_${rs}$(a,s,c,m) + ${rt}$, intent(in) :: a + ${rt}$, intent(inout) :: s + ${rt}$, intent(inout) :: c + logical, intent(in) :: m + ${rt}$ :: t, y + y = a - c + t = s + y + c = (t - s) - y + s = merge( s , t , m ) +end subroutine +#:endfor + +end module stdlib_intrinsics diff --git a/src/stdlib_intrinsics_dot_product.fypp b/src/stdlib_intrinsics_dot_product.fypp new file mode 100644 index 000000000..5120c290a --- /dev/null +++ b/src/stdlib_intrinsics_dot_product.fypp @@ -0,0 +1,74 @@ +#:include "common.fypp" +#:set R_KINDS_TYPES = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX)) +#:set C_KINDS_TYPES = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX)) +#:set RC_KINDS_TYPES = R_KINDS_TYPES + C_KINDS_TYPES + +#:def cnjg(type,expression) +#:if 'complex' in type +conjg(${expression}$) +#:else +${expression}$ +#:endif +#:enddef + +submodule(stdlib_intrinsics) stdlib_intrinsics_dot_product + !!Replacement for certain Fortran intrinsic functions offering either faster and/or more accurate implementations. + !! ([Specification](../page/specs/stdlib_intrinsics.html)) + use stdlib_kinds + use stdlib_constants + implicit none + + integer, parameter :: chunk = 64 + +contains +! This implementation is based on https://github.com/jalvesz/fast_math +#:for k1, t1, s1 in RC_KINDS_TYPES +pure module function stdlib_dot_product_${s1}$(a,b) result(p) + ${t1}$, intent(in) :: a(:) + ${t1}$, intent(in) :: b(:) + ${t1}$ :: p + ${t1}$ :: abatch(chunk) + integer :: i, dr, rr + ! ----------------------------- + dr = size(a)/chunk + rr = size(a) - dr*chunk + + abatch = zero_${s1}$ + do i = 1, dr + abatch(1:chunk) = abatch(1:chunk) + a(chunk*i-chunk+1:chunk*i)*${cnjg(t1,'b(chunk*i-chunk+1:chunk*i)')}$ + end do + abatch(1:rr) = abatch(1:rr) + a(size(a)-rr+1:size(a))*${cnjg(t1,'b(size(a)-rr+1:size(a))')}$ + + p = zero_${s1}$ + do i = 1, chunk/2 + p = p + abatch(i)+abatch(chunk/2+i) + end do +end function +#:endfor + +#:for k1, t1, s1 in RC_KINDS_TYPES +pure module function stdlib_dot_product_kahan_${s1}$(a,b) result(p) + ${t1}$, intent(in) :: a(:) + ${t1}$, intent(in) :: b(:) + ${t1}$ :: p + ${t1}$ :: pbatch(chunk) + ${t1}$ :: cbatch(chunk) + integer :: i, dr, rr + ! ----------------------------- + dr = size(a)/(chunk) + rr = size(a) - dr*chunk + pbatch = zero_${s1}$ + cbatch = zero_${s1}$ + do i = 1, dr + call kahan_kernel( a(chunk*i-chunk+1:chunk*i)*${cnjg(t1,'b(chunk*i-chunk+1:chunk*i)')}$ , pbatch(1:chunk) , cbatch(1:chunk) ) + end do + call kahan_kernel( a(size(a)-rr+1:size(a))*${cnjg(t1,'b(size(a)-rr+1:size(a))')}$ , pbatch(1:rr) , cbatch(1:rr) ) + + p = zero_${s1}$ + do i = 1,chunk + call kahan_kernel( pbatch(i) , p , cbatch(i) ) + end do +end function +#:endfor + +end submodule stdlib_intrinsics_dot_product diff --git a/src/stdlib_intrinsics_sum.fypp b/src/stdlib_intrinsics_sum.fypp new file mode 100644 index 000000000..84b449608 --- /dev/null +++ b/src/stdlib_intrinsics_sum.fypp @@ -0,0 +1,260 @@ +#:include "common.fypp" +#:set R_KINDS_TYPES = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX)) +#:set C_KINDS_TYPES = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX)) +#:set RC_KINDS_TYPES = R_KINDS_TYPES + C_KINDS_TYPES +#:set RANKS = range(2, MAXRANK + 1) + +submodule(stdlib_intrinsics) stdlib_intrinsics_sum + !! ([Specification](../page/specs/stdlib_intrinsics.html)) + use stdlib_kinds + use stdlib_constants + implicit none + +contains + +!================= 1D Base implementations ============ +! This implementation is based on https://github.com/jalvesz/fast_math +#:for rk, rt, rs in RC_KINDS_TYPES +pure module function stdlib_sum_1d_${rs}$(a) result(s) + integer, parameter :: chunk = 64 + ${rt}$, intent(in) :: a(:) + ${rt}$ :: s + ${rt}$ :: abatch(chunk) + integer :: i, dr, rr + ! ----------------------------- + dr = size(a)/chunk + rr = size(a) - dr*chunk + + abatch = zero_${rs}$ + do i = 1, dr + abatch(1:chunk) = abatch(1:chunk) + a(chunk*i-chunk+1:chunk*i) + end do + abatch(1:rr) = abatch(1:rr) + a(size(a)-rr+1:size(a)) + + s = zero_${rs}$ + do i = 1, chunk/2 + s = s + abatch(i)+abatch(chunk/2+i) + end do +end function + +pure module function stdlib_sum_1d_${rs}$_mask(a,mask) result(s) + integer, parameter :: chunk = 64 + ${rt}$, intent(in) :: a(:) + logical, intent(in) :: mask(:) + ${rt}$ :: s + ${rt}$ :: abatch(chunk) + integer :: i, dr, rr + ! ----------------------------- + dr = size(a)/chunk + rr = size(a) - dr*chunk + + abatch = zero_${rs}$ + do i = 1, dr + abatch(1:chunk) = abatch(1:chunk) + merge( zero_${rs}$ , a(chunk*i-chunk+1:chunk*i) , mask(chunk*i-chunk+1:chunk*i) ) + end do + abatch(1:rr) = abatch(1:rr) + merge( zero_${rs}$ , a(size(a)-rr+1:size(a)) , mask(size(a)-rr+1:size(a)) ) + + s = zero_${rs}$ + do i = 1, chunk/2 + s = s + abatch(i)+abatch(chunk/2+i) + end do +end function + +pure module function stdlib_sum_kahan_1d_${rs}$(a) result(s) + integer, parameter :: chunk = 64 + ${rt}$, intent(in) :: a(:) + ${rt}$ :: s + ${rt}$ :: sbatch(chunk) + ${rt}$ :: cbatch(chunk) + integer :: i, dr, rr + ! ----------------------------- + dr = size(a)/chunk + rr = size(a) - dr*chunk + sbatch = zero_${rs}$ + cbatch = zero_${rs}$ + do i = 1, dr + call kahan_kernel( a(chunk*i-chunk+1:chunk*i) , sbatch(1:chunk) , cbatch(1:chunk) ) + end do + call kahan_kernel( a(size(a)-rr+1:size(a)) , sbatch(1:rr) , cbatch(1:rr) ) + + s = zero_${rs}$ + do i = 1,chunk + call kahan_kernel( sbatch(i) , s , cbatch(i) ) + end do +end function + +pure module function stdlib_sum_kahan_1d_${rs}$_mask(a,mask) result(s) + integer, parameter :: chunk = 64 + ${rt}$, intent(in) :: a(:) + logical, intent(in) :: mask(:) + ${rt}$ :: s + ${rt}$ :: sbatch(chunk) + ${rt}$ :: cbatch(chunk) + integer :: i, dr, rr + ! ----------------------------- + dr = size(a)/chunk + rr = size(a) - dr*chunk + sbatch = zero_${rs}$ + cbatch = zero_${rs}$ + do i = 1, dr + call kahan_kernel( a(chunk*i-chunk+1:chunk*i) , sbatch(1:chunk) , cbatch(1:chunk) , mask(chunk*i-chunk+1:chunk*i) ) + end do + call kahan_kernel( a(size(a)-rr+1:size(a)) , sbatch(1:rr) , cbatch(1:rr) , mask(size(a)-rr+1:size(a)) ) + + s = zero_${rs}$ + do i = 1,chunk + call kahan_kernel( sbatch(i) , s , cbatch(i) ) + end do +end function +#:endfor + +!================= N-D implementations ============ +#:for rk, rt, rs in RC_KINDS_TYPES +#:for rank in RANKS +pure module function stdlib_sum_${rank}$d_${rs}$( x , mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s + if(.not.present(mask)) then + s = sum_recast(x,size(x)) + else + s = sum_recast_mask(x,mask,size(x)) + end if +contains + pure ${rt}$ function sum_recast(b,n) + integer, intent(in) :: n + ${rt}$, intent(in) :: b(n) + sum_recast = stdlib_sum(b) + end function + pure ${rt}$ function sum_recast_mask(b,m,n) + integer, intent(in) :: n + ${rt}$, intent(in) :: b(n) + logical, intent(in) :: m(n) + sum_recast_mask = stdlib_sum(b,m) + end function +end function + +pure module function stdlib_sum_${rank}$d_dim_${rs}$( x , dim, mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + integer, intent(in):: dim + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$ + integer :: j + + if(.not.present(mask)) then + if(dim<${rank}$)then + do j = 1, size(x,dim=${rank}$) + #:if rank == 2 + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(rank, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim ) + #:endif + end do + else + do j = 1, size(x,dim=1) + #:if rank == 2 + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(1, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$ ) + #:endif + end do + end if + else + if(dim<${rank}$)then + do j = 1, size(x,dim=${rank}$) + #:if rank == 2 + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(rank, 'j')])}$, mask=mask${select_subarray(rank, [(rank, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim, mask=mask${select_subarray(rank, [(rank, 'j')])}$ ) + #:endif + end do + else + do j = 1, size(x,dim=1) + #:if rank == 2 + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(1, 'j')])}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ ) + #:endif + end do + end if + end if + +end function +#:endfor +#:endfor + +#:for rk, rt, rs in RC_KINDS_TYPES +#:for rank in RANKS +pure module function stdlib_sum_kahan_${rank}$d_${rs}$( x , mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s + if(.not.present(mask)) then + s = sum_recast(x,size(x)) + else + s = sum_recast_mask(x,mask,size(x)) + end if +contains + pure ${rt}$ function sum_recast(b,n) + integer, intent(in) :: n + ${rt}$, intent(in) :: b(n) + sum_recast = stdlib_sum_kahan(b) + end function + pure ${rt}$ function sum_recast_mask(b,m,n) + integer, intent(in) :: n + ${rt}$, intent(in) :: b(n) + logical, intent(in) :: m(n) + sum_recast_mask = stdlib_sum_kahan(b,m) + end function +end function + +pure module function stdlib_sum_kahan_${rank}$d_dim_${rs}$( x , dim, mask ) result( s ) + ${rt}$, intent(in) :: x${ranksuffix(rank)}$ + integer, intent(in):: dim + logical, intent(in), optional :: mask${ranksuffix(rank)}$ + ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$ + integer :: j + + if(.not.present(mask)) then + if(dim<${rank}$)then + do j = 1, size(x,dim=${rank}$) + #:if rank == 2 + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(rank, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim ) + #:endif + end do + else + do j = 1, size(x,dim=1) + #:if rank == 2 + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(1, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$ ) + #:endif + end do + end if + else + if(dim<${rank}$)then + do j = 1, size(x,dim=${rank}$) + #:if rank == 2 + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(rank, 'j')])}$, mask=mask${select_subarray(rank, [(rank, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(rank-1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim, mask=mask${select_subarray(rank, [(rank, 'j')])}$ ) + #:endif + end do + else + do j = 1, size(x,dim=1) + #:if rank == 2 + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(1, 'j')])}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ ) + #:else + s${select_subarray(rank-1, [(1, 'j')])}$ = stdlib_sum_kahan( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ ) + #:endif + end do + end if + end if + +end function +#:endfor +#:endfor + +end submodule stdlib_intrinsics_sum diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index 4d83548db..566c0e6d8 100644 --- a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -17,6 +17,7 @@ add_subdirectory(constants) add_subdirectory(hash_functions) add_subdirectory(hash_functions_perf) add_subdirectory(hashmaps) +add_subdirectory(intrinsics) add_subdirectory(io) add_subdirectory(linalg) add_subdirectory(logger) diff --git a/test/intrinsics/CMakeLists.txt b/test/intrinsics/CMakeLists.txt new file mode 100644 index 000000000..cc5e9fb87 --- /dev/null +++ b/test/intrinsics/CMakeLists.txt @@ -0,0 +1,7 @@ +set( + fppFiles + "test_intrinsics.fypp" +) +fypp_f90("${fyppFlags}" "${fppFiles}" outFiles) + +ADDTEST(intrinsics) diff --git a/test/intrinsics/test_intrinsics.fypp b/test/intrinsics/test_intrinsics.fypp new file mode 100644 index 000000000..7ef7eddc4 --- /dev/null +++ b/test/intrinsics/test_intrinsics.fypp @@ -0,0 +1,239 @@ +#:include "common.fypp" +#:set R_KINDS_TYPES = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX)) +#:set C_KINDS_TYPES = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX)) +#:set RC_KINDS_TYPES = R_KINDS_TYPES + C_KINDS_TYPES + +module test_intrinsics + use testdrive, only : new_unittest, unittest_type, error_type, check, skip_test + use stdlib_kinds, only: sp, dp, xdp, qp, int8, int16, int32, int64 + use stdlib_intrinsics + use stdlib_math, only: swap + implicit none + +contains + +!> Collect all exported unit tests +subroutine collect_suite(testsuite) + !> Collection of tests + type(unittest_type), allocatable, intent(out) :: testsuite(:) + + testsuite = [ & + new_unittest('sum', test_sum), & + new_unittest('dot_product', test_dot_product) & + ] +end subroutine + +subroutine test_sum(error) + !> Error handling + type(error_type), allocatable, intent(out) :: error + + !> Internal parameters and variables + integer, parameter :: n = 1e3, ncalc = 3 + real(sp) :: u + integer :: iter, i, j + !==================================================================================== + #:for k1, t1, s1 in R_KINDS_TYPES + block + ${t1}$, allocatable :: x(:) + ${t1}$, parameter :: total_sum = 4*atan(1._${k1}$), tolerance = epsilon(1._${k1}$)*100 + ${t1}$ :: xsum(ncalc), err(ncalc) + logical, allocatable :: mask(:), nmask(:) + + allocate(x(n)) + do i = 1, n + x(i) = 8*atan(1._${k1}$)*(real(i,kind=${k1}$)-0.5_${k1}$)/real(n,kind=${k1}$)**2 + end do + allocate(mask(n),source=.false.); mask(1:n:2) = .true. + allocate(nmask(n)); nmask = .not.mask + ! scramble array + do i = 1, n + call random_number(u) + j = 1 + floor(n*u) + call swap( x(i), x(j) ) + call swap( mask(i), mask(j) ) + call swap( nmask(i), nmask(j) ) + end do + + xsum(1) = sum(x) ! compiler intrinsic + xsum(2) = stdlib_sum_kahan(x) ! chunked Kahan summation + xsum(3) = stdlib_sum(x) ! chunked summation + err(1:ncalc) = abs(1._${k1}$-xsum(1:ncalc)/total_sum) + + call check(error, all(err(:) sum all elements + call check(error, abs( sum(x) - stdlib_sum(x) ) sum over specific rank dim + do i = 1, rank(x) + call check(error, norm2( sum(x,dim=i) - stdlib_sum(x,dim=i) ) Error handling + type(error_type), allocatable, intent(out) :: error + + !> Internal parameters and variables + integer, parameter :: n = 1e3, ncalc = 3 + real(sp) :: u + integer :: iter, i, j + !==================================================================================== + #:for k1, t1, s1 in R_KINDS_TYPES + block + ${t1}$, allocatable :: x(:) + ${t1}$, parameter :: total_sum = 4*atan(1._${k1}$), tolerance = epsilon(1._${k1}$)*100 + ${t1}$ :: xsum(ncalc), err(ncalc) + + allocate(x(n)) + do i = 1, n + x(i) = 2*sqrt( 2*atan(1._${k1}$)*(real(i,kind=${k1}$)-0.5_${k1}$) )/n + end do + ! scramble array + do i = 1, n + call random_number(u) + j = 1 + floor(n*u) + call swap( x(i), x(j) ) + end do + + xsum(1) = dot_product(x,x) ! compiler intrinsic + xsum(2) = stdlib_dot_product_kahan(x,x) ! chunked Kahan summation + xsum(3) = stdlib_dot_product(x,x) ! chunked summation + err(1:ncalc) = abs(1._${k1}$-xsum(1:ncalc)/total_sum) + + call check(error, all(err(:) 0) then + write(error_unit, '(i0, 1x, a)') stat, "test(s) failed!" + error stop + end if +end program \ No newline at end of file