OCML is an LLVM-IR bitcode library designed to relieve language compiler and runtime implementers of the burden of implementing efficient and accurate mathematical functions. It is essentially a “libm” in intermediate representation with a fixed, simple API that can be linked in to supply the implementations of most standard low-level mathematical functions provided by the language.
OCML is expected to be used in a standard LLVM compilation flow as follows:
- Compile source modules to LLVM-IR bitcode (clang)
- Link program bitcode, “wrapper” bitcode, OCML bitcode, and OCML control functions (llvm-link)
- Generic optimizations (opt)
- Code generation (llc)
Here, “wrapper” bitcode denotes a thin library responsible for mapping mangled built-in function calls as produced by clang to the OCML API. An example in C might look like
inline float sqrt(float x) { return __ocml_sqrt_f32(x); }
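Because the mangled names produced by clang encode the argument types, a wrapper library typically needs one thin forwarding function per type. The following is a minimal host-side sketch of that idea using C11 _Generic in place of name mangling. The wrap_fabs names are hypothetical, the OCML signatures are assumed, and the two __ocml_fabs_* bodies are stand-ins so the file builds on its own; in a real flow those definitions would come from the OCML bitcode at link time.

```c
/* Stand-ins for OCML entry points (assumed signatures). In a real flow these
 * are only declared here and resolved against OCML by llvm-link. */
float  __ocml_fabs_f32(float x)  { return x < 0.0f ? -x : x; }
double __ocml_fabs_f64(double x) { return x < 0.0  ? -x : x; }

/* Hypothetical per-type wrappers: each one simply forwards to the OCML API. */
static inline float  wrap_fabs_f32(float x)  { return __ocml_fabs_f32(x); }
static inline double wrap_fabs_f64(double x) { return __ocml_fabs_f64(x); }

/* Type-based dispatch, standing in for what mangled names do in a language
 * front end. */
#define wrap_fabs(x) _Generic((x),          \
        float:  wrap_fabs_f32,              \
        double: wrap_fabs_f64)(x)
```

A call such as wrap_fabs(-2.0f) selects the f32 wrapper and wrap_fabs(-3.0) selects the f64 wrapper; after inlining, only the call to the OCML function remains.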
The next section describes the OCML controls and how to define them.
OCML supports a number of controls that are provided by linking in specifically named inline functions. These functions are inlined at optimization time and result in specific paths taken with no control flow overhead. These functions all have the form (in C)
__attribute__((always_inline, const)) int
__oclc_control(void)
{ return 1; } // or 0 to disable
The currently supported controls are
- finite_only_opt: floating point Inf and NaN are never expected to be consumed or produced
- unsafe_math_opt: lower accuracy results may be produced with higher performance
- daz_opt: subnormal values consumed and produced may be flushed to zero
- correctly_rounded_sqrt32: float square root must be correctly rounded
- ISA_version: an integer representation of the ISA version of the target device
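To illustrate how a control can select a path without run-time overhead, here is a minimal single-file sketch. The name __oclc_daz_opt is an assumption formed from the control names listed above, and flush_if_daz is a hypothetical helper, not OCML source. The control is marked static inline here only so the file compiles on its own; in the flow described above it would live in a separately linked module.

```c
#include <float.h>

/* Control function in the form shown above (name assumed). Returning 1
 * enables the control, 0 disables it. */
static inline __attribute__((always_inline, const)) int
__oclc_daz_opt(void)
{ return 1; }

/* Hypothetical library helper: flushes subnormal inputs to zero only when
 * the control is enabled. After inlining, the outer condition is a
 * compile-time constant, so the optimizer can keep just one path. */
static float flush_if_daz(float x)
{
    if (__oclc_daz_opt()) {
        /* Nonzero magnitudes below the smallest normal float are subnormal. */
        if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
            return x < 0.0f ? -0.0f : 0.0f;
    }
    return x;
}
```

Changing the control body to return 0 would leave flush_if_daz as a plain pass-through once the optimizer has run.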
OCML ships as a single LLVM-IR bitcode file named
ocml-{LLVM rev}-{OCML rev}.bc
where {LLVM rev} is the version of LLVM used to create the file, of the form X.Y, e.g. 3.8, and {OCML rev} is the OCML library version of the form X.Y, currently 0.9.
Some OCML functions require access to tables of constants. These tables are currently named with the prefix __ocmltbl_ and are placed in LLVM address space 2.
OCML functions follow a simple naming convention:
__ocml_{function}_{type suffix}
where {function} is generally the familiar libm name of the function, and {type suffix} indicates the type of the floating point arguments or results, and is one of
- f16: 16 bit floating point (half precision)
- f32: 32 bit floating point (single precision)
- f64: 64 bit floating point (double precision)
For example, __ocml_sqrt_f32 is the name of the OCML single precision square root function.
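Since the names are fully regular, wrapper code can build them mechanically. The sketch below uses preprocessor token pasting to do so. The macro names OCML_NAME and OCML_NAME_STR are hypothetical helpers introduced for illustration, and the prototype shown is an assumed signature.

```c
#include <string.h>

/* Paste the pieces together: __ocml_{function}_{type suffix}. */
#define OCML_NAME(function, suffix) __ocml_##function##_##suffix

/* Two-level stringification so the pasted name is expanded before quoting. */
#define OCML_STR_(x) #x
#define OCML_STR(x)  OCML_STR_(x)
#define OCML_NAME_STR(function, suffix) OCML_STR(OCML_NAME(function, suffix))

/* Example use: declares float __ocml_sqrt_f32(float) (signature assumed). */
float OCML_NAME(sqrt, f32)(float x);
```

Here OCML_NAME(sqrt, f32) expands to the identifier __ocml_sqrt_f32, and OCML_NAME_STR(sin, f64) expands to the string "__ocml_sin_f64".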
OCML does not currently support higher than double precision due to the lack of support on most devices.
The following table contains a list of {function} currently supported by OCML, a brief description of each, and the maximum relative error in ULPs for each floating point type. A “c” in the last 3 columns indicates that the function is required to be correctly rounded.
{function} | Description | f32 max err | f64 max err | f16 max err |
---|---|---|---|---|
acos | arc cosine | 4 | 4 | 2 |
acosh | arc hyperbolic cosine | 4 | 4 | 2 |
acospi | arc cosine / pi | 5 | 5 | 2 |
add_{rm} | add with specific rounding mode | c | c | c |
asin | arc sine | 4 | 4 | 2 |
asinh | arc hyperbolic sine | 4 | 4 | 2 |
asinpi | arc sine / pi | 5 | 5 | 2 |
atan2 | two argument arc tangent | 6 | 6 | 2 |
atan2pi | two argument arc tangent / pi | 6 | 6 | 2 |
atan | single argument arc tangent | 5 | 5 | 2 |
atanh | arc hyperbolic tangent | 5 | 5 | 2 |
atanpi | single argument arc tangent / pi | 5 | 5 | 2 |
cbrt | cube root | 2 | 2 | 2 |
ceil | round upwards to integer | c | c | c |
copysign | copy sign of second argument to absolute value of first | 0 | 0 | 0 |
cos | cosine | 4 | 4 | 2 |
cosh | hyperbolic cosine | 4 | 4 | 2 |
cospi | cosine of argument times pi | 4 | 4 | 2 |
div_{rm} | correctly rounded division with specific rounding mode | c | c | c |
erf | error function | 16 | 16 | 4 |
erfc | complementary error function | 16 | 16 | 4 |
erfcinv | inverse complementary error function | 7 | 8 | 3 |
erfcx | scaled complementary error function | 6 | 6 | 2 |
erfinv | inverse error function | 3 | 8 | 2 |
exp10 | 10^x | 3 | 3 | 2 |
exp2 | 2^x | 3 | 3 | 2 |
exp | e^x | 3 | 3 | 2 |
expm1 | e^x - 1, accurate at 0 | 3 | 3 | 2 |
fabs | absolute value | 0 | 0 | 0 |
fdim | positive difference | c | c | c |
floor | round downwards to integer | c | c | c |
fma[_{rm}] | fused (i.e. singly rounded) multiply-add, with optional specific rounding | c | c | c |
fmax | maximum, avoids NaN | 0 | 0 | 0 |
fmin | minimum, avoids NaN | 0 | 0 | 0 |
fmod | floating point remainder | 0 | 0 | 0 |
fpclassify | classify floating point | - | - | - |
fract | fractional part | c | c | c |
frexp | extract significand and exponent | 0 | 0 | 0 |
hypot | length, with overflow control | 4 | 4 | 2 |
i0 | modified Bessel function of the first kind, order 0, I0 | 6 | 6 | 2 |
i1 | modified Bessel function of the first kind, order 1, I1 | 6 | 6 | 2 |
ilogb | extract exponent | 0 | 0 | 0 |
isfinite | tests finiteness | - | - | - |
isinf | test for Inf | - | - | - |
isnan | test for NaN | - | - | - |
isnormal | test for normal | - | - | - |
j0 | Bessel function of the first kind, order 0, J0 | 6 (<12) | 6 (<12) | 2 (<12) |
j1 | Bessel function of the first kind, order 1, J1 | 6 (<12) | 6 (<12) | 2 (<12) |
ldexp | multiply by 2 raised to an integral power | c | c | c |
len3 | three argument hypot | 2 | 2 | 2 |
len4 | four argument hypot | 2 | 2 | 2 |
lgamma | log Γ function | 6(>0) | 4(>0) | 3(>0) |
lgamma_r | log Γ function with sign | 6(>0) | 4(>0) | 3(>0) |
log10 | log base 10 | 3 | 3 | 2 |
log1p | log base e of 1 + x, accurate for x near 0 | 2 | 2 | 2 |
log2 | log base 2 | 3 | 3 | 2 |
log | log base e | 3 | 3 | 2 |
logb | extract exponent | 0 | 0 | 0 |
mad | multiply-add, implementation defined if fused | c | c | c |
max | maximum without special NaN handling | 0 | 0 | 0 |
maxmag | maximum magnitude | 0 | 0 | 0 |
min | minimum without special NaN handling | 0 | 0 | 0 |
minmag | minimum magnitude | 0 | 0 | 0 |
modf | extract integer and fraction | 0 | 0 | 0 |
mul_{rm} | multiply with specific rounding mode | c | c | c |
nan | produce a NaN with a specific payload | 0 | 0 | 0 |
ncdf | standard normal cumulative distribution function | 16 | 16 | 4 |
ncdfinv | inverse standard normal cumulative distribution function | 16 | 16 | 4 |
nearbyint | round to nearest integer (see also rint) | 0 | 0 | 0 |
nextafter | next closest value above or below | 0 | 0 | 0 |
pow | general power | 16 | 16 | 4 |
pown | power with integral exponent | 16 | 16 | 4 |
powr | power with positive floating point exponent | 16 | 16 | 4 |
rcbrt | reciprocal cube root | 2 | 2 | 2 |
remainder | floating point remainder | 0 | 0 | 0 |
remquo | floating point remainder and lowest integral quotient bits | 0 | 0 | 0 |
rhypot | reciprocal hypot | 2 | 2 | 2 |
rint | round to nearest integer | c | c | c |
rlen3 | reciprocal len3 | 2 | 2 | 2 |
rlen4 | reciprocal len4 | 2 | 2 | 2 |
rootn | nth root | 16 | 16 | 4 |
round | round to integer, always away from 0 | c | c | c |
rsqrt | reciprocal square root | 2 | 2 | 1 |
scalb | multiply by 2 raised to a power | c | c | c |
scalbn | multiply by 2 raised to an integral power (see also ldexp) | c | c | c |
signbit | nonzero if argument has sign bit set | - | - | - |
sin | sine function | 4 | 4 | 2 |
sincos | simultaneous sine and cosine evaluation | 4 | 4 | 2 |
sincospi | sincos function of argument times pi | 4 | 4 | 2 |
sinh | hyperbolic sine | 4 | 4 | 2 |
sinpi | sine of argument times pi | 4 | 4 | 2 |
sqrt | square root | 3/c | 3/c | c |
sub_{rm} | subtract with specific rounding mode | c | c | c |
tan | tangent | 5 | 5 | 2 |
tanh | hyperbolic tangent | 5 | 5 | 2 |
tanpi | tangent of argument times pi | 6 | 6 | 2 |
tgamma | true Γ function | 16 | 16 | 4 |
trunc | round to integer, towards zero | c | c | c |
y0 | Bessel function of the second kind, order 0, Y0 | 2 (<12) | 6 (<12) | 6 (<12) |
y1 | Bessel function of the second kind, order 1, Y1 | 2 (<12) | 6 (<12) | 6 (<12) |
For the functions supporting specific roundings, the rounding mode {rm} can be one of
- rte: round towards nearest even
- rtp: round towards positive infinity
- rtn: round towards negative infinity
- rtz: round towards zero
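The error bounds in the table are expressed in ULPs (units in the last place). As a rough illustration of what such a bound means for f32, the sketch below measures the distance between a float result and a higher-precision reference, in units of the float spacing at the reference value. This is one common approximation, not necessarily the exact definition used to validate OCML; ulp_f32 and ulp_error_f32 are hypothetical helper names, and the code assumes IEEE 754 binary32 floats and finite inputs below the largest float.

```c
#include <stdint.h>
#include <string.h>

/* Spacing between |x| and the next larger representable float. */
static double ulp_f32(float x)
{
    uint32_t bits;
    float lo, hi;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffu;            /* clear the sign bit: |x| */
    memcpy(&lo, &bits, sizeof lo);
    bits += 1u;                     /* next representable magnitude */
    memcpy(&hi, &bits, sizeof hi);
    return (double)hi - (double)lo;
}

/* Error of a float result against a double reference, in f32 ULPs. */
static double ulp_error_f32(float result, double reference)
{
    double diff = (double)result - reference;
    if (diff < 0.0)
        diff = -diff;
    return diff / ulp_f32((float)reference);
}
```

Under this measure, a table entry of 4 for an f32 function would mean that ulp_error_f32 stays at or below 4 over the tested inputs, with the reference computed in higher precision.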