Improve float16 performance #2154
Labels: Component - C Library, Priority - 1. High 🔼, Type - Improvement
Using HDF5 to read data stored as 16-bit floating point into a 32-bit buffer is extremely slow, around 16x slower than an equivalent conversion in numpy. I uploaded a demo here. For simplicity I used h5py, but one can obtain the same result using the HDF5 C API. HDF5 also seems to discard any payload bits in NaN values. I suspect the slowdown is due to the very general implementation for custom float types in HDF5 (hdf5/src/H5Tconv.c, lines 4267 to 4271 at commit 306db40), versus the float16-specific handling in numpy.
The case I really care about involves a structured data type (for complex values), which is 44x slower than a numpy workaround. That demo is available here, though I haven't isolated a cause for the extra factor of roughly 3x (44x versus 16x).
It seems like ideally there'd be an H5T__conv_half_single routine that uses hardware to convert from _Float16 (example). I guess this might require adding a native_half type, which seems like a big job. Or maybe just a special case in H5T__conv_f_f?