Take advantage of contiguous arrays #6
Comments
Cython support
Cython supports this by using fused types, with minimal code duplication. See the example here: https://pythonspeed.com/articles/faster-cython-simd/ It might be useful to document this pattern in the Cython documentation, at minimum. And it could in theory be added as a language feature so as to minimize boilerplate.
Rust support
The most commonly used crate for arrays is
In many cases it will also be fine to support only contiguous arrays, and to make a copy first when given a non-contiguous array (possibly in Python code, before passing it to a function in an extension module). This is a common pattern when using Pythran. The end result is usually better performance in the common case, while still supporting the uncommon case.
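A minimal sketch of that copy-first pattern, done on the Python side. The names `row_sums` and `_row_sums_contiguous` are hypothetical; the inner function merely stands in for a compiled kernel that only accepts C-contiguous input. `np.ascontiguousarray` is the real NumPy call, and it only copies when the input is not already C-contiguous.

```python
import numpy as np


def _row_sums_contiguous(arr):
    # Hypothetical stand-in for a compiled kernel (e.g. in an extension
    # module) that only accepts C-contiguous input.
    if not arr.flags["C_CONTIGUOUS"]:
        raise ValueError("expected a C-contiguous array")
    return arr.sum(axis=1)


def row_sums(arr):
    # No copy is made when arr is already C-contiguous; otherwise the data
    # is copied into a fresh contiguous buffer before calling the kernel.
    arr = np.ascontiguousarray(arr)
    return _row_sums_contiguous(arr)
```

The copy is where the extra memory cost comes from: for a non-contiguous input, a second full-size buffer exists for the duration of the call.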
I'm a little wary of copying as a solution. High memory usage can have a significant impact on computation costs (RAM isn't cheap), and there's the risk of hitting the swapping performance cliff. And it's already super-easy to end up with way-too-high memory usage with explicit APIs. So adding intermittent, hidden copying of large arrays seems like a bad idea, at least in generic library APIs. In the context of applications rather than libraries, where the author has a better understanding of the inputs and the runtime environment, it might be a good solution though.
NumPy views can point at non-contiguous chunks of memory. General-purpose code that accepts NumPy arrays therefore has to handle both contiguous and non-contiguous memory, which in practice means assuming non-contiguous memory. This loses out on potential optimizations, in particular automatic usage of SIMD; if the compiler knows the array is contiguous, it can skip a bunch of stride computations and do more optimization.
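As a small illustration of how a view ends up non-contiguous, the snippet below uses only standard NumPy attributes (`flags`, `strides`) and `np.shares_memory`. The shape and slicing step are arbitrary choices for the example.

```python
import numpy as np

# A freshly created 2-D array is C-contiguous: rows are stored back to back.
a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.flags["C_CONTIGUOUS"], a.strides)

# Taking every other column yields a view onto the same buffer. Consecutive
# elements in a row are now 16 bytes apart instead of 8, so the view is
# not contiguous.
view = a[:, ::2]
print(view.flags["C_CONTIGUOUS"], view.strides)
print(np.shares_memory(a, view))
```

A function receiving `view` has to step through memory using `view.strides` rather than assuming each element directly follows the previous one.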
Contiguous inputs are going to be very common; how common depends on the domain and function. So it would be good to get maximum speed for those.
That means compiling two versions of expensive functions, one for contiguous arrays and one for non-contiguous arrays, and choosing the appropriate one based on inputs. And as a library author I would like to do this with minimum code duplication!
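The dispatch itself can be sketched in plain Python. Everything here is illustrative: `scale`, `_scale_contiguous`, and `_scale_strided` are hypothetical names, and the two bodies are stand-ins for what would really be two compiled variants of the same kernel. Only the check on `arr.flags` reflects how the choice would actually be made.

```python
import numpy as np


def _scale_contiguous(arr, factor):
    # Hypothetical fast path: a C-contiguous array can be treated as one
    # flat buffer (reshape(-1) does not copy in that case).
    flat = arr.reshape(-1)
    return (flat * factor).reshape(arr.shape)


def _scale_strided(arr, factor):
    # Hypothetical general path: visit elements by index, which works for
    # any strides but gives up the flat-buffer assumption.
    out = np.empty(arr.shape, dtype=np.result_type(arr, factor))
    for idx in np.ndindex(arr.shape):
        out[idx] = arr[idx] * factor
    return out


def scale(arr, factor):
    # Choose the variant based on the memory layout of the input.
    if arr.flags["C_CONTIGUOUS"]:
        return _scale_contiguous(arr, factor)
    return _scale_strided(arr, factor)
```

Written by hand like this, the two variants duplicate the kernel logic, which is exactly the duplication the issue would like to avoid.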
Numba does this automatically, but for most languages this requires changes to the code.