Thanks again for making `mkl_fft` available!

Looking at the API, I was wondering whether there are plans to extend it to help optimize performance, in particular:

1. Are there plans to allow setting the number of threads through the API, or is this controlled only by the environment variable `MKL_NUM_THREADS`?
2. Would it be possible to add an `out` argument, so that the user can pass a pre-allocated NumPy array to store the results? (I would expect this to improve performance in some cases, since `fft` would then not need to allocate memory for its output.)
3. Are there plans to add an alternative, lower-level Python API (much like the C API), where users can create and commit a descriptor, call the DFTI routines (potentially many times), and finally destroy the descriptor? (Again, I would expect this to improve performance; is that correct?)
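A minimal sketch of the lifecycle described in point 3, combined with the `out` argument from point 2. `DFTPlan` and its methods are hypothetical names, not part of `mkl_fft` or of MKL's Python bindings. The "commit" step builds a naive O(n²) DFT matrix purely as a stand-in for the expensive setup a real DFTI descriptor performs, and `np.matmul(..., out=...)` shows how a caller-supplied buffer is filled instead of a new result array being returned on every call.

```python
import numpy as np


class DFTPlan:
    """Hypothetical plan object mimicking the create/commit/compute/free
    lifecycle of an MKL DFTI descriptor. Not mkl_fft's API: a naive O(n^2)
    DFT used only to illustrate reusing precomputed state across calls."""

    def __init__(self, n):
        # "Create": record the transform length; no expensive work yet.
        self.n = n
        self._matrix = None

    def commit(self):
        # "Commit": do the expensive setup once. Here that is the full DFT
        # matrix W[k, j] = exp(-2*pi*i*k*j/n), the same sign convention
        # as numpy.fft.fft.
        k = np.arange(self.n)
        self._matrix = np.exp(-2j * np.pi * np.outer(k, k) / self.n)
        return self

    def compute_forward(self, x, out=None):
        # "Compute": reuse the committed matrix. If `out` is given (a
        # complex128 array of shape (n,)), the result is written into it
        # rather than into a freshly allocated array.
        if self._matrix is None:
            raise RuntimeError("plan must be committed before use")
        return np.matmul(self._matrix, x, out=out)

    def free(self):
        # "Free": drop the precomputed state.
        self._matrix = None


# Usage: commit once, then run many transforms into one reusable buffer.
plan = DFTPlan(8).commit()
buf = np.empty(8, dtype=np.complex128)
rng = np.random.default_rng(0)
for _ in range(3):
    signal = rng.standard_normal(8)
    plan.compute_forward(signal, out=buf)
plan.free()
```

Under these assumptions the setup cost is paid once in `commit()`, and each later `compute_forward` call only does the transform work, writing into `buf` when one is passed.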
Regarding your last point: yes, that is correct. For a 1D FFT, creating a descriptor can be as costly as performing the computation itself. `mkl_fft` attempts to cache 1D descriptors, so that repeated FFT calls on arrays of the same size perform better.