Implement argminmax #697
Conversation
…urrently does many bad things
…gs compile, but have not tested
@tylera-nvidia this will need a docs page too
// Set up CUB stuff
auto d_in = thrust::counting_iterator<size_t>(0);
auto min_op = minmaxer{in.Data()};
Passing the .Data() pointer will cause this to fail if the underlying tensor has been permuted. Ideally we'd rework RandomOperatorIterator and RandomOperatorOutputIterator to work correctly with Thrust.
At a minimum, suggest updating the docs (limitations.rst) to indicate this doesn't support permutations.
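A minimal sketch of the concern, using a hypothetical functor that approximates the PR's minmaxer (the name value_at and its field are illustrative): indexing through the raw pointer returned by .Data() follows memory order, so a permuted view would be read in the wrong logical order.

#include <cstddef>

// Hypothetical stand-in for the PR's minmaxer: it dereferences the raw base
// pointer from in.Data(), so index i maps to memory order rather than the
// logical order of a (possibly permuted) MatX view.
struct value_at {
  const float *base;   // result of in.Data(); stride information is lost
  __host__ __device__ float operator()(std::size_t i) const {
    return base[i];    // only correct for contiguous, row-major views
  }
};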
* CUDA stream
*/
template <typename OutType, typename TensorIndexType, typename InType>
void cub_reduce_custom(OutType minDest, TensorIndexType &minIdxs, OutType maxDest, TensorIndexType &maxIdxs, const InType &in, const cudaStream_t stream = 0)
Should this be 'cub_reduce_argminmax' instead of 'cub_reduce_custom' since it doesn't accept an arbitrary custom compare?
Or perhaps allow it to accept an arbitrary custom compare, but make this a 'cub_reduce_custom_4output' since it's outputting 4 values?
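For reference, a hypothetical sketch of the two directions (the names and the comparator parameter are illustrative, not code from this PR):

// Option 1 (hypothetical): rename to match the fixed argmin/argmax behavior.
template <typename OutType, typename TensorIndexType, typename InType>
void cub_reduce_argminmax(OutType minDest, TensorIndexType &minIdxs,
                          OutType maxDest, TensorIndexType &maxIdxs,
                          const InType &in, const cudaStream_t stream = 0);

// Option 2 (hypothetical): accept an arbitrary comparator and name the
// function after its four-output shape.
template <typename OutType, typename TensorIndexType, typename InType, typename CompareOp>
void cub_reduce_custom_4output(OutType lowDest, TensorIndexType &lowIdxs,
                               OutType highDest, TensorIndexType &highIdxs,
                               const InType &in, CompareOp cmp,
                               const cudaStream_t stream = 0);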
* is an existing reduce() implementation as part of matx_reduce.h, but this function
* exists in cases where CUB is more optimal/faster.
*
* @tparam OutputTensor
Doxygen comments don't match code (OutputTensor vs OutType, etc).
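For example, a corrected block matching the template parameters of cub_reduce_custom shown above might read (the parameter descriptions are assumptions):

/**
 * @tparam OutType          Output tensor type holding the min/max values
 * @tparam TensorIndexType  Output tensor type holding the argmin/argmax indices
 * @tparam InType           Input tensor or operator type
 * @param stream            CUDA stream
 */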
*
* @note Views being passed must be in row-major order
*
* @tparam OutputTensor
Doxygen comments don't match code
@tylera-nvidia can we close this one given #778?
@tylera-nvidia I saw you just committed another patch, but is this OBE given #778?
@cliffburdick I was doing some performance comparison with @tmartin-gh to make sure things were roughly equivalent. I ran into some performance issues when the return tensor is size {1} versus size {0}: runtime goes from ~40us to >1ms, but we still get the right answer. My implementation did not exhibit that behavior, but it looks like it was due to my "dumb" pointer arithmetic in writing out the output. After resolving that, performance seems roughly comparable, so I'm going to close this PR. It looks like caching is currently broken in main, and we have some unprotected conditions for weird outputs, but those should all be fixed with new branches off main, not based on my old implementation.
Implement a new argminmax function based on cub_reduce that calculates the following: the minimum value, the index of the minimum, the maximum value, and the index of the maximum of the input.
An example call would be:
(matx::mtie(minVal, minIdx, maxVal, maxIdx) = matx::argminmax(inFlattened)).run();
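A fuller usage sketch for context (the input shape, the fill step, and the rank-0 make_tensor syntax are assumptions; per the performance note above, the outputs are rank-0 rather than size-{1} tensors):

#include "matx.h"

int main() {
  // Hypothetical flattened 1-D input of 1024 elements.
  auto inFlattened = matx::make_tensor<float>({1024});
  // ... fill inFlattened with data ...

  // Rank-0 outputs (creation syntax assumed); size-{1} outputs were observed
  // above to be much slower than size-{0}.
  auto minVal = matx::make_tensor<float>({});
  auto maxVal = matx::make_tensor<float>({});
  auto minIdx = matx::make_tensor<matx::index_t>({});
  auto maxIdx = matx::make_tensor<matx::index_t>({});

  (matx::mtie(minVal, minIdx, maxVal, maxIdx) = matx::argminmax(inFlattened)).run();
  cudaStreamSynchronize(0);
  return 0;
}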