implement dpnp.bitwise_count
#2308
View rendered docs @ https://intelpython.github.io/dpnp/pull/2308/index.html
Array API standard conformance tests for dpnp=0.17.0dev6=py312he4f9c94_22 ran successfully.
// constant value, if constant
// constexpr resT constant_value = resT{};
// is function defined for sycl::vec
using supports_vec = typename std::false_type;
Do we need a vector implementation, since sycl::popcount supports that?
We have two options here: it can be supported either only for int8_t, where no casting is needed, or for all integer types through explicit vector casting, as dpctl does:
template <int vec_sz>
sycl::vec<resT, vec_sz> operator()(const sycl::vec<argT, vec_sz> &x) const
{
    if constexpr (std::is_unsigned_v<argT>) {
        auto const &res_vec = sycl::popcount(x);
        using deducedT = typename std::remove_cv_t<
            std::remove_reference_t<decltype(res_vec)>>::element_type;
        return vec_cast<std::uint8_t, deducedT, vec_sz>(res_vec);
    }
    else {
        auto const &res_vec = sycl::popcount(sycl::abs(x));
        using deducedT = typename std::remove_cv_t<
            std::remove_reference_t<decltype(res_vec)>>::element_type;
        return vec_cast<std::uint8_t, deducedT, vec_sz>(res_vec);
    }
}
The only question is whether any of that would bring a performance benefit.
I would be curious to know as well.
In the dpctl PR that started work on adding vector overloads for unary functions (IntelPython/dpctl#1223), little benefit was found; subgroup store/load seemed to make much more of a difference.
In this PR, dpnp.bitwise_count is implemented.
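For reference, the per-element semantics discussed in the review can be sketched in plain C++ without SYCL. This is only an illustration, not the code from this PR: `bitwise_count_ref` is a hypothetical helper name, and the sketch assumes (following the quoted dpctl snippet) that the result type is `std::uint8_t` and that signed inputs are counted on their absolute value. Unlike `sycl::abs`, it computes the magnitude in unsigned arithmetic so the most negative value is handled without overflow.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical scalar reference (not part of dpnp or dpctl): counts the set
// bits in |x| and returns the count as std::uint8_t.
template <typename T>
std::uint8_t bitwise_count_ref(T x)
{
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                  "integer types only");
    using U = std::make_unsigned_t<T>;

    // Compute the magnitude in unsigned arithmetic so that the most
    // negative value of a signed type does not overflow.
    U mag = static_cast<U>(x);
    if constexpr (std::is_signed_v<T>) {
        if (x < 0) {
            mag = static_cast<U>(U{0} - mag);
        }
    }

    std::uint8_t count = 0;
    while (mag != 0) {
        // Clear the lowest set bit; one iteration per set bit.
        mag = static_cast<U>(mag & (mag - 1));
        ++count;
    }
    return count;
}
```

A scalar reference like this could serve as a baseline when checking whether a `sycl::vec` overload changes results or performance; the maximum possible count (64 for 64-bit inputs) always fits in `std::uint8_t`.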