implement dpnp.bitwise_count #2308

Open
wants to merge 11 commits into
base: master

Collaborator

@vtavana vtavana commented Feb 11, 2025

In this PR, dpnp.bitwise_count is implemented.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

@vtavana vtavana self-assigned this Feb 11, 2025
Contributor

View rendered docs @ https://intelpython.github.io/dpnp/pull/2308/index.html

Contributor

github-actions bot commented Feb 11, 2025

Array API standard conformance tests for dpnp=0.17.0dev6=py312he4f9c94_22 ran successfully.
Passed: 991
Failed: 1
Skipped: 22

@vtavana vtavana marked this pull request as ready for review February 11, 2025 22:52
// constant value, if constant
// constexpr resT constant_value = resT{};
// is function defined for sycl::vec
using supports_vec = typename std::false_type;
Contributor

Do we need a vector implementation here, given that sycl::popcount supports sycl::vec?
There are two options: support it only for int8_t, where no casting is needed, or support it for all integer types through explicit vector casting, as dpctl does:

    template <int vec_sz>
    sycl::vec<resT, vec_sz> operator()(const sycl::vec<argT, vec_sz> &x) const
    {
        if constexpr (std::is_unsigned_v<argT>) {
            auto const &res_vec = sycl::popcount(x);

            using deducedT = typename std::remove_cv_t<
                    std::remove_reference_t<decltype(res_vec)>>::element_type;

            return vec_cast<std::uint8_t, deducedT, vec_sz>(res_vec);
        }
        else {
            auto const &res_vec = sycl::popcount(sycl::abs(x));

            using deducedT = typename std::remove_cv_t<
                    std::remove_reference_t<decltype(res_vec)>>::element_type;

            return vec_cast<std::uint8_t, deducedT, vec_sz>(res_vec);
        }
    }

The only question is whether any of that would bring performance benefits.
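
For reference when comparing a vectorized kernel against the scalar one, here is a minimal pure-Python sketch of the element-wise result both branches above are meant to produce: the number of 1-bits in the absolute value of each integer. `bitwise_count_reference` is a hypothetical helper written only for illustration; it is not part of dpnp or dpctl and ignores fixed-width overflow behavior.

```python
def bitwise_count_reference(values):
    """Count the 1-bits in the absolute value of each integer.

    Hypothetical reference helper (not a dpnp/dpctl API). It mirrors the
    snippet above: unsigned inputs are counted directly, signed inputs are
    passed through abs() first.
    """
    return [bin(abs(int(v))).count("1") for v in values]


if __name__ == "__main__":
    # 7 and -7 give the same count because the sign is dropped first.
    print(bitwise_count_reference([0, 1, 7, -7, 255]))
```

Any vector overload would be expected to return the same counts as the scalar path for the same inputs, so a check of this kind could be paired with a timing comparison.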

Collaborator

I would be curious to know as well.

In the dpctl PR that started the work on adding vector overloads for unary functions (IntelPython/dpctl#1223), little benefit was found; subgroup store/load seemed to make much more of a difference.

@coveralls
Collaborator

Coverage Status

coverage: 71.704% (-0.03%) from 71.737%
when pulling 371319a on impl-bitwise_count
into 0d5ffad on master.

4 participants