Support the null value in bloom_filter_agg Spark aggregate function #458

Open: wants to merge 28 commits into base: update

weixiuli

Currently, the Velox BloomFilterAggregate checks each input row and throws an exception if the row contains a null value. To be consistent with Spark's behavior, it should ignore null values instead.

Spark's BloomFilterAggregate ignores null values: https://github.com/apache/spark/blob/6cdca10f148433664b3e2be6f655b0ddba817537/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L180-L188

override def update(buffer: BloomFilter, inputRow: InternalRow): BloomFilter = {
  val value = child.eval(inputRow)
  // Ignore null values.
  if (value == null) {
    return buffer
  }
  updater.update(buffer, value)
  buffer
}
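
To illustrate the intended semantics without a Spark or Velox dependency, here is a minimal standalone sketch. `ToyBloomFilter` and `NullSafeUpdate` are hypothetical names made up for this example, and the hashing scheme is an arbitrary placeholder; none of it reflects the actual Spark or Velox implementation. It only shows that a null input should leave the aggregation buffer unchanged rather than raise an error.

```scala
import java.util.BitSet

// Hypothetical stand-in for a Bloom filter; NOT Spark's or Velox's implementation.
final class ToyBloomFilter(numBits: Int) {
  private val bits = new BitSet(numBits)

  // Two placeholder hash positions per item (assumption: chosen only for illustration).
  private def slots(item: Long): Seq[Int] = {
    val h1 = java.lang.Long.hashCode(item)
    val h2 = java.lang.Long.hashCode(item * 31L + 17L)
    Seq(h1, h1 + h2).map(h => java.lang.Math.floorMod(h, numBits))
  }

  def put(item: Long): Unit = slots(item).foreach(i => bits.set(i))
  def mightContain(item: Long): Boolean = slots(item).forall(i => bits.get(i))
  def isEmpty: Boolean = bits.isEmpty
}

object NullSafeUpdate {
  // Mirrors the shape of the Spark update method quoted above:
  // a null value returns the buffer untouched instead of throwing.
  def update(buffer: ToyBloomFilter, value: java.lang.Long): ToyBloomFilter = {
    if (value != null) {
      buffer.put(value.longValue())
    }
    buffer
  }
}
```

With this sketch, calling `NullSafeUpdate.update(filter, null)` on a fresh filter leaves it empty, while a non-null value is inserted as usual.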

weixiuli pushed a commit to weixiuli/gluten that referenced this pull request Dec 12, 2023
@zhztheplayer zhztheplayer force-pushed the update branch 2 times, most recently from 13e79b6 to 8a6ef2b Compare December 13, 2023 07:11
@weixiuli weixiuli changed the title Support the null values in bloom_filter Spark aggregate Support the null value in bloom_filter_agg Spark aggregate function Dec 14, 2023
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 7 times, most recently from 52b2ed1 to 683b42f Compare January 11, 2025 22:07
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 6 times, most recently from 7d0b16c to 60c6880 Compare January 18, 2025 22:07
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 5 times, most recently from 1a139e6 to cf25ecf Compare January 25, 2025 23:07
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 3 times, most recently from d3ba33c to 4f9caa9 Compare January 28, 2025 23:07
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 6 times, most recently from c2655fd to 31ae361 Compare February 5, 2025 23:07
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 2 times, most recently from 72a7f01 to 1e39a3a Compare February 7, 2025 23:07
8 participants