Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45159: [CI][Integration] Remove substrait consumer-testing integration job #45463

Merged
merged 1 commit into from
Feb 13, 2025

Conversation

raulcd
Copy link
Member

@raulcd raulcd commented Feb 7, 2025

Rationale for this change

The job has been failing for the last two months due to upstream refactoring. I opened an issue upstream but we didn't got any response:

Based on the following substrait consumer-testing PR there was a big refactor which removed the files that we were invoking on our tests:

I asked on the corresponding issue ~3 weeks ago for feedback on the removal and as there's no opposition I think we should remove this unmaintained job.

What changes are included in this PR?

Remove extremely coupled substrait-io/consumer-testing job.

Are these changes tested?

No

Are there any user-facing changes?

No

Copy link

github-actions bot commented Feb 7, 2025

⚠️ GitHub issue #45159 has been automatically assigned in GitHub to PR creator.

@ingomueller-net
Copy link
Contributor

LGTM!

Thanks for letting us know. I wasn't aware that there were any downstream users when I did the refactoring.

In the mid or long term, what would be requirements/wishes for a Substrait test suite for Arrow to consume?

@raulcd
Copy link
Member Author

raulcd commented Feb 7, 2025

Hi @ingomueller-net thanks for answering. From my understanding this was testing possible regressions for pyarrow-dev acero's consumer on the previous test_boolean_functions.py mainly on our nightly jobs. I am unsure whether more tests could be run or not. There is value on validating that new developments continue passing the integration validation. The problem arises on the lack of expertise on the consumer-testing on our repository.

@ingomueller-net
Copy link
Contributor

OK, thanks for the input. To paraphrase and add my own interpretation: I imagine you would like to have a suite of Substrait plans that you can run against the Arrow consumer as end-to-end tests that tell you whether the consumer respects the spec, right? Ideally, it would allow you to skip tests that you know the consumer currently (1) doesn't support or (2) produces wrong results (such that you can, at least, ensure that there are no regression on the others). You might even want the test suite to contain invalid Substrait plans that allow you to test whether your consumer correctly rejects them. Is that so?

@raulcd
Copy link
Member Author

raulcd commented Feb 7, 2025

That sounds reasonable to me but I am unsure on what is the current substrait status on Arrow. Probably @westonpace or @amol- know more about it.

Copy link
Member

@amoeba amoeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for doing this. If the consumer/producer integration testing could live entirely under the Substrait project I think that would be best for both projects.

@EpsilonPrime
Copy link
Contributor

We discussed this in the Substrait community meeting and agreed that this can be removed.

@amol-
Copy link
Member

amol- commented Feb 10, 2025

That sounds reasonable to me but I am unsure on what is the current substrait status on Arrow. Probably @westonpace or @amol- know more about it.

We do have some e2e tests here: https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_substrait.py
They test against substrait plans in binary and text format.
The testsuite is incomplete, but I think that it covers at least the capability to parse and run substrait itself.

@amol- amol- self-requested a review February 10, 2025 15:31
Copy link
Member

@amol- amol- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, I think that the other e2e tests that we have are a good trade-off

@raulcd
Copy link
Member Author

raulcd commented Feb 13, 2025

I am merging this.

@raulcd raulcd merged commit 251758d into apache:main Feb 13, 2025
36 checks passed
@raulcd raulcd removed the awaiting committer review Awaiting committer review label Feb 13, 2025
@raulcd raulcd deleted the GH-45159 branch February 13, 2025 10:19
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 251758d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants