Fix DeltaLake test failures #11541

Open
31 of 64 tasks
razajafri opened this issue Sep 28, 2024 · 6 comments
@razajafri
Collaborator

razajafri commented Sep 28, 2024

The following tests are failing in DeltaLake

  • delta_lake_auto_compact_test.py::test_auto_compact_basic
  • delta_lake_auto_compact_test.py::test_auto_compact_disabled
  • delta_lake_auto_compact_test.py::test_auto_compact_min_num_files
  • delta_lake_auto_compact_test.py::test_auto_compact_partitioned
  • delta_lake_delete_test.py::test_delta_delete_dataframe_api
  • delta_lake_delete_test.py::test_delta_delete_rows
  • delta_lake_low_shuffle_merge_test.py::test_delta_low_shuffle_merge_when_gpu_file_scan_override_
  • delta_lake_low_shuffle_merge_test.py::test_delta_merge_match_delete_only
  • delta_lake_low_shuffle_merge_test.py::test_delta_merge_not_match_insert_only
  • delta_lake_low_shuffle_merge_test.py::test_delta_merge_standard_upsert
  • delta_lake_low_shuffle_merge_test.py::test_delta_merge_update_with_aggregation
  • delta_lake_low_shuffle_merge_test.py::test_delta_merge_upsert_with_condition
  • delta_lake_low_shuffle_merge_test.py::test_delta_merge_upsert_with_unmatchable_match_condition
  • delta_lake_merge_test.py::test_delta_merge_dataframe_api
  • delta_lake_merge_test.py::test_delta_merge_disabled_fallback
  • delta_lake_merge_test.py::test_delta_merge_match_delete_only
  • delta_lake_merge_test.py::test_delta_merge_not_match_insert_only
  • delta_lake_merge_test.py::test_delta_merge_partial_fallback_via_conf
  • delta_lake_merge_test.py::test_delta_merge_standard_upsert
  • delta_lake_merge_test.py::test_delta_merge_update_with_aggregation
  • delta_lake_merge_test.py::test_delta_merge_upsert_with_condition
  • delta_lake_merge_test.py::test_delta_merge_upsert_with_unmatchable_match_condition
  • delta_lake_test.py::test_delta_name_column_mapping_no_field_ids
  • delta_lake_test.py::test_delta_read_column_mapping
  • delta_lake_update_test.py::test_delta_update_dataframe_api
  • delta_lake_update_test.py::test_delta_update_disabled_fallback
  • delta_lake_update_test.py::test_delta_update_entire_table
  • delta_lake_update_test.py::test_delta_update_partitions
  • delta_lake_update_test.py::test_delta_update_rows
  • delta_lake_write_test.py::test_delta_append_data_exec_v1
  • delta_lake_write_test.py::test_delta_multi_part_write_round_trip_unmanaged
  • delta_lake_write_test.py::test_delta_overwrite_by_expression_exec_v1
  • delta_lake_write_test.py::test_delta_overwrite_dynamic_by_name
  • delta_lake_write_test.py::test_delta_overwrite_dynamic_missing_clauses
  • delta_lake_write_test.py::test_delta_overwrite_mixed_clause
  • delta_lake_write_test.py::test_delta_part_write_round_trip_unmanaged
  • delta_lake_write_test.py::test_delta_write_append_only
  • delta_lake_write_test.py::test_delta_write_column_name_mapping
  • delta_lake_write_test.py::test_delta_write_constraint_check
  • delta_lake_write_test.py::test_delta_write_constraint_check_fallback
  • delta_lake_write_test.py::test_delta_write_constraint_not_null
  • delta_lake_write_test.py::test_delta_write_generated_columns
  • delta_lake_write_test.py::test_delta_write_identity_columns
  • delta_lake_write_test.py::test_delta_write_legacy_timestamp
  • delta_lake_write_test.py::test_delta_write_multiple_identity_columns
  • delta_lake_write_test.py::test_delta_write_optimized_aqe
  • delta_lake_write_test.py::test_delta_write_optimized_partitioned
  • delta_lake_write_test.py::test_delta_write_optimized_supported_types
  • delta_lake_write_test.py::test_delta_write_optimized_supported_types_partitioned
  • delta_lake_write_test.py::test_delta_write_optimized_table_confs
  • delta_lake_write_test.py::test_delta_write_partial_overwrite_replace_where
  • delta_lake_write_test.py::test_delta_write_round_trip_cdf_table_prop
  • delta_lake_write_test.py::test_delta_write_stat_column_limits
  • delta_zorder_test.py::test_delta_zorder
  • delta_lake_write_test.py::test_delta_append_round_trip_unmanaged
  • delta_lake_write_test.py::test_delta_atomic_create_table_as_select
  • delta_lake_write_test.py::test_delta_atomic_replace_table_as_select
  • delta_lake_write_test.py::test_delta_compaction
  • delta_lake_write_test.py::test_delta_overwrite_round_trip_unmanaged
  • delta_lake_write_test.py::test_delta_overwrite_schema_evolution_arrays
  • delta_lake_write_test.py::test_delta_write_aqe_join
  • delta_lake_write_test.py::test_delta_write_round_trip_cdf_write_opt
  • delta_lake_write_test.py::test_delta_write_round_trip_unmanaged
  • delta_zorder_test.py::test_delta_dfp_reuse_broadcast_exchange
@razajafri razajafri self-assigned this Oct 7, 2024
@sameerz sameerz added the bug Something isn't working label Nov 16, 2024
@razajafri
Collaborator Author

@razajafri is triaging:
delta_lake_auto_compact_test.py
delta_lake_delete_test.py
delta_lake_low_shuffle_merge_test.py

@gerashegalov is triaging:
delta_lake_merge_test.py
delta_lake_test.py
delta_lake_update_test.py

@mythrocks is triaging:
delta_lake_write_test.py
delta_zorder_test.py

@razajafri
Collaborator Author

I ran the tests with the PERFILE reader. Here is my analysis:

delta_lake_auto_compact_test.py - passes without any failures
delta_lake_delete_test.py - fails because deletion vector writes are not supported on the GPU
delta_lake_low_shuffle_merge_test.py - fails. I suspect this is due to #11711
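
For anyone trying to reproduce a run with the PERFILE reader, the following is a minimal sketch of how the setting might be passed to the integration tests. It rests on two assumptions that are not stated in this thread: that "PERFILE reader" refers to the `spark.rapids.sql.format.parquet.reader.type` conf, and that the test harness picks up Spark confs from environment variables prefixed with `PYSP_TEST_`, with dots replaced by underscores. The helper names are hypothetical.

```python
import os

# Assumption: this is the conf that selects the PERFILE reader.
READER_TYPE_CONF = "spark.rapids.sql.format.parquet.reader.type"


def conf_to_env_var(conf_key, prefix="PYSP_TEST_"):
    """Turn a Spark conf key into the env var name the harness is assumed to read."""
    return prefix + conf_key.replace(".", "_")


def perfile_env(base_env=None):
    """Return a copy of the environment with the reader type set to PERFILE."""
    env = dict(os.environ if base_env is None else base_env)
    env[conf_to_env_var(READER_TYPE_CONF)] = "PERFILE"
    return env


# Start from an empty environment so only the added variable is printed.
env = perfile_env(base_env={})
for name, value in env.items():
    print(f"export {name}={value}")
```

The printed `export` line can be pasted into a shell before launching the tests. If the harness uses a different mechanism for confs, only `conf_to_env_var` needs to change.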

@gerashegalov
Collaborator

@razajafri Please provide the commit hash and branch on which you are running your test. Since the code is not merged yet, it is hard to track where progress is being made.

@razajafri
Collaborator Author

Here is the branch that I ran my analysis on: https://github.com/razajafri/spark-rapids/tree/SP-10661-db-14.3-deletion-vectors

@gerashegalov
Collaborator

The branch is going to be updated, so please specify the commit hash too, @razajafri.

@mythrocks
Collaborator

mythrocks commented Jan 22, 2025

From the set of failures that I've been examining, i.e.

  1. delta_lake_write_test.py
  2. delta_zorder_test.py

Here are the tests that are still failing (mind, these are currently xfailed to allow CI to pass):

1. delta_lake_write_test.py::test_delta_write_round_trip_unmanaged         ("tightBounds":true)
2. delta_lake_write_test.py::test_delta_overwrite_round_trip_unmanaged     ("tightBounds":true)
3. delta_lake_write_test.py::test_delta_append_round_trip_unmanaged        ("tightBounds":true)
4. delta_lake_write_test.py::test_delta_atomic_create_table_as_select      ("Use WriteIntoDeltaEdge instead of WriteIntoDelta")
5. delta_lake_write_test.py::test_delta_atomic_replace_table_as_select     ("Use WriteIntoDeltaEdge instead of WriteIntoDelta")
6. delta_lake_write_test.py::test_delta_overwrite_schema_evolution_arrays  (RapidsDeltaWrite falls off the GPU.)
7. delta_lake_write_test.py::test_delta_write_round_trip_cdf_write_opt     ("tightBounds":true)
8. delta_lake_write_test.py::test_delta_write_aqe_join                     ("tightBounds":true)
9. delta_lake_write_test.py::test_delta_compaction                         (Falls off GPU. ColumnarToRow invoked.)
10. delta_zorder_test.py::test_delta_dfp_reuse_broadcast_exchange          (ShuffleExchangeExec falls off GPU.)
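
Since several of these failures share a cause, it may help to group them before picking what to fix first. The sketch below tallies the ten tests above by the note attached to each. The test IDs and notes are copied from the list; the `STILL_FAILING` and `group_by_cause` names are made up for illustration.

```python
from collections import defaultdict

W = "delta_lake_write_test.py::"
Z = "delta_zorder_test.py::"

# (pytest node ID, note from the list above)
STILL_FAILING = [
    (W + "test_delta_write_round_trip_unmanaged", '"tightBounds":true'),
    (W + "test_delta_overwrite_round_trip_unmanaged", '"tightBounds":true'),
    (W + "test_delta_append_round_trip_unmanaged", '"tightBounds":true'),
    (W + "test_delta_atomic_create_table_as_select",
     "Use WriteIntoDeltaEdge instead of WriteIntoDelta"),
    (W + "test_delta_atomic_replace_table_as_select",
     "Use WriteIntoDeltaEdge instead of WriteIntoDelta"),
    (W + "test_delta_overwrite_schema_evolution_arrays",
     "RapidsDeltaWrite falls off the GPU."),
    (W + "test_delta_write_round_trip_cdf_write_opt", '"tightBounds":true'),
    (W + "test_delta_write_aqe_join", '"tightBounds":true'),
    (W + "test_delta_compaction", "Falls off GPU. ColumnarToRow invoked."),
    (Z + "test_delta_dfp_reuse_broadcast_exchange",
     "ShuffleExchangeExec falls off GPU."),
]


def group_by_cause(failures):
    """Map each note to the list of test IDs that carry it."""
    groups = defaultdict(list)
    for node_id, cause in failures:
        groups[cause].append(node_id)
    return dict(groups)


by_cause = group_by_cause(STILL_FAILING)

# Print the causes with the most affected tests first.
for cause, tests in sorted(by_cause.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(tests):2d}  {cause}")
```

Grouping on the exact note text keeps the three "falls off GPU" cases separate, which seems right because each names a different operator.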
