
[WIP] [databricks] DeltaLake DeletionVector Scan #11942

Closed · wants to merge 34 commits

Conversation

@gerashegalov (Collaborator) commented on Jan 10, 2025:

Mainly @razajafri's branch for #8654.

Commit messages (as displayed, partially truncated):
- Signed-off-by: Raza Jafri <[email protected]>
- …ases end in db
- Revert the change in pom to remove 350db143 shim
@gerashegalov requested a review from a team as a code owner on January 10, 2025 03:56
@gerashegalov marked this pull request as a draft on January 10, 2025 03:57
@revans2 (Collaborator) left a comment:

Just had a few comments, but I have not finished looking at the code.


```diff
 import org.apache.spark.sql._
 import org.apache.spark.sql.execution.command.LeafRunnableCommand

 /** GPU version of Delta Lake's WriteIntoDelta. */
 case class GpuWriteIntoDelta(
     gpuDeltaLog: GpuDeltaLog,
-    cpuWrite: WriteIntoDelta)
+    cpuWrite: WriteIntoDeltaEdge)
```
@revans2 (Collaborator) commented:

Does this work for other versions of Delta Lake on Databricks?

@gerashegalov (Collaborator, Author) replied:

The Databricks build is definitely broken.
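
The fork-specific class name (WriteIntoDelta vs WriteIntoDeltaEdge) is the kind of difference usually hidden behind a per-version shim alias rather than referenced directly in shared code. A minimal, self-contained sketch of that pattern; every name here is hypothetical and none are from this PR:

```scala
// Sketch of the shim-alias pattern (all names are stand-ins).
// Each shim module binds the fork-specific class to one alias,
// so shared code compiles unchanged against every Delta version.
object OssDeltaStub { class WriteIntoDelta }            // stand-in for OSS Delta
object DatabricksDeltaStub { class WriteIntoDeltaEdge } // stand-in for the DB fork

object Shim {
  // Selected per shim module at build time; here, the DB-fork binding.
  type CpuWriteCommand = DatabricksDeltaStub.WriteIntoDeltaEdge
}

// Shared code only ever names the alias, never the concrete class.
case class GpuWriteIntoDelta(cpuWrite: Shim.CpuWriteCommand)
```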

```diff
-  override def getReadFileFormat(format: FileFormat): FileFormat = {
-    val cpuFormat = format.asInstanceOf[DeltaParquetFileFormat]
-    GpuDeltaParquetFileFormat.convertToGpu(cpuFormat)
+  override def getReadFileFormat(relation: HadoopFsRelation): FileFormat = {
```
@revans2 (Collaborator) commented:

Same question here. Does this work for all versions of Delta Lake on Databricks? The file appears to be in a common directory.
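
The body of the new method is not shown in the hunk above; a plausible completion, assuming the Delta format is now pulled from HadoopFsRelation's fileFormat field rather than passed in directly (the PR's actual body may differ):

```scala
override def getReadFileFormat(relation: HadoopFsRelation): FileFormat = {
  // Take the CPU-side Delta format off the relation instead of
  // receiving it as a parameter, then convert it to the GPU version.
  val cpuFormat = relation.fileFormat.asInstanceOf[DeltaParquetFileFormat]
  GpuDeltaParquetFileFormat.convertToGpu(cpuFormat)
}
```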

```diff
@@ -32,8 +32,8 @@ abstract class DeltaProviderImplBase extends DeltaProvider {
       ),
       GpuOverrides.exec[RapidsDeltaWriteExec](
         "GPU write into a Delta Lake table",
-        ExecChecks.hiddenHack(),
-        (wrapped, conf, p, r) => new RapidsDeltaWriteExecMeta(wrapped, conf, p, r)).invisible()
+        ExecChecks(TypeSig.all, TypeSig.all),
```
@revans2 (Collaborator) commented:

So we support all data types for writes to Delta? Why is this part of a Delta Lake read change? Or are we going to do both here?
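
For contrast with TypeSig.all (which declares support for every type), a hedged sketch of what a narrower rule would look like in the same GpuOverrides API; the type list below is an assumption for illustration, not this exec's real support matrix:

```scala
// Hedged sketch: a narrower check only lets the listed input types
// run on the GPU and falls back to the CPU for anything else.
// TypeSig.commonCudfTypes + TypeSig.NULL is an assumed example list.
GpuOverrides.exec[RapidsDeltaWriteExec](
  "GPU write into a Delta Lake table",
  ExecChecks(TypeSig.commonCudfTypes + TypeSig.NULL, TypeSig.all),
  (wrapped, conf, p, r) => new RapidsDeltaWriteExecMeta(wrapped, conf, p, r))
```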

```diff
         GpuColumnVector.from(table.getColumn(0).incRefCount(),
           METADATA_ROW_DEL_FIELD.dataType)
       }
+      withResource(table) { _ =>
```
@revans2 (Collaborator) commented:

nit: Change is not needed
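
The withResource idiom in the diff ensures the table is closed when the block exits. A minimal sketch of its shape (spark-rapids ships its own Arm helpers; this standalone version is illustrative only):

```scala
// Close the resource when the block finishes, normally or by exception.
def withResource[T <: AutoCloseable, V](r: T)(block: T => V): V = {
  try {
    block(r)
  } finally {
    r.close()
  }
}
```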

```xml
  <artifactId>rapids-4-spark-delta-31x_2.12</artifactId>
  <name>RAPIDS Accelerator for Apache Spark Delta Lake 3.1.x Support</name>
  <description>Delta Lake 3.1.x support for the RAPIDS Accelerator for Apache Spark</description>
  <version>24.12.0-SNAPSHOT</version>
```
@revans2 (Collaborator) commented:

Need to update the version for 25.02
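
Assuming the project's usual YY.MM.0-SNAPSHOT versioning scheme, the requested bump would presumably be:

```xml
<version>25.02.0-SNAPSHOT</version>
```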

@gerashegalov self-assigned this on Jan 10, 2025
@gerashegalov changed the title from "[WIP] DeltaLake DeletionVector Scan" to "[WIP] [databricks] DeltaLake DeletionVector Scan" on Jan 10, 2025
@razajafri (Collaborator) commented:
Closed in lieu of #11964.

@razajafri closed this on Jan 14, 2025