
[WIP] [databricks] DeltaLake DeletionVector Scan #11942

Closed · wants to merge 34 commits

Conversation

@gerashegalov (Collaborator) commented on Jan 10, 2025:

Mainly @razajafri's branch for #8654.

Commit messages (as displayed, partially truncated):
- Signed-off-by: Raza Jafri <[email protected]>
- …ases end in db
- Revert the change in pom to remove 350db143 shim
@gerashegalov requested a review from a team as a code owner on January 10, 2025 03:56
@gerashegalov marked this pull request as a draft on January 10, 2025 03:57
@revans2 (Collaborator) left a comment:

Just had a few comments, but I have not finished looking at the code.


```diff
 import org.apache.spark.sql._
 import org.apache.spark.sql.execution.command.LeafRunnableCommand

 /** GPU version of Delta Lake's WriteIntoDelta. */
 case class GpuWriteIntoDelta(
     gpuDeltaLog: GpuDeltaLog,
-    cpuWrite: WriteIntoDelta)
+    cpuWrite: WriteIntoDeltaEdge)
```
@revans2 (Collaborator) commented:

Does this work for other versions of Delta Lake on Databricks?

@gerashegalov (Collaborator, Author) replied:

The Databricks build is definitely broken.
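
The fork-specific class name (WriteIntoDelta vs WriteIntoDeltaEdge) is the kind of difference usually hidden behind a per-version shim alias rather than referenced directly in shared code. A minimal, self-contained sketch of that pattern; every name here is hypothetical and none are from this PR:

```scala
// Sketch of the shim-alias pattern (all names are stand-ins).
// Each shim module binds the fork-specific class to one alias,
// so shared code compiles unchanged against every Delta version.
object OssDeltaStub { class WriteIntoDelta }            // stand-in for OSS Delta
object DatabricksDeltaStub { class WriteIntoDeltaEdge } // stand-in for the DB fork

object Shim {
  // Selected per shim module at build time; here, the DB-fork binding.
  type CpuWriteCommand = DatabricksDeltaStub.WriteIntoDeltaEdge
}

// Shared code only ever names the alias, never the concrete class.
case class GpuWriteIntoDelta(cpuWrite: Shim.CpuWriteCommand)
```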

```diff
-  override def getReadFileFormat(format: FileFormat): FileFormat = {
-    val cpuFormat = format.asInstanceOf[DeltaParquetFileFormat]
-    GpuDeltaParquetFileFormat.convertToGpu(cpuFormat)
+  override def getReadFileFormat(relation: HadoopFsRelation): FileFormat = {
```
@revans2 (Collaborator) commented:

Same question here. Does this work for all versions of Delta Lake on Databricks? The file appears to be in a common directory.
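
The body of the new method is not shown in the hunk above; a plausible completion, assuming the Delta format is now pulled from HadoopFsRelation's fileFormat field rather than passed in directly (the PR's actual body may differ):

```scala
override def getReadFileFormat(relation: HadoopFsRelation): FileFormat = {
  // Take the CPU-side Delta format off the relation instead of
  // receiving it as a parameter, then convert it to the GPU version.
  val cpuFormat = relation.fileFormat.asInstanceOf[DeltaParquetFileFormat]
  GpuDeltaParquetFileFormat.convertToGpu(cpuFormat)
}
```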

```diff
@@ -32,8 +32,8 @@ abstract class DeltaProviderImplBase extends DeltaProvider {
       ),
       GpuOverrides.exec[RapidsDeltaWriteExec](
         "GPU write into a Delta Lake table",
-        ExecChecks.hiddenHack(),
-        (wrapped, conf, p, r) => new RapidsDeltaWriteExecMeta(wrapped, conf, p, r)).invisible()
+        ExecChecks(TypeSig.all, TypeSig.all),
```
@revans2 (Collaborator) commented:

So we support all data types for writes to Delta? Why is this part of a Delta Lake read change? Or are we going to do both here?
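
For contrast with TypeSig.all (which declares support for every type), a hedged sketch of what a narrower rule would look like in the same GpuOverrides API; the type list below is an assumption for illustration, not this exec's real support matrix:

```scala
// Hedged sketch: a narrower check only lets the listed input types
// run on the GPU and falls back to the CPU for anything else.
// TypeSig.commonCudfTypes + TypeSig.NULL is an assumed example list.
GpuOverrides.exec[RapidsDeltaWriteExec](
  "GPU write into a Delta Lake table",
  ExecChecks(TypeSig.commonCudfTypes + TypeSig.NULL, TypeSig.all),
  (wrapped, conf, p, r) => new RapidsDeltaWriteExecMeta(wrapped, conf, p, r))
```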

```diff
         GpuColumnVector.from(table.getColumn(0).incRefCount(),
           METADATA_ROW_DEL_FIELD.dataType)
       }
+      withResource(table) { _ =>
```
@revans2 (Collaborator) commented:

nit: Change is not needed
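
The withResource idiom in the diff ensures the table is closed when the block exits. A minimal sketch of its shape (spark-rapids ships its own Arm helpers; this standalone version is illustrative only):

```scala
// Close the resource when the block finishes, normally or by exception.
def withResource[T <: AutoCloseable, V](r: T)(block: T => V): V = {
  try {
    block(r)
  } finally {
    r.close()
  }
}
```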

```xml
  <artifactId>rapids-4-spark-delta-31x_2.12</artifactId>
  <name>RAPIDS Accelerator for Apache Spark Delta Lake 3.1.x Support</name>
  <description>Delta Lake 3.1.x support for the RAPIDS Accelerator for Apache Spark</description>
  <version>24.12.0-SNAPSHOT</version>
```
@revans2 (Collaborator) commented:

Need to update the version for 25.02
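
Assuming the project's usual YY.MM.0-SNAPSHOT versioning scheme, the requested bump would presumably be:

```xml
<version>25.02.0-SNAPSHOT</version>
```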

@gerashegalov self-assigned this on Jan 10, 2025
@gerashegalov changed the title from "[WIP] DeltaLake DeletionVector Scan" to "[WIP] [databricks] DeltaLake DeletionVector Scan" on Jan 10, 2025
@razajafri (Collaborator) commented:
Closed in lieu of #11964.

@razajafri closed this on Jan 14, 2025