[WIP] [databricks] DeltaLake DeletionVector Scan #11942
Conversation
Signed-off-by: Raza Jafri <[email protected]>
… which will also be based off of 350
…ases end in db
Revert the change in pom to remove 350db143 shim
Just had a few comments, but I have not finished looking at the code
import org.apache.spark.sql._
import org.apache.spark.sql.execution.command.LeafRunnableCommand

/** GPU version of Delta Lake's WriteIntoDelta. */
case class GpuWriteIntoDelta(
    gpuDeltaLog: GpuDeltaLog,
-   cpuWrite: WriteIntoDelta)
+   cpuWrite: WriteIntoDeltaEdge)
Does this work for other versions of Delta Lake on Databricks?
The Databricks build is definitely broken.
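For context on the version question: WriteIntoDeltaEdge is specific to Databricks' Delta fork, so common code that names it directly will not compile against other Delta Lake versions. A minimal sketch of one way a per-version shim could isolate that difference (the object and alias names below are hypothetical, not the plugin's actual shim layout):

    // Hypothetical per-version shim: each supported Delta Lake build would
    // provide its own copy of this object, so shared code refers only to the
    // alias and never to the fork-specific class.
    object DeltaWriteShims {
      // A Databricks shim would alias WriteIntoDeltaEdge; OSS Delta Lake
      // shims would alias WriteIntoDelta instead.
      type ShimWriteIntoDelta = WriteIntoDeltaEdge
    }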
-   override def getReadFileFormat(format: FileFormat): FileFormat = {
-     val cpuFormat = format.asInstanceOf[DeltaParquetFileFormat]
-     GpuDeltaParquetFileFormat.convertToGpu(cpuFormat)
+   override def getReadFileFormat(relation: HadoopFsRelation): FileFormat = {
Same question here. Does this work for all versions of Delta Lake on Databricks? The file appears to be in a common directory.
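For reference, a hedged sketch of what the new signature could look like with the old body carried over (an illustrative reconstruction from the removed lines, not necessarily the PR's final code). Passing the whole HadoopFsRelation rather than just the FileFormat gives the override access to relation-level state before building the GPU format:

    // Illustrative: recover the FileFormat from the relation (the removed
    // overload received it directly), then convert it to the GPU version.
    override def getReadFileFormat(relation: HadoopFsRelation): FileFormat = {
      val cpuFormat = relation.fileFormat.asInstanceOf[DeltaParquetFileFormat]
      GpuDeltaParquetFileFormat.convertToGpu(cpuFormat)
    }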
@@ -32,8 +32,8 @@ abstract class DeltaProviderImplBase extends DeltaProvider {
      ),
      GpuOverrides.exec[RapidsDeltaWriteExec](
        "GPU write into a Delta Lake table",
-       ExecChecks.hiddenHack(),
-       (wrapped, conf, p, r) => new RapidsDeltaWriteExecMeta(wrapped, conf, p, r)).invisible()
+       ExecChecks(TypeSig.all, TypeSig.all),
So we support all data types for writes to Delta? Why is this part of a Delta Lake read change? Or are we going to do both here?
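For context: the two-argument ExecChecks(input, output) form declares the type signatures the GPU exec advertises, replacing the ExecChecks.hiddenHack() placeholder. If the write path does not actually support every type, a narrower declaration would be the usual fix; a hedged sketch follows (these TypeSig members exist in the plugin, but this particular combination is illustrative):

    // Illustrative: advertise common cuDF-supported types plus nulls
    // (including nested occurrences) for the input, instead of claiming
    // TypeSig.all.
    ExecChecks((TypeSig.commonCudfTypes + TypeSig.NULL).nested(), TypeSig.all)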
      GpuColumnVector.from(table.getColumn(0).incRefCount(),
        METADATA_ROW_DEL_FIELD.dataType)
    }
+   withResource(table) { _ =>
nit: Change is not needed
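For readers unfamiliar with the idiom in the hunk above: withResource closes its argument when the block exits, even if the body throws. A self-contained sketch of the pattern (the plugin defines its own equivalent in its Arm utilities; this stand-in is just for illustration):

    // Minimal stand-in for the plugin's withResource helper: guarantees
    // close() runs whether or not the body throws.
    def withResource[R <: AutoCloseable, T](r: R)(block: R => T): T = {
      try {
        block(r)
      } finally {
        r.close()
      }
    }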
    <artifactId>rapids-4-spark-delta-31x_2.12</artifactId>
    <name>RAPIDS Accelerator for Apache Spark Delta Lake 3.1.x Support</name>
    <description>Delta Lake 3.1.x support for the RAPIDS Accelerator for Apache Spark</description>
+   <version>24.12.0-SNAPSHOT</version>
Need to update the version for 25.02 (presumably 25.02.0-SNAPSHOT, following the pattern above).
Closed in lieu of #11964.
Mainly @razajafri's branch for #8654.