MetricsRepository using Spark tables as the data source #518
Hi @VenkataKarthikP, thank you for the PR! While we review it, could you please provide more details on the use case that this PR enables? An example code snippet showing how this feature can be used would be great, both as motivation for the PR and as documentation going forward.
@rdsharma26 I've updated the PR description with more details. Let me know if it looks good.
@rdsharma26 thanks for the review. I have addressed the review comments and left some follow-up questions.
metricDF.write
  .mode(SaveMode.Append)
  .saveAsTable(tableName)
Hello @VenkataKarthikP, thanks a lot for the added feature!
Could we consider letting the user define the write options in some way?
Sure, I think it's a great idea. Let me check!
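One way the write-options suggestion above could look, as a minimal sketch rather than what the PR actually implements: the `MetricsTableWriter` object, the `appendMetrics` helper, and its `writeOptions` parameter are hypothetical names introduced here for illustration. Only the `DataFrameWriter` calls (`options`, `mode`, `saveAsTable`) are existing Spark API.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper, not part of the PR: appends a metrics DataFrame to a
// Spark catalog table and forwards caller-supplied options to the writer.
object MetricsTableWriter {

  def appendMetrics(
      metricDF: DataFrame,
      tableName: String,
      writeOptions: Map[String, String] = Map.empty): Unit = {
    metricDF.write
      .options(writeOptions)   // user-defined writer options; empty by default
      .mode(SaveMode.Append)   // same append behaviour as the snippet under review
      .saveAsTable(tableName)
  }
}
```

With an empty `writeOptions` map this behaves like the reviewed snippet, so existing callers would not need to change; which option keys are meaningful depends on the table format behind the catalog.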
@rdsharma26 can we merge this PR? Thanks.
* spark table repository * review comments --------- Co-authored-by: vpenikalapati <[email protected]>
Issue #, if available:
#502
Description of changes:
An implementation of MetricsRepository using Spark tables as the data source.
Currently, DQ relies heavily on Spark for processing, but there is no implementation for saving metrics to a table such as Iceberg, Hive, or any other Spark-catalog-based table. With this PR, users can save metrics to a table and read them offline as needed.
Example usage -
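A minimal sketch of how such a repository might be used, under stated assumptions: the class name `SparkTableMetricsRepository` and its `(SparkSession, tableName)` constructor are inferred from the package path and test file name in this PR and are not confirmed by the text here. The table name `dq_metrics`, the `dataset` tag, and the sample data are illustrative only. The `VerificationSuite`, `ResultKey`, and `load()` calls are the existing Deequ repository API.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel}
import com.amazon.deequ.repository.ResultKey
// Assumed class name and package, inferred from the file paths in this PR.
import com.amazon.deequ.repository.sparktable.SparkTableMetricsRepository
import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkTableRepositoryExample {

  // Runs a small verification, stores its metrics in `tableName`,
  // then reads the stored metrics back as a DataFrame.
  def saveAndLoad(spark: SparkSession, tableName: String): DataFrame = {
    import spark.implicits._

    // Illustrative input data.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")

    // Assumed constructor: (SparkSession, table name).
    val repository = new SparkTableMetricsRepository(spark, tableName)
    val resultKey = ResultKey(System.currentTimeMillis(), Map("dataset" -> "example"))

    VerificationSuite()
      .onData(df)
      .addCheck(
        Check(CheckLevel.Error, "basic checks")
          .hasSize(_ == 3)
          .isComplete("id"))
      .useRepository(repository)      // metrics go to the Spark table
      .saveOrAppendResult(resultKey)
      .run()

    // Read the metrics back later, filtered by tag.
    repository.load()
      .withTagValues(Map("dataset" -> "example"))
      .getSuccessMetricsAsDataFrame(spark)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("spark-table-metrics-repository-example")
      .getOrCreate()
    saveAndLoad(spark, "dq_metrics").show(truncate = false)
    spark.stop()
  }
}
```

Because the metrics live in an ordinary catalog table, they could also be queried offline with plain Spark SQL against the same table name.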
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.