MetricsRepository using Spark tables as the data source #518

Merged on Nov 28, 2023 (2 commits)

@VenkataKarthikP VenkataKarthikP commented Oct 30, 2023

Issue #, if available:
#502
Description of changes:
An implementation of MetricsRepository using Spark tables as the data source.

Currently, DQ relies heavily on Spark for processing, but there is no MetricsRepository implementation that saves metrics to a Spark catalog table such as an Iceberg or Hive table. With this PR, users can save metrics to a table and read them back offline as needed.

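Example usage:

import org.apache.spark.sql.SparkSession

// Reuse your existing SparkSession, or obtain one like this
val spark = SparkSession.builder().getOrCreate()
val repository = new SparkTableMetricsRepository(spark, "metrics_table")

// resultKey is a ResultKey; context is the AnalyzerContext from an earlier analysis run
repository.save(resultKey, context)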
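
For a fuller picture, here is a minimal sketch of where `resultKey` and `context` might come from and how the saved metrics could be read back. It assumes this is the Deequ library and uses its `AnalysisRunner`, `ResultKey`, and `MetricsRepository.loadByKey` APIs. The import path for `SparkTableMetricsRepository`, the sample data, the tag names, and the `saveAndReload` helper are assumptions made for illustration, not taken from the PR.

```scala
import com.amazon.deequ.analyzers.{Completeness, Size}
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import com.amazon.deequ.repository.ResultKey
// Assumed package path for the class added in this PR
import com.amazon.deequ.repository.sparktable.SparkTableMetricsRepository
import org.apache.spark.sql.SparkSession

object SparkTableRepositoryExample {

  // Hypothetical helper: computes metrics, saves them to a table, and loads them back
  def saveAndReload(spark: SparkSession, tableName: String): Option[AnalyzerContext] = {
    import spark.implicits._

    // Hypothetical input data with one missing name
    val df = Seq[(Int, Option[String])]((1, Some("a")), (2, None), (3, Some("c")))
      .toDF("id", "name")

    // Compute a couple of metrics; the result is the AnalyzerContext to persist
    val context: AnalyzerContext = AnalysisRunner
      .onData(df)
      .addAnalyzer(Size())
      .addAnalyzer(Completeness("name"))
      .run()

    // A ResultKey identifies this run by timestamp and free-form tags
    val resultKey = ResultKey(System.currentTimeMillis(), Map("dataset" -> "example"))

    val repository = new SparkTableMetricsRepository(spark, tableName)
    repository.save(resultKey, context)

    // Read the metrics for the same key back from the table
    repository.loadByKey(resultKey)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-table-metrics-repository-example")
      .master("local[*]")
      .getOrCreate()
    try {
      saveAndReload(spark, "metrics_table").foreach { ctx =>
        AnalyzerContext.successMetricsAsDataFrame(spark, ctx).show(truncate = false)
      }
    } finally {
      spark.stop()
    }
  }
}
```

Because the repository writes through the Spark catalog, the same table can also be queried directly with `spark.table("metrics_table")` from any job that shares the catalog.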

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@VenkataKarthikP VenkataKarthikP force-pushed the table-metric-repository branch from 8f8d004 to 209eba3 Compare October 30, 2023 06:56
@rdsharma26

Hi @VenkataKarthikP, thank you for the PR! While we review it, could you please provide more details on the use case that this PR enables? An example code snippet showing how this feature can be used would be great, both as motivation for the PR and as documentation going forward.

@VenkataKarthikP

@rdsharma26 I updated the PR description with more details; let me know if it looks good.

@VenkataKarthikP

@rdsharma26 thanks for the review. I have addressed the review comments and left some follow-up questions.

Comment on lines +40 to +42
metricDF.write
  .mode(SaveMode.Append)
  .saveAsTable(tableName)

@SirWerto SirWerto Nov 14, 2023

Hello @VenkataKarthikP, thanks a lot for the added feature!!!

Could we consider letting the user define the write options in some way?

@VenkataKarthikP

Sure, I think it's a great idea. Let me check!
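
One way the suggestion could look, as a rough sketch only: the write call from the diff takes an extra map of options and forwards it to Spark's `DataFrameWriter.options`. The `ConfigurableTableWriter` object, the `appendToTable` helper, and the `writeOptions` parameter are hypothetical names for illustration and do not describe what the PR actually implements.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper, not part of the PR
object ConfigurableTableWriter {

  // Appends metricDF to tableName, forwarding any user-supplied writer options.
  // An empty map keeps the behaviour of the original three-line write call.
  def appendToTable(
      metricDF: DataFrame,
      tableName: String,
      writeOptions: Map[String, String] = Map.empty): Unit = {
    metricDF.write
      .mode(SaveMode.Append)
      .options(writeOptions)
      .saveAsTable(tableName)
  }
}
```

A caller might then pass, for example, `Map("compression" -> "snappy")` when the target table is Parquet-backed; which option keys are honoured depends on the table format and catalog in use.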

@VenkataKarthikP

@rdsharma26 can we merge this PR? Thanks.

@rdsharma26 rdsharma26 merged commit 1fc09e1 into awslabs:master Nov 28, 2023
1 check passed
eycho-am pushed a commit that referenced this pull request Feb 21, 2024
* spark table repository

* review comments

---------

Co-authored-by: vpenikalapati <[email protected]>
rdsharma26 pushed a commit that referenced this pull request Apr 16, 2024
* spark table repository

* review comments

---------

Co-authored-by: vpenikalapati <[email protected]>
rdsharma26 pushed a commit that referenced this pull request Apr 17, 2024
* spark table repository

* review comments

---------

Co-authored-by: vpenikalapati <[email protected]>