Incremental data processing with Maestro and Iceberg #46

Open
caldeirav opened this issue Aug 8, 2024 · 0 comments

Incremental processing is an approach to processing new or changed data in workflows. Its key advantage is that it processes only the data that has been newly added to or updated in a dataset, instead of re-processing the complete dataset. This reduces both compute cost and execution time, often significantly. Shorter workflow runs also lower the chance of failures that require manual intervention. Finally, it improves engineering productivity by simplifying existing pipelines and unlocking new patterns.
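
As a rough illustration of the idea (not Netflix's or Iceberg's actual mechanism), here is a minimal sketch that keeps a watermark of the last processed commit and, on each run, aggregates only rows from newer commits. The `Snapshot` and `IncrementalSum` names and the toy integer rows are hypothetical. The sketch handles appended data only; updated rows would need additional change tracking, which it does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical append-only commit: an ID plus the rows it added."""
    snapshot_id: int
    rows: list[int]


@dataclass
class IncrementalSum:
    """Hypothetical job that sums rows, processing each commit only once."""
    last_snapshot_id: int = 0  # watermark: highest snapshot already processed
    total: int = 0             # running aggregate carried across runs

    def run(self, snapshots: list[Snapshot]) -> int:
        """Process only snapshots newer than the watermark; return rows processed."""
        new = [s for s in snapshots if s.snapshot_id > self.last_snapshot_id]
        processed = 0
        for snap in sorted(new, key=lambda s: s.snapshot_id):
            self.total += sum(snap.rows)
            processed += len(snap.rows)
            self.last_snapshot_id = snap.snapshot_id  # advance watermark
        return processed


if __name__ == "__main__":
    table = [Snapshot(1, [1, 2, 3]), Snapshot(2, [4, 5])]
    job = IncrementalSum()
    print(job.run(table), job.total)  # 5 15: first run reads everything
    table.append(Snapshot(3, [10]))
    print(job.run(table), job.total)  # 1 25: only the new commit is read
```

The second run touches one row instead of six, which is where the compute and runtime savings described above come from.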

We should explore incremental processing techniques whereby new data is incrementally integrated into our data product on Iceberg. A good starting point would be to look at what Netflix is doing with Maestro:

https://netflixtechblog.com/incremental-processing-using-netflix-maestro-and-apache-iceberg-b8ba072ddeeb

caldeirav converted this from a draft issue Aug 8, 2024