feat: expose resetting run boundaries #112

Open
wants to merge 6 commits into
base: main
from

Conversation

drake-nominal
Contributor

When creating a run, a user must enter the start / end timestamps before:

  • All datasets have been added to the run
  • All data is present in the datasets being added to the run (e.g. a multi-file dataset is still mid-ingest, or the dataset may otherwise be updated in the future)

As a result, customers can end up with run start / end bounds that become inaccurate as data continues to be ingested into the platform. Having a simple way to just "reset" the bounds turns out to be powerful.
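
To illustrate the idea only (this is not the code in this PR), resetting could amount to recomputing the run's bounds as the earliest start and latest end across its datasets. The `DatasetBounds` type and `compute_run_bounds` helper below are hypothetical names made up for this sketch, and it assumes timestamps are integer nanoseconds and that a dataset without known bounds is represented as `None`:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class DatasetBounds:
    """Hypothetical stand-in for a dataset's time range (ns since epoch)."""

    start_ns: int
    end_ns: int


def compute_run_bounds(
    datasets: Iterable[Optional[DatasetBounds]],
) -> Optional[Tuple[int, int]]:
    """Return (start, end) spanning every dataset that has known bounds.

    Datasets whose bounds are None (assumed here to mean "no data yet")
    are skipped. Returns None when no dataset has bounds, so a caller
    can leave the run's existing bounds untouched.
    """
    known = [b for b in datasets if b is not None]
    if not known:
        return None
    return min(b.start_ns for b in known), max(b.end_ns for b in known)


if __name__ == "__main__":
    bounds = compute_run_bounds(
        [DatasetBounds(100, 200), None, DatasetBounds(50, 180)]
    )
    print(bounds)  # (50, 200)
```

Calling the helper again after more files land in a dataset would yield wider bounds, which is the "reset" behavior described above.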

@drake-nominal drake-nominal requested a review from alkasm October 29, 2024 22:02
@drake-nominal drake-nominal self-assigned this Oct 29, 2024
@alkasm
Contributor

alkasm commented Oct 29, 2024

All data is present in the datasets being added to the run (e.g. a dataset is mid ingest via multi-file or may otherwise be updated in the future)

IIUC this PR doesn't fix this issue? The datasets that are still mid-ingest are filtered out.

I'd like to minimize non-idempotent mutations as first-class functionality in the lib.

@drake-nominal drake-nominal force-pushed the deidukas/reset-run-bounds branch from b875393 to 9bde0f6 Compare October 29, 2024 22:28
@drake-nominal
Contributor Author

drake-nominal commented Oct 29, 2024

IIUC this PR doesn't fix this issue? The datasets that are still mid-ingest are filtered out.

@alkasm in a single-file world, sure, but this is a pretty rare edge case in the long run. Consider instead the case where the customer has 50,000 files that compose one of the datasets: now "mid ingest" can still mean that the dataset shows up as "ingested" in the product. Or perhaps new files get added later, after the run is created.

@drake-nominal drake-nominal force-pushed the deidukas/add-dataset-bounds branch from adcab95 to 2aac0a0 Compare October 29, 2024 22:45
Base automatically changed from deidukas/add-dataset-bounds to main October 29, 2024 23:18
@drake-nominal drake-nominal force-pushed the deidukas/reset-run-bounds branch from 9bde0f6 to 35830ab Compare November 15, 2024 21:01
3 participants