Eliminate DataDirectory and Dataset Registration #95

Open
mdr223 opened this issue Jan 30, 2025 · 0 comments · May be fixed by #103
mdr223 (Collaborator) commented Jan 30, 2025

One of the biggest hurdles to users getting started quickly is the need for them to register datasets with PZ.

Dataset registration does make it easier for the system to track the lineage of computation. However:

  1. Caching is nowhere near being fully supported in PZ
  2. Ideally, PZ could still cache intermediate results effectively without always requiring users to register datasets and provide dataset ids

To emphasize this latter point: a user who wants to quickly test out PZ should not need to provide a dataset_id (or register their dataset) just because some small (and currently non-existent) set of power users needs this feature for their workloads.
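
As a rough illustration of point 2, one way a system could key cached results without asking the user for an id is to derive an identifier from the data itself. The sketch below is only an assumption about how that might look; `derive_dataset_id` is a hypothetical helper and not part of PZ's API.

```python
import hashlib
import os


def derive_dataset_id(path: str) -> str:
    """Hypothetical helper: build an id from the file names and contents at `path`.

    The same files with the same contents always yield the same id, so it
    could serve as a cache key without any explicit registration step.
    """
    digest = hashlib.sha256()
    if os.path.isfile(path):
        files = [path]
    else:
        # Sort so the id does not depend on directory traversal order.
        files = sorted(
            os.path.join(root, name)
            for root, _, names in os.walk(path)
            for name in names
        )
    for file_path in files:
        # Hash the relative name so moving the whole directory keeps the id stable.
        digest.update(os.path.relpath(file_path, path).encode("utf-8"))
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()[:16]
```

With something along these lines, a user would pass only a path, and any change to the underlying files would produce a new id and therefore a cache miss. Hashing every byte can be slow for large datasets, so a real design might hash file sizes and modification times instead.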

There was a lot of discussion in Slack about the best approach to solving this issue, which I am linking to here: https://mitdsg.slack.com/archives/C076WBNJJAH/p1737222894617139

mdr223 added the enhancement (New feature or request) label Jan 30, 2025
mdr223 self-assigned this Feb 8, 2025
mdr223 linked a pull request Feb 8, 2025 that will close this issue