Streaming cloud-versioned datasets #10231

Closed
dberenbaum opened this issue Jan 11, 2024 · 2 comments · Fixed by #10287
Labels: A: api (Related to the dvc.api), p1-important (Important, aka current backlog of things to do)

Comments

@dberenbaum
Collaborator

#10164 will introduce datasets as a new type of dependency that isn't based on the local filesystem. The same mechanism can be used to support cloud-versioned data: users can specify a version ID, freeze it, make it a stage dependency, and stream it into their code using the DVC API.

One issue with cloud versioning is that directory contents get exploded into per-file entries, and this may cause problems when reading dvc.lock.
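
To make the "pin a version ID, then stream it" idea a bit more concrete, here is a rough, stdlib-only sketch. It is not DVC's implementation or API: `PinnedFile`, `versioned_url`, the bucket URL, and the version IDs are all hypothetical, and the `versionId` query parameter assumes an S3-style HTTPS endpoint. It also shows what an exploded directory could look like: one pinned entry per file rather than a single entry for the directory.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode


@dataclass(frozen=True)
class PinnedFile:
    """Hypothetical record for one file of a frozen, cloud-versioned dataset."""

    relpath: str     # path relative to the dataset root
    version_id: str  # object version recorded when the dataset was frozen


def versioned_url(base_url: str, entry: PinnedFile) -> str:
    """Build an S3-style URL that addresses one exact object version."""
    path = quote(entry.relpath)  # escapes spaces etc., keeps "/" separators
    query = urlencode({"versionId": entry.version_id})
    return f"{base_url.rstrip('/')}/{path}?{query}"


# An "exploded" directory: every file carries its own pinned version.
frozen = [
    PinnedFile("train/a.csv", "v1"),
    PinnedFile("train/b.csv", "v7"),
]

# Placeholder bucket; a reader would stream each URL instead of pulling locally.
base = "https://example-bucket.s3.amazonaws.com/data"
urls = [versioned_url(base, entry) for entry in frozen]
```

The per-file list is where the lock-file concern comes from: a directory with many files turns into many entries, each with its own version ID.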

@dberenbaum added the p1-important, A: cloud-versioning, and A: api labels Jan 11, 2024
@Siddharth1060

Hi, I'm new to DVC and am using cloud versioning. My data is tracked on GCS, and I want to load it into a dataframe without downloading it to my local machine (i.e., without running dvc pull). I tried using dvc.api.open(), but it gives me a Git SCM error!
Not sure if this is the right place to bring this up, but I'm really directionless at this point. Is my problem in the scope of issues to be fixed? Thanks!

@dberenbaum
Collaborator Author

@Siddharth1060 Could you please open a separate issue and show the full output of the error you get?

@dberenbaum added this to DVC Jan 23, 2024
@github-project-automation (bot) moved this to Backlog in DVC Jan 23, 2024
@dberenbaum moved this from Backlog to Todo in DVC Jan 23, 2024
@skshetry linked a pull request Feb 23, 2024 that will close this issue
@github-project-automation (bot) moved this from Todo to Done in DVC Feb 26, 2024
3 participants