
add support for tracking remote dataset #10287

Merged (4 commits) Feb 26, 2024

Conversation

@skshetry (Member) commented Feb 7, 2024

[asciicast recording]


This adds support for virtually tracking a dataset, such as a dvcx dataset, a dataset from remote dvc/git registries, or a cloud-versioned remote.

This PR introduces two commands under the `ds` namespace: `add` and `update`.

The `dvc ds add` command adds the dataset to `dvc.yaml` and tracks the sources in the `dvc.lock` file. Similarly, `dvc ds update` updates the sources in the `dvc.lock` file.

Example usage:

```console
dvc ds add --name dogs --url dvcx://dogs # tracking a dvcx dataset
# tracking dvc/git registries
dvc ds add --name example --url dvc://git@github.com/iterative/example.git
# cloud-versioned remotes
dvc ds add --name versioning --url s3://cloud-versioning-demo
```

Warning: this `<schema>://` URI format was later removed and replaced with the appropriate `--dvc`/`--dvcx`/`--url` flags in #10326.

To update, specify the name:

```console
dvc ds update <name>
```

`dvc ds add` freezes the dataset (in the traditional sense of `dvc import`/`dvc import-url`). It keeps a "specification" of the dataset in `dvc.yaml` and also snapshots information about the dataset in the `dvc.lock` file. Both are kept inside the `datasets` section of the `dvc.yaml` and `dvc.lock` files.

This metadata is used in pipelines. You can add a dependency to your stage using the `ds://` scheme followed by the name of the dataset in the `dvc.yaml` file. As it is used in pipelines, the `name` of the dataset has to be unique in the repository; different datasets with the same name are not allowed. On `dvc repro`, dvc copies the snapshot information for that dataset into the `deps` field of the stage in `dvc.lock`. On successive invocations, dvc compares the information in the `deps` field of the lock with the snapshot information in the `datasets` section to decide whether to rerun.

As the dataset is frozen, `dvc repro` won't rerun until the dataset is updated via `dvc ds update`.

Here are some examples of how `dvc.yaml` looks:

```yaml
datasets:
- name: dogs
  url: dvcx://dogs
  type: dvcx
- name: example-get-started
  url: git@github.com:iterative/example-get-started.git
  type: dvc
  path: path
- name: dogs2
  url: dvcx://dogs@v2
  type: dvcx
- name: cloud-versioning-demo
  url: s3://cloud-versioning-demo
  type: url
```

And the corresponding `dvc.lock`:

```yaml
# dvc.lock
schema: '2.0'
datasets:
- name: dogs
  url: dvcx://dogs
  type: dvcx
  version: 3
  created_at: '2023-12-11T10:32:05.942708+00:00'
- name: example-get-started
  url: git@github.com:iterative/example-get-started.git
  type: dvc
  path: path
  rev_lock: df75c16ef61f0772d6e4bb27ba4617b06b4b5398
- name: cloud-versioning-demo
  url: s3://cloud-versioning-demo
  type: url
  meta:
    isdir: true
    size: 323919
    nfiles: 33
  files:
  - relpath: myproject/model.pt
    meta:
      size: 106433
      version_id: 5qrtnhnQ4fBzV73kqqK6pMGhTOzd_IPr
      etag: 3bc0028677ce6fb65bec8090c248b002
# truncated
```

Note: the `files` entries in the `datasets` section differ slightly from cloud-versioned outputs: here `relpath` is separated from the metadata in `meta`, whereas an output keeps them together (see the comparison below).
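For illustration, the two layouts side by side (the flat output layout is recalled from existing cloud-versioned outputs and is an assumption, not taken from this PR):

```yaml
# files entry in the datasets section (this PR): relpath separated from meta
files:
- relpath: myproject/model.pt
  meta:
    size: 106433
    etag: 3bc0028677ce6fb65bec8090c248b002

# files entry of a cloud-versioned output (existing format): metadata inline
files:
- relpath: myproject/model.pt
  size: 106433
  etag: 3bc0028677ce6fb65bec8090c248b002
```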

The pipeline stages keep this information in the `dataset` section of each individual dep.

```console
dvc stage add -n train -d ds://dogs python train.py
cat dvc.yaml
```

```yaml
# truncated
stages:
  train:
    cmd: python train.py
    deps:
    - ds://dogs
```

When it is reproduced, `dvc.lock` will look something like the following:

```yaml
# truncated
stages:
  train:
    cmd: python train.py
    deps:
    - path: ds://dogs
      dataset:
        name: dogs
        url: dvcx://dogs
        type: dvcx
        version: 3
        created_at: '2023-12-11T10:32:05.942708+00:00'
```


@dberenbaum (Collaborator)

UX looks good!

There have been discussions before about consolidating the import syntax (cc @efiop), and I wonder if we should take the opportunity to do it here. For example:

```console
$ dvc ds add dvcx://dogs@v3
$ dvc ds add s3://cloud-versioning-demo
$ dvc ds add dvc://git@github.com:iterative/example-get-started.git@main#path=data/data.xml
```

@dberenbaum (Collaborator)

Notes from discussion with @skshetry:

@skshetry force-pushed the datasets branch 2 times, most recently from 4ffe4c7 to 84a5cf5, on February 20, 2024.

codecov bot commented Feb 20, 2024

Codecov Report

Attention: Patch coverage is 38.98734%, with 241 lines in your changes missing coverage. Please review.

Project coverage is 90.07%. Comparing base (89537a7) to head (f2db666).

| Files | Patch % | Lines |
| --- | --- | --- |
| dvc/repo/datasets.py | 44.39% | 133 Missing and 1 partial ⚠️ |
| dvc/commands/dataset.py | 28.57% | 75 Missing ⚠️ |
| dvc/dvcfile.py | 24.00% | 19 Missing ⚠️ |
| dvc/dependency/dataset.py | 7.14% | 13 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #10287      +/-   ##
==========================================
- Coverage   90.59%   90.07%   -0.52%
==========================================
  Files         493      495       +2
  Lines       37774    38161     +387
  Branches     5460     5532      +72
==========================================
+ Hits        34221    34375     +154
- Misses       2919     3153     +234
+ Partials      634      633       -1
```


@skshetry marked this pull request as ready for review February 21, 2024 08:52
@dberenbaum (Collaborator)

@skshetry What's the status of this?

@skshetry (Member, Author) commented Feb 21, 2024

This should be ready for review. The asciinema recording above is stale; the only changes since are the removal of `--type` in favor of the `dvcx://`, `dvc://`, and `dvc+https://` schemes, and the added support for pipelines with the `ds://` scheme.

e.g.:

```console
dvc stage add -ntest -d ds://dataset_name "command"
```

To import data from dvc/git repositories, you'll need to add the `dvc://` scheme to the repo URL, e.g.:

```
dvc://git@github.com/iterative/example-get-started.git
dvc+https://github.com/iterative/example-get-started.git
```

I'll try to update the asciinema recording tomorrow and add more description to the PR.

@skshetry (Member, Author)

Updated the asciinema recording above.

```python
    return self.msg


class Datasets(Mapping[str, Dataset]):
```
@skshetry (Member, Author) commented Feb 22, 2024:
`Datasets` is a registry API that will be useful for Studio to get information about datasets and to update a dataset if needed (there are plans to offer "Run an experiment" with updated versions of the dataset). It provides a dictionary-like API, sketched below.
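A minimal usage sketch of that dictionary-like API, assuming the registry is exposed as `repo.datasets` (the attribute name and dataset fields here are assumptions for illustration):

```python
from dvc.repo import Repo

repo = Repo(".")
datasets = repo.datasets       # Datasets is a Mapping[str, Dataset], keyed by name

dataset = datasets["dogs"]     # dict-style lookup by dataset name
print(dataset.url)             # e.g. dvcx://dogs

for name in datasets:          # iterating yields dataset names, like a dict
    print(name)
```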

```python
        lockfile = Lockfile(self.repo, Path(manifest_path).with_suffix(".lock"))
        lockfile.dump_dataset(lock_data)

    def dump(self, dataset: Dataset, old: Optional[Dataset] = None) -> None:
```
@skshetry (Member, Author) commented Feb 22, 2024:

This performs a diff-based update to avoid accidentally modifying `dvc.yaml` or `dvc.lock` when not needed.
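A rough sketch of that diff-dump idea, with the `to_dict`/`_dump_*` helpers assumed for illustration (not the actual implementation):

```python
from typing import Optional

def dump(self, dataset: "Dataset", old: Optional["Dataset"] = None) -> None:
    # Compare against the previously serialized state; if nothing changed,
    # skip writing so dvc.yaml/dvc.lock are not touched unnecessarily.
    if old is not None and dataset.to_dict() == old.to_dict():
        return
    self._dump_spec(dataset)   # update the spec in dvc.yaml
    self._dump_lock(dataset)   # update the snapshot in dvc.lock
```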

```diff
@@ -47,6 +47,7 @@
 LOCKFILE_STAGES_SCHEMA = {str: LOCK_FILE_STAGE_SCHEMA}
 LOCKFILE_SCHEMA = {
     vol.Required("schema"): vol.Equal("2.0", "invalid schema version"),
+    "datasets": object,
```
@skshetry (Member, Author) commented Feb 22, 2024:

I am not making this schema stricter for now; we'll define it later. I need to figure out how to define it without duplicating Dataset/DatasetLock.
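For illustration only, a stricter voluptuous schema might look something like the sketch below (field names are taken from the examples above; this is an assumption, not what the PR ships):

```python
import voluptuous as vol

# One possible stricter shape for a datasets entry (illustrative only):
DATASET_SCHEMA = vol.Schema(
    {
        vol.Required("name"): str,
        vol.Required("url"): str,
        vol.Required("type"): vol.In(["dvc", "dvcx", "url"]),
        "path": str,       # dvc-type datasets
        "rev_lock": str,   # dvc-type datasets
        "version": int,    # dvcx-type datasets
    }
)

LOCKFILE_SCHEMA = {
    vol.Required("schema"): vol.Equal("2.0", "invalid schema version"),
    "datasets": [DATASET_SCHEMA],  # instead of the permissive `object`
}
```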

@skshetry (Member, Author):

Same with dvc.yaml schema below.



```python
@frozen(kw_only=True, order=True)
class FileInfo(SerDe):
```
@skshetry (Member, Author) commented Feb 22, 2024:

I have split up the file metadata into `relpath` and `meta`. In `dvc.lock`, they are a single dictionary. I did this to make it simpler to serialize/deserialize, and I find it conceptually better, as the `meta` belongs to a path. We do have this inconsistency with cloud-versioned outputs now, though.



```python
@frozen(kw_only=True)
class DVCDataset(SerDe):
```
@skshetry (Member, Author) commented Feb 22, 2024:

All the `*Dataset` classes have a corresponding `*DatasetLock` class. `*Dataset` defines the specification for the dataset; `*DatasetLock` captures source information, but you'll see here that it also inherits from the `*Dataset` classes. I am duplicating this information because I want it to be possible to generate a `dvc.yaml` spec out of the lock. This might be useful in the future; I have definitely wanted something similar for stages (e.g. the run cache being usable to regenerate `dvc.yaml`, etc.).

`DatasetLock`/`Dataset` are implemented from a serialization/deserialization perspective, and I haven't put much thought into anything else. We'll likely need to refactor them in the future. I do regret not using pydantic here, but attrs works for now.

Comment on lines +235 to +236:

```python
    def __str__(self) -> str:
        return self.msg
```
@skshetry (Member, Author):

Have to override `__str__`, as `KeyError` quotes the message.
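A quick demonstration of the quoting behavior being worked around (the exception class name here is assumed for illustration):

```python
class DatasetNotFoundError(KeyError):
    def __init__(self, msg: str):
        super().__init__(msg)
        self.msg = msg

    def __str__(self) -> str:
        # KeyError.__str__ returns repr(key), which wraps the message in
        # extra quotes; returning self.msg gives a clean message instead.
        return self.msg

print(str(KeyError("dataset 'dogs' not found")))
# "dataset 'dogs' not found"  <- repr-style, extra quotes
print(str(DatasetNotFoundError("dataset 'dogs' not found")))
# dataset 'dogs' not found    <- clean
```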


```python
        name, version = self.name_version
        catalog = get_catalog()
        record = catalog.get_remote_dataset(name)
```
@skshetry (Member, Author):

dvcx datasets only work with SaaS at the moment, but we can support them in dvc easily. Supporting them in Studio is a different matter, so I have left that out.

There is no unique identifier for a dataset. There is a `created_at` timestamp, but this is only reliable for a remote dataset; a local dataset will have a `created_at` timestamp from when it was pulled to the machine.

There is no provenance information for the dataset, either. The dataset could have come from a local dvcx instance or a self-hosted Studio instance. (I haven't looked at shadow datasets or made any effort to understand what they are.)

I'd really prefer a UUID or anything equivalent that is backed by some randomness rather than a timestamp.

We have another problem here. Datasets are local to each team in dvcx, but "Projects" in Studio aren't. There could be projects in your private namespace, and some projects might be in a team's (or multiple teams') namespace. And you can have different datasets of the same name across different teams. Considering these federated/distributed hosts and multiple teams, I'd not trust the timestamp at all.

That said, I think it's best to ignore these things unless dvcx starts taking them seriously. :)

@dberenbaum (Collaborator):

I think it only makes sense for a remote/SaaS dataset. What's the point of versioning a local instance that could differ between users? If you want to pull a dataset locally, you can start with the remote dataset and track it with `dvc ds add`, then make that a dependency of a stage that uses the dvcx API to pull it locally, as sketched below.
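A sketch of that workflow, with placeholder names (`dogs`, `pull.py`):

```console
dvc ds add --name dogs --url dvcx://dogs           # track the remote dataset
dvc stage add -n pull -d ds://dogs python pull.py  # pull.py fetches it via the dvcx API
dvc repro                                          # reruns only after `dvc ds update dogs`
```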

@skshetry (Member, Author):

I guess my point is that there is a mismatch between how dvc and dvcx work. DVC will look at the SaaS for the dataset, but dvcx works based on a local index.

So a user might have a dataset `dogs` that was locally created, but dvc will look in the SaaS for it, while `DatasetQuery` runs locally against that local dataset. So we are not versioning what might actually get used.

Also, AFAIK the plan is to release the CLI first and make some noise with it. So users won't be able to use this integration unless we release the SaaS at a later date (maybe the end of this year?).

It'd be nice if we could always track the dataset that is available locally (and only fall back to SaaS if it does not exist), and have provenance information about where the dataset originated, so that we could link it back to the SaaS.

@skshetry (Member, Author):

I don't know how much you can do with a CLI version. It's fine if we want to only integrate with the SaaS; I just wanted to note that there is a difference in behavior between dvc and dvcx.

I guess that could be easily fixed on the dvcx side via some API.

@dberenbaum (Collaborator):

So the example in #10316 doesn't make much sense as is, since it will query the local instance, right?

There are a couple of ways the workflow could make sense:

  1. API methods to read from SaaS.
  2. An API to clone the dataset from SaaS to local.

@skshetry (Member, Author)

The `datasets` section implemented in this PR is a list in `dvc.yaml`; `metrics`/`params`/`plots` are also lists. But the `artifacts` section is a dictionary, which has a similar schema/format to `datasets`, with `type`/`path`, etc.

I find lists simpler and easier to use, with fewer indentations, but I am still undecided. So I'll keep the option open to change the `datasets` schema until we document it; the two shapes are contrasted below.
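For reference, the two shapes under discussion (entries are illustrative):

```yaml
# list style (what this PR implements):
datasets:
- name: dogs
  url: dvcx://dogs
  type: dvcx

# dictionary style (like the artifacts section):
datasets:
  dogs:
    url: dvcx://dogs
    type: dvcx
```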

@skshetry mentioned this pull request Feb 23, 2024
@skshetry linked issues Feb 23, 2024 that may be closed by this pull request
@dberenbaum (Collaborator)

Nice @skshetry! It works well for me when doing QA over the various types. I mostly have UI questions.

```console
$ dvc ds add dvc://git@github.com:iterative/example-get-started.git@main#path=data/data.xml
```

Do you have thoughts on trying to incorporate the path and rev into the URL somehow, like this? It feels like we are now stuck needing both complex CLI options (`--rev` and `--path`) and a complex URL format (`dvc+https://github.com...`). I think we might be better off either going all in on the URL format or else reverting back to `--type` (but then we still need `--rev`/`--path` only for dvc datasets).

Other questions:

  • Should `--url` be a positional arg since it's required?
  • For https and hdfs, how do we version them, and what's the intended use? I tried `dvc ds add --url https://dvc.org/ --name dvc-website` and got `ERROR: unexpected error - 'HTTPSFileSystem' object has no attribute 'coalesce_version'`
  • `remote://remote_name/path/to/file/or/dir` (see `dvc remote`) - is there an example of how to use this?

@skshetry (Member, Author) commented Feb 23, 2024

> $ dvc ds add dvc://git@github.com:iterative/example-get-started.git@main#path=data/data.xml

I am not against supporting this, but I find this URL very complex, and I doubt I'll remember it when I need it.

Even if we do support both, I'd prefer keeping `--rev`/`--path`. I don't think it is complex or complicated to support.

Also, parsing these scp-style paths is difficult IIUC, as the `git@` part is optional (cc @iterative/dvc, maybe you have some ideas). One idea I have is to convert these scp paths into ssh paths and then parse them, but I am not sure how robust this is, i.e.:

```diff
- git@github.com:iterative/example-get-started.git
+ ssh://git@github.com/iterative/example-get-started.git
```
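A rough sketch of that conversion idea, assuming a simple `[user@]host:path` shape (illustrative; real scp-style parsing has more edge cases):

```python
import re

def scp_to_ssh(url: str) -> str:
    # Matches [user@]host:path -- the "git@" user part is optional.
    m = re.match(r"^(?:(?P<user>[^@/]+)@)?(?P<host>[^:/]+):(?P<path>.+)$", url)
    if not m:
        return url  # not an scp-style path; leave it untouched
    user = f"{m.group('user')}@" if m.group("user") else ""
    return f"ssh://{user}{m.group('host')}/{m.group('path')}"

print(scp_to_ssh("git@github.com:iterative/example-get-started.git"))
# ssh://git@github.com/iterative/example-get-started.git
```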
> Should `--url` be a positional arg since it's required?

Yeah, both name and url should be positional arguments.

> For https and hdfs, how do we version them and what's the intended use? I tried `dvc ds add --url https://dvc.org/ --name dvc-website` and got `ERROR: unexpected error - 'HTTPSFileSystem' object has no attribute 'coalesce_version'`

I am not sure about the intended use case. It might be useful in Studio to show whether the source changed between commits from the metadata, but I am not sure how useful that information is. We could disable it for now.

> `remote://remote_name/path/to/file/or/dir` (see `dvc remote`) - is there an example of how to use this?

This should just work without doing anything:

```console
dvc remote add myremote s3://bucketname/prefix
dvc ds add --url remote://myremote/path/to/file/or/dir --name name
```

@dberenbaum (Collaborator)

> I am not against supporting this, but I find this URL very complex, and I doubt I'll remember it when I need it.

Yup, it might be worse, especially if you think we should keep `--path` and `--rev`. Do you think we should revert to using `--type` then?

> > For https and hdfs, how do we version them and what's the intended use? I tried `dvc ds add --url https://dvc.org/ --name dvc-website` and got `ERROR: unexpected error - 'HTTPSFileSystem' object has no attribute 'coalesce_version'`
>
> I am not sure about the intended use case. It might be useful in Studio to show whether the source changed between commits from the metadata, but I am not sure how useful that information is. We could disable it for now.

Does it include a hash in `dvc.lock`? If the dataset changes, what happens if I do `dvc repro` with the old dataset?

@skshetry (Member, Author)

> Yup, it might be worse, especially if you think we should keep `--path` and `--rev`. Do you think we should revert to using `--type` then?

Is `--url dvc://git@github.com:iterative/example-get-started.git` worse than `--url git@github.com:iterative/example-get-started.git --type dvc`?

> Does it include a hash in `dvc.lock`?

It has an etag, but we don't have a hash of the content (e.g. md5). To hash the content, you'd have to stream it from the remote.

> If the dataset changes, what happens if I do `dvc repro` with the old dataset?

Since the import is frozen on `dvc ds add` or `dvc ds update`, we don't run any kind of checks during `dvc repro`. So it does not work with non-cloud-versioned remotes at all.

And non-cloud-versioned remotes cannot be frozen at all, so this will require a different solution, in which we check the sources during `dvc repro` and update `dvc.lock` accordingly. That does not fit how the "dataset" is implemented, which requires it to be frozen.

@dberenbaum (Collaborator)

I don't see a good reason to support non-versioned types like https, hdfs, and maybe even remote (I guess it depends on whether it's version_aware). It seems like external dependencies already support this scenario in a simpler way, unless I'm missing something.

@skshetry (Member, Author) commented Feb 23, 2024

I'll disable non-cloud-versioned remotes/filesystems in subsequent PRs.

@dberenbaum (Collaborator) commented Feb 23, 2024

Can you drop them from the parser help text in this PR? I was focused on that more than actually disabling them.

Edit: Or is there some case where https or hdfs is a valid type?

@skshetry (Member, Author)

> Can you drop them from the parser help text in this PR? I was focused on that more than actually disabling them.

I have dropped them from the help text.

@skshetry merged commit 34d80fb into iterative:main Feb 26, 2024
21 checks passed
@skshetry deleted the datasets branch February 26, 2024 15:43
BradyJ27 pushed a commit to BradyJ27/dvc that referenced this pull request Apr 22, 2024
* add support for tracking remote dataset


* remodeling

* remove non-cloud-versioned remotes

* improve deserializing; handle invalidation
Successfully merging this pull request may close these issues: Streaming DVC imports, Streaming cloud-versioned datasets, DVCX datasets.