fetch: use index fetch #9424

Merged
merged 1 commit into from
Jun 10, 2023

efiop
Contributor

@efiop efiop commented May 8, 2023

Part of #9333

Notable improvements:

  • Fetching is now parallelized for all files, whether they are regular files, cloud-versioned files, or imports.

@efiop efiop force-pushed the fix-dvc-data-341 branch 11 times, most recently from e26919f to b6fbe3e Compare May 10, 2023 20:45
efiop added a commit to efiop/dvc that referenced this pull request May 11, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 11, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 11, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 11, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 12, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 12, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 12, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 12, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 12, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 14, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 14, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 15, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 15, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 15, 2023
efiop added a commit to efiop/dvc that referenced this pull request May 15, 2023
@efiop efiop force-pushed the fix-dvc-data-341 branch 4 times, most recently from 41c148c to cd096fb Compare June 9, 2023 19:57
efiop added a commit to efiop/dvc-data that referenced this pull request Jun 9, 2023
efiop added a commit to efiop/dvc-data that referenced this pull request Jun 9, 2023
efiop added a commit to iterative/dvc-data that referenced this pull request Jun 9, 2023
@efiop efiop force-pushed the fix-dvc-data-341 branch 5 times, most recently from 1e8e4ae to 63c0a3c Compare June 10, 2023 01:03
@codecov

codecov bot commented Jun 10, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.14 ⚠️

Comparison is base (1cb371d) 90.67% compared to head (63c0a3c) 90.54%.

❗ Current head 63c0a3c differs from pull request most recent head a6701d7. Consider uploading reports for the commit a6701d7 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9424      +/-   ##
==========================================
- Coverage   90.67%   90.54%   -0.14%     
==========================================
  Files         470      470              
  Lines       35944    35886      -58     
  Branches     5180     5171       -9     
==========================================
- Hits        32593    32493     -100     
- Misses       2755     2800      +45     
+ Partials      596      593       -3     
Impacted Files                          Coverage Δ
dvc/repo/pull.py                        100.00% <ø> (ø)
dvc/repo/worktree.py                    9.75% <ø> (+0.03%) ⬆️
tests/func/test_import_url.py           100.00% <ø> (ø)
tests/func/test_virtual_directory.py    100.00% <ø> (ø)
dvc/repo/fetch.py                       93.93% <100.00%> (+22.92%) ⬆️
dvc/repo/index.py                       92.80% <100.00%> (+0.05%) ⬆️
tests/func/test_data_cloud.py           99.42% <100.00%> (ø)
tests/func/test_import.py               99.43% <100.00%> (-0.02%) ⬇️

... and 5 files with indirect coverage changes


@efiop efiop force-pushed the fix-dvc-data-341 branch 6 times, most recently from 0b283e2 to 01095e1 Compare June 10, 2023 17:02
@efiop efiop changed the title [WIP] fetch: use index fetch fetch: use index fetch Jun 10, 2023
@efiop efiop marked this pull request as ready for review June 10, 2023 18:18
@efiop efiop merged commit 4c0bb8d into iterative:main Jun 10, 2023
@efiop efiop self-assigned this Jun 10, 2023
@efiop efiop mentioned this pull request Jul 2, 2023
12 tasks
@skshetry skshetry mentioned this pull request Nov 28, 2023
5 tasks
@@ -182,7 +182,7 @@ def test_partial_checkout_and_update(M, tmp_dir, dvc, remote):

     assert dvc.pull("dir/subdir") == M.dict(
         added=[join("dir", "")],
-        fetched=1,
+        fetched=3,
Member

@skshetry skshetry Dec 25, 2023

Is this an intentional change or an oversight? Previously, we could fetch a subset of files, but after this PR, we fetch everything.

I was not aware of this change; I only noticed it when someone asked in https://discuss.dvc.org/t/why-dvc-pulls-full-dataset-instead-of-a-single-file/1897.

This breaks an important scenario for virtual directories, see https://dvc.org/doc/user-guide/data-management/modifying-large-datasets#modifying-remote-datasets.

Is there no way to fetch a subset, @efiop?
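
For illustration only, the difference between the two expected values in the test diff above (`fetched=1` versus `fetched=3`) can be sketched in plain Python. This is not DVC's API or implementation: `ENTRIES`, `select_entries`, and the file layout are hypothetical, assuming a tracked `dir` with three files of which one sits under `dir/subdir`.

```python
from pathlib import PurePosixPath

# Hypothetical tracked files; this layout is an assumption, not the real test data.
ENTRIES = ["dir/foo", "dir/bar", "dir/subdir/lorem"]


def select_entries(entries, target, granular=True):
    """Return the entries that would be fetched for `target`.

    granular=True:  only files at or under `target` (subset fetch).
    granular=False: every file under the top-level directory containing `target`.
    """
    target_path = PurePosixPath(target)
    if granular:
        return [
            e
            for e in entries
            if PurePosixPath(e) == target_path
            or target_path in PurePosixPath(e).parents
        ]
    root = target_path.parts[0]
    return [e for e in entries if PurePosixPath(e).parts[0] == root]


subset = select_entries(ENTRIES, "dir/subdir", granular=True)
everything = select_entries(ENTRIES, "dir/subdir", granular=False)
print(len(subset), len(everything))
```

Under these assumptions the granular selection picks only the file under `dir/subdir`, while the non-granular one picks every file in `dir`, which matches the shape of the change being questioned.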


2 participants