partial fetch is broken #10199

Closed
skshetry opened this issue Dec 25, 2023 · 2 comments · Fixed by iterative/dvc-data#490 or #10205
Labels
A: data-management Related to dvc add/checkout/commit/move/remove bug Did we break something? p1-important Important, aka current backlog of things to do regression Ohh, we broke something :-(

Comments

@skshetry
Member

skshetry commented Dec 25, 2023

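Partial fetch is broken in DVC: pulling a subdirectory of a tracked directory (e.g. dvc pull data/dir01) downloads the entire tracked directory (data) instead of only the requested subset.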

Steps to Reproduce

#! /bin/bash

set -ex

gen() {
    mkdir -p "$1"
    for i in {00..99}; do echo "$1/$i" > "$1/${i}.txt"; done
}

setup() {
    pip install -q -e "."
    pushd "$1"
    dvc init -q --no-scm
    gen data/dir01
    gen data/dir02
    ls data
    find data/dir01 -type f | wc -l
    dvc remote add -q -d local "$(mktemp -d)"
    if ! dvc add data; then
        # fix umask imports
        pip install dvc-objects==1
        dvc add data
    fi
    dvc push
    command rm -rf .dvc/{cache,tmp} data
    popd
}

repo="$(mktemp -d)"
setup "$repo" || exit 125
pushd "$repo"
dvc pull data/dir01 || exit 125
# 100 files + .dir file
[ "$(find .dvc/cache -type f | wc -l)" -eq 101 ] || exit 1
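The final check relies on simple arithmetic: gen writes 100 files into each of data/dir01 and data/dir02, every file with unique content, and data is tracked as a single directory with one .dir manifest. A correct partial pull should therefore leave 100 objects plus the .dir file (101) in the cache. Downloading everything, as the regression does, should leave 200 objects plus the .dir file (201), assuming nothing else is written under .dvc/cache. The sketch below is a hypothetical helper (count_cache_files and classify_pull are not part of DVC) that reports which of those cases occurred:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for this repro only; not part of DVC.
# Assumes the layout from the script: 2 dirs x 100 unique files,
# tracked as one `data` directory with a single .dir manifest.

count_cache_files() {
    # -type f matches regular files only; tr strips the padding
    # that BSD/macOS wc adds in front of the number
    find "$1" -type f | wc -l | tr -d ' '
}

classify_pull() {
    local n
    n="$(count_cache_files "$1")"
    case "$n" in
        101) echo "partial" ;;        # 100 objects from data/dir01 + 1 .dir
        201) echo "full" ;;           # all 200 objects + 1 .dir (the regression)
        *)   echo "unexpected:$n" ;;  # some other cache state
    esac
}
```

After dvc pull data/dir01 in the repro repository, classify_pull .dvc/cache should print partial on a fixed version and full on an affected one.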

This breaks an important scenario: modifying part of a large dataset stored on a remote without downloading all of it. See https://dvc.org/doc/user-guide/data-management/modifying-large-datasets#modifying-remote-datasets.

This regressed in 4c0bb8d (#9424) during the index migration. Found by bisecting with the script above saved as script.sh:

git bisect start main 2342099bd876e4afe8da39d75578724de96f8346
git bisect run bash script.sh
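The repro script follows the git bisect run conventions: per the git-bisect documentation, exit status 0 marks a commit good, 125 marks it untestable (skipped), any other status from 1 to 127 marks it bad, and any status outside that range aborts the bisect. That is why setup and pull failures use exit 125 while a wrong cache count uses exit 1. A small sketch of that mapping, where bisect_verdict is a hypothetical helper (git applies these rules internally):

```shell
#!/usr/bin/env bash
# Hypothetical illustration of how `git bisect run` treats the
# script's exit status; git does this itself, nothing calls this.

bisect_verdict() {
    local code="$1"
    if [ "$code" -eq 0 ]; then
        echo "good"    # count check passed: only data/dir01 was fetched
    elif [ "$code" -eq 125 ]; then
        echo "skip"    # setup or pull failed: commit cannot be tested
    elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
        echo "bad"     # e.g. `exit 1` from the cache count check
    else
        echo "abort"   # e.g. 255 from exit(-1) stops the bisect
    fi
}
```

Because the script runs under set -e, any unguarded failing command exits with its own non-zero status and, if that status is in 1..127 and not 125, the commit is marked bad as well.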
@skshetry skshetry added p1-important Important, aka current backlog of things to do regression Ohh, we broke something :-( labels Dec 25, 2023
@skshetry
Member Author

@skshetry skshetry added bug Did we break something? A: data-management Related to dvc add/checkout/commit/move/remove labels Dec 25, 2023
@skshetry skshetry added this to DVC Dec 26, 2023
@github-project-automation github-project-automation bot moved this to Backlog in DVC Dec 26, 2023
@skshetry
Member Author

This is very interesting. If I call len(idx) first, it reports the correct number of files and only the subset is downloaded. If I don't, everything is downloaded.

@github-project-automation github-project-automation bot moved this from Backlog to Done in DVC Dec 27, 2023
@skshetry skshetry reopened this Dec 27, 2023
@github-project-automation github-project-automation bot moved this from Done to Todo in DVC Dec 27, 2023
skshetry added a commit to skshetry/dvc that referenced this issue Dec 27, 2023
Closes iterative#10199. Bumps dvc-data to >=3.4.
@github-project-automation github-project-automation bot moved this from Todo to Done in DVC Dec 27, 2023
skshetry added a commit that referenced this issue Dec 27, 2023
Closes #10199. Bumps dvc-data to >=3.4.
BradyJ27 pushed a commit to BradyJ27/dvc that referenced this issue Apr 22, 2024