pull: "No file hash info found" and checkout fails when pulling frozen imported url that is a directory that has subsequently changed #10194
Comments
Thanks for the report @ahasha. I can confirm at least the part about push not doing anything, and I can reproduce it even with a local remote using this script:
set -eux
IMPORT=$(mktemp -d)
REPO=$(mktemp -d)
REMOTE=$(mktemp -d)
echo foo > "$IMPORT/foo"
tree "$IMPORT"
cd "$REPO"
git init -q
dvc init -q
dvc remote add -d default "$REMOTE"
dvc import-url "$IMPORT" dir
tree dir
dvc push
tree "$REMOTE"
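Since dvc push printed upload messages while the remote stayed empty, it can help to inspect the remote directly rather than trust the log. A minimal sketch, assuming a local directory remote like $REMOTE in the script above; count_remote_files and assert_remote_not_empty are hypothetical helpers, not DVC commands, and they only count regular files under the remote path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DVC): count the regular files stored
# under a local remote directory, so an "uploaded" log line can be
# cross-checked against what actually landed on disk.
count_remote_files() {
  local remote_dir="$1"
  # -type f skips directories; wc -l counts one line per file path;
  # tr strips the padding some wc implementations add.
  find "$remote_dir" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: return non-zero if the remote holds no files.
assert_remote_not_empty() {
  local n
  n=$(count_remote_files "$1")
  if [ "$n" -eq 0 ]; then
    echo "remote $1 is empty after push" >&2
    return 1
  fi
  echo "remote $1 holds $n file(s)"
}
```

Appending assert_remote_not_empty "$REMOTE" right after dvc push in the script would make it exit non-zero under set -e whenever the push left the remote empty.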
@efiop I'm marking as p0 for now because it looks severe and it's unclear if it impacts anything besides
It was regressed from 3.27.1 after: Looks like a problem with how we collect storages/indexes.
The pushed data not showing up in the remote was a surprising new observation and not the bug that initially led me to open this issue, so I want to make sure this issue isn't considered resolved if only that is corrected. The "No file hash info found" error I'm reporting originally occurred in an environment running 3.27.0. In that instance, I saw that the
Here's the dvc doctor output from that environment:
Ok, so just for the record: we set
@ahasha Could you give upstream dvc a try, please?
Thanks! I'm using poetry, so I installed the upstream version using
It looks like I just need
The upstream version is successfully pulling my tracked directory data from the remote! 🍾 🎆 Additionally, I ran the reproduction script from #10124 against the upstream version and got the behavior I had requested. I don't know if this was intentional on your part.
@ahasha-ml Thanks for the feedback! Yes, #10124 had the same underlying problem, so it got fixed as well. Glad it works for you now, thank you!
Bug Report
Description
I have used dvc import-url to track a GCS directory that is updated daily with incremental new data. The documentation suggests that once the data is imported, it is frozen and DVC should not attempt to download changes from the tracked URL unless dvc update is run. It also says that
Instead, when I run dvc push, the log messages suggest the files were uploaded, but nothing shows up in the remote. y imported data to remote, dvc pull from a fresh checkout downloads all the files from the remote URL (including updated ones) and then fails, instead of checking out the frozen version of the data as expected.
Reproduce
I created a minimal example using a public GCS bucket in my account:
Output from dvc push is:
However, I later noticed that no files had been pushed to the remote at all, despite the misleading log message. I was able to correct this by running dvc import-url gs://hasha-ds-public/test_dir --to-remote, but this feels like another bug.
Now set up a fresh clone of the project
Add another file to the tracked URL
Finally, try pulling the frozen data associated with the imported URL from remote into the fresh clone
13. cd dvc-import-issue2
14. dvc pull -r origin
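One way to tell whether a pull like step 14 restored the frozen snapshot, rather than the bucket's current contents, is to compare the file count recorded in the .dvc file with what ends up on disk. A minimal sketch; recorded_nfiles and check_frozen_count are hypothetical helpers, and the parser assumes the YAML layout DVC typically writes for a directory output (an outs: entry carrying nfiles: N). It is a line-based heuristic, not a full YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nfiles value of the first entry under
# "outs:" in a .dvc file. Assumes DVC's usual two-space YAML layout.
recorded_nfiles() {
  awk '
    /^outs:/ { in_outs = 1; next }         # entering the outs section
    in_outs && /^[^ -]/ { in_outs = 0 }    # next top-level key ends it
    in_outs {
      sub(/^[ -]+/, "")                    # drop indentation and list dash
      if ($1 == "nfiles:") { print $2; exit }
    }
  ' "$1"
}

# Hypothetical helper: compare the recorded count with the regular
# files actually present in the checked-out directory.
check_frozen_count() {
  local dvc_file="$1" data_dir="$2" expected actual
  expected=$(recorded_nfiles "$dvc_file")
  actual=$(find "$data_dir" -type f | wc -l | tr -d ' ')
  if [ "$expected" != "$actual" ]; then
    echo "mismatch: $dvc_file records $expected file(s), $data_dir has $actual" >&2
    return 1
  fi
  echo "ok: $actual file(s)"
}
```

After a pull that honors the frozen import, check_frozen_count test_dir.dvc test_dir should print ok: 3 file(s); an extra file such as blah.txt in test_dir would make it report a mismatch.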
Result
Note that the error message shows it trying to check out blah.txt, though that file wasn't present when the directory was imported, and test_dir.dvc suggests there should only be 3 files in the directory.
I have pushed this repository to https://github.com/ahasha/dvc-import-url-pull-issue. You should be able to reproduce the error message by cloning it and running dvc pull -r origin.
Expected
After dvc import-url, dvc push should upload the version of the data tracked by the created .dvc file to the remote.
dvc pull should retrieve the version of the data tracked by the .dvc file from the remote, and not download anything new from the tracked URL.
Environment information
Output of dvc doctor: