Context:
I currently use DVC to track a large dataset in my git repository.
In my repo, part of the dataset is committed to Git itself so that a new user, or a freshly initialized repo, can work with a reduced dataset of around 500 MB for testing rather than downloading the entire 30 GB dataset.
For instance, I'm currently tracking a folder called candles with the following directory structure:
Only the 3 folders above in candles/ are tracked by Git; the rest are git-ignored.
Question:
Ideally, I want to add every directory in candles to DVC, i.e. the 3 folders tracked by Git plus the 1500+ folders that Git does not track. However, when I run dvc add candles/, I get this error:
$ dvc add candles
Adding...
ERROR: output 'candles' is already tracked by SCM (e.g. Git).
You can remove it from Git, then add to DVC.
To stop tracking from Git:
git rm -r --cached 'candles'
git commit -m "stop tracking candles"
I know I can reorganize my folder structure or programmatically add each folder. But given my requirements, ideally this command would simply back up the entire folder, regardless of whether parts of it are already tracked by Git.
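For reference, the per-folder workaround I would like to avoid could be sketched roughly as below. This is only a sketch under my own assumptions: the function names `build_dvc_add_commands` and `add_untracked` are made up for illustration, and the set of Git-tracked folder names has to be supplied by hand. It only relies on `dvc add <path>` being called once per folder, which leaves one .dvc file per folder.

```python
import subprocess
from pathlib import Path


def build_dvc_add_commands(root, git_tracked_names):
    """Build one `dvc add` command per immediate subfolder of `root`
    whose name is not in `git_tracked_names` (hypothetical helper)."""
    subdirs = sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and p.name not in git_tracked_names
    )
    return [["dvc", "add", str(p)] for p in subdirs]


def add_untracked(root, git_tracked_names, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    commands = build_dvc_add_commands(root, git_tracked_names)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first folder that dvc refuses to add
            subprocess.run(cmd, check=True)
    return commands
```

Calling `add_untracked("candles", {...}, dry_run=False)` with the names of the 3 Git-tracked folders would add the remaining folders one by one. That means 1500+ separate .dvc files and still leaves the Git-tracked folders out of the DVC backup, which is why a single `dvc add candles` would suit me better.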
Is it possible to add such a feature/flag to the CLI, or is this already implemented? My use case for DVC is purely version-controlled backups, so I'm not really concerned if a file is tracked by Git as well.
Please do respond if time permits; it would mean a lot. Thanks!