Extract checksums to a common state file #5931
Replies: 5 comments
-
An idea: it might happen that However, there is a small disadvantage: it might lead to empty directories, which won't be reflected by Git.
-
We have several issues that discuss how we organize the state in files. With this approach, it doesn't matter how we identify each block/object of the store (gathering a collection of stage files, querying a single file, or splitting it across several files: checksums, pipelines, artifacts).
-
Could you elaborate on what the best practice would be?
-
@dmpetrov, by the way, Terraform has an option to submit the state to a remote/shared space. It can't be done through GitHub because their state includes sensitive information (API keys?), but with DVC it is only checksums paired with files. It would be even simpler if we moved to a prefix-based approach instead of a path-based one, since directories wouldn't need special treatment.
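To illustrate the prefix-based idea, here is a minimal sketch. It assumes (hypothetically) that the state is a flat mapping from path strings to checksums; `entries_under` and the sample paths are made up for illustration and are not part of DVC's API.

```python
def entries_under(state, prefix):
    """Return every {path: checksum} entry whose path starts with `prefix`.

    Hypothetical helper: with a flat, prefix-keyed state there is no separate
    "directory" object; a directory is just the set of keys sharing a prefix.
    """
    return {
        path: checksum
        for path, checksum in state.items()
        if path.startswith(prefix)
    }


if __name__ == "__main__":
    # Made-up paths and checksums, for illustration only.
    state = {
        "data/raw/a.csv": "111",
        "data/raw/b.csv": "222",
        "models/m.pkl": "333",
    }
    print(entries_under(state, "data/raw/"))
```

Note that the prefix should end with a separator (`data/raw/`), otherwise `data/raw` would also match a sibling such as `data/raw2/`.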
-
Best practice: files are editable by humans only. No software writes to files that are under Git control. If software needs to write something and put it under Git control, it is better to localize the places where the modification happens.
@MrOutis are you talking about Terraform Cloud? If so, it seems like a different use case that can be implemented on top.
Is it only about the internal code redesign? Yeah, the separation is needed.
-
Now all the checksums are scattered among DVC-files. It was a design decision to simplify `git merge` for ML experiments when changes to a single data-file/dvc-stage were localized. However, we learned that in many cases the `-X theirs` strategy is the best way to bring ML experiments to another branch without manual merging, and it is a good time to revisit this design decision.
There are two issues with checksums in many DVC-files:
`dvc repro`). The changes in repo (changed dvc-files) need to be copied to somewhere (e.g. GitLab artifacts).
To solve the issues above, it might be worth extracting all the checksums into a separate "State"-file. For example: `Dvc.state` or `<anyname>.dvcstate` or `.dvc/state`.
Note, this is not the same as the current `.dvc/state`, which is an ephemeral (not committed to Git) DB file. The state file needs to be committed to Git.
Example: Terraform keeps all the infrastructure configuration in `*.tf` files but stores state in a single, separate file `terraform.tfstate`.
Related issues: This FR might be related to a single dag FR #1871
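A rough sketch of what the extraction could look like, not an actual DVC implementation. It assumes each DVC-file has already been parsed into a dict with `deps`/`outs` lists whose entries carry `path` and `md5` keys; the function names, the nested layout of the resulting state, and the sample stage are all hypothetical.

```python
import json
from copy import deepcopy


def extract_checksums(stage_files):
    """Split parsed DVC-files into (checksum-free stages, single state dict).

    `stage_files` maps a DVC-file name to its parsed content (assumed layout).
    The input is not modified.
    """
    stripped, state = {}, {}
    for stage_path, stage in stage_files.items():
        clean = deepcopy(stage)
        for section in ("deps", "outs"):
            for entry in clean.get(section, []):
                if "md5" in entry:
                    # Move the checksum out of the stage and into the state.
                    per_stage = state.setdefault(stage_path, {})
                    per_stage.setdefault(section, {})[entry["path"]] = entry.pop("md5")
        stripped[stage_path] = clean
    return stripped, state


def dump_state(state):
    """Serialize the state deterministically so Git diffs stay small."""
    return json.dumps(state, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Made-up stage, for illustration only.
    stages = {
        "train.dvc": {
            "cmd": "python train.py",
            "deps": [{"path": "data.csv", "md5": "aaa"}],
            "outs": [{"path": "model.pkl", "md5": "bbb"}],
        }
    }
    stripped, state = extract_checksums(stages)
    print(dump_state(state))
```

With this split, the stage definitions stay human-edited while only the single state file is rewritten by the tool, which is the same division Terraform makes between `*.tf` files and `terraform.tfstate`.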