batch dump and batch update of old stages #10630
Open
+85
−31
Description
Rather than running stage.dump separately for each stage, which loads the dvc.lock file many times, this proposed code groups stages by their dvcfiles and calls `dvcfile.batch_dump` once per file (see the `stages_by_dvcfile` object).
Also, rather than fetching the old version of each stage inside `get_versioned_outs`, this proposed code fetches all old versions in one step, stores them in an `old_stages` object, and then passes each old stage to stage.save. This similarly removes a step where the lock file is reloaded once per stage.
My understanding is that these changes speed up dvc commit when the dvc.lock file is very large and slow to load, without changing the function's behavior or introducing problems. That said, I am not 100% sure, especially about the `get_versioned_outs` part, which I have less context for. I would appreciate feedback about what this could be missing!
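To illustrate the grouping idea in isolation, here is a minimal sketch that does not use DVC's actual classes. `ToyLockFile`, `dump_grouped`, and the tuple layout of `stages` are hypothetical stand-ins invented for this example; only the general shape (group by file, then one batch dump per file) mirrors the description above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyLockFile:
    """Hypothetical stand-in for a lock file; counts how often it is loaded."""

    path: str
    data: dict = field(default_factory=dict)
    loads: int = 0

    def load(self) -> dict:
        # Stands in for the slow step: parsing a very large lock file.
        self.loads += 1
        return dict(self.data)

    def dump(self, stage_name: str, entry: dict) -> None:
        # Per-stage dump: one full load for every stage written.
        contents = self.load()
        contents[stage_name] = entry
        self.data = contents

    def batch_dump(self, entries: dict) -> None:
        # Batch dump: one full load, however many stages are written.
        contents = self.load()
        contents.update(entries)
        self.data = contents


def dump_grouped(stages, lockfiles) -> None:
    """Group (path, stage_name, entry) tuples by file, then dump once per file."""
    stages_by_file = defaultdict(dict)
    for path, name, entry in stages:
        stages_by_file[path][name] = entry
    for path, entries in stages_by_file.items():
        lockfiles[path].batch_dump(entries)


paths = ("a/dvc.lock", "b/dvc.lock")
stages = [("a/dvc.lock", f"stage{i}", {"md5": str(i)}) for i in range(3)]
stages.append(("b/dvc.lock", "other", {"md5": "x"}))

# Baseline: one dump (and therefore one load) per stage.
one_by_one = {p: ToyLockFile(p) for p in paths}
for path, name, entry in stages:
    one_by_one[path].dump(name, entry)

# Grouped: one load per lock file, same final contents.
grouped = {p: ToyLockFile(p) for p in paths}
dump_grouped(stages, grouped)
```

In this toy model the file holding three stages is loaded three times in the baseline and once in the grouped version, while the stored contents end up identical. Whether the real code paths are equally behavior-preserving is exactly the question raised above.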
Related issues
Fixes #10629, although it seems like it would be even better to have separate lock files.
Checklist