Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better control over jobs number for dvc add #8030

Closed
wants to merge 8 commits into from

Conversation

percevalw
Copy link

Hi!

This PR follows iterative/dvc-objects#99 and partially fixes #8008.

--jobs is now allowed without a remote in dvc add, and a new configuration parameter core.jobs is now available to change the default number of threads during data transfers.

This also requires updating the docs, especially https://github.com/iterative/dvc.org/blob/main/content/docs/command-reference/add.md and https://github.com/iterative/dvc.org/blob/main/content/docs/command-reference/config.md.

@percevalw percevalw requested a review from a team as a code owner July 20, 2022 13:02
@percevalw percevalw requested a review from pmrowla July 20, 2022 13:02
@percevalw percevalw force-pushed the add_jobs branch 2 times, most recently from 653c9d1 to a6be52c Compare July 20, 2022 16:53
@dtrifiro
Copy link
Contributor

Thanks a lot for contributing! 🎉

@dtrifiro dtrifiro self-requested a review July 20, 2022 17:17
@efiop
Copy link
Contributor

efiop commented Jul 25, 2022

@efiop efiop closed this Jul 25, 2022
@percevalw
Copy link
Author

Hi @efiop, I'm not sure why this PR was closed? You mentioned here that iterative/dvc-objects#99 (comment) that the --jobs was indeed not passing through the various calls, and it also appears that --jobs is not available to control the number of threads used during a transfer without remote for dvc add.

It is therefore currently not possible to control the number of jobs used to transfer data locally during dvc add (always 4 * the number of cores), nor is it possible to change the default number of jobs for other commands in a given environment (the --jobs option has to be set).

@maharjun
Copy link

maharjun commented May 2, 2023

Please fix this? I've had to patch the threading issue manually for a while now. It would be nice to not have to do it considering it's been more than a year

@dberenbaum
Copy link
Collaborator

Sorry for the lack of noise here. Please feel free to open an issue, which is more likely to be seen than a closed PR.

@efiop Do you remember why we closed this PR?

@efiop
Copy link
Contributor

efiop commented May 10, 2023

Hm, maybe I've closed this by mistake, I don't remember the context now, but it certainly looks so.

@efiop efiop reopened this May 10, 2023
@efiop
Copy link
Contributor

efiop commented May 10, 2023

@percevalw I'm really sorry for this, looks like I might've misclicked and closed by accident and then your message notification got lost in the flow 🙁

I've solved some conflicts, let's see if this will work.

@efiop efiop self-assigned this May 10, 2023
@efiop
Copy link
Contributor

efiop commented May 10, 2023

Looks like something is missing, need to take a closer look.

@efiop
Copy link
Contributor

efiop commented Sep 26, 2023

We replaced --jobs with --remote-jobs in 3.0 and also on closer consideration core.jobs should not be the default for remotes, that will affect push/pull operations and is undesirable. I think #8008 needs a closer proper look.

Though core.jobs is questionable, --jobs for dvc add is much less so, so maybe that's what we need/want here. But there is a question if this should really be called --jobs or maybe something else (--cache-jobs since we have --remote-jobs already and then maybe cache.jobs as well?).

The issue that is being solved here is related to nfs + many cpus and we should probably think about it carefully.

@skshetry
Copy link
Member

skshetry commented Dec 8, 2023

I am going to close this PR, as this is stale at this point, and we plan to re-implement dvc add from scratch for index migration in #9333.

@skshetry skshetry closed this Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

add: taking more than 20min when multithreaded vs 20s with one job
6 participants