Fewer workers registered with the scheduler than shown in the dashboard #148

Open
FANLONGFANLONG opened this issue Jul 5, 2021 · 1 comment

@FANLONGFANLONG

What happened:

  • We started a Dask cluster with 20 workers as follows:
    import skein
    from dask_yarn import YarnCluster

    # memory_limit, threads_per_worker, resource_files, worker_script,
    # scheduler_script, spec_master, app_name, services and _get_queue()
    # are defined elsewhere in our script.
    n_workers = 20

    # 20 worker containers
    spec_worker = skein.Service(resources=skein.Resources(memory=memory_limit, vcores=threads_per_worker),
                                instances=n_workers,
                                files=resource_files,
                                script=worker_script)
    services["dask.worker"] = spec_worker

    # a single scheduler container
    spec_scheduler = skein.Service(resources=skein.Resources(memory="20g", vcores=4),
                                   files=resource_files,
                                   instances=1,
                                   script=scheduler_script)
    services["dask.scheduler"] = spec_scheduler

    spec = skein.ApplicationSpec(name=app_name,
                                 queue=_get_queue(),
                                 master=spec_master,
                                 services=services)

    cluster = YarnCluster.from_specification(spec)
  • The scheduler log showed only a few workers registered:
    image
  • But we can see all 20 workers in the dashboard:
    image

What you expected to happen:
All 20 workers should register with the scheduler.

Anything else we need to know?:

  • Sometimes the scheduler log showed all 20 workers registered, and sometimes it did not.
  • Whenever the scheduler log showed only some of the 20 workers registered, submitted tasks always failed with the error below:
    image
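
One way to check the count independently of the scheduler log is to ask the scheduler directly from the client. The following is a minimal sketch, not part of the original report: the helper names `registered_workers` and `wait_until_registered` are hypothetical, and it assumes `get_info` returns a dict shaped like the result of `Client.scheduler_info()`, with a `"workers"` mapping that lists every registered worker.

```python
import time
from typing import Any, Callable, Dict


def registered_workers(info: Dict[str, Any]) -> int:
    """Count workers in a dict shaped like Client.scheduler_info() (assumed shape)."""
    return len(info.get("workers", {}))


def wait_until_registered(get_info: Callable[[], Dict[str, Any]],
                          expected: int,
                          timeout: float = 60.0,
                          poll: float = 1.0) -> bool:
    """Poll get_info() until at least `expected` workers are registered.

    Returns True once the count is reached, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if registered_workers(get_info()) >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Hypothetical usage with the cluster from this report:
#   from dask.distributed import Client
#   client = Client(cluster)
#   if not wait_until_registered(client.scheduler_info, expected=20):
#       print("registered:", registered_workers(client.scheduler_info()))
```

`Client.wait_for_workers(n_workers=20)` is the built-in way to block until the workers have joined; the polling version above is only meant to make the registered count visible while waiting, so it can be compared with what the log and the dashboard show.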

Environment:

  • Dask version: 2.19.0
  • Python version: 3.6.7
  • Operating System: CentOS Linux release 7.2.1511 (Core)
  • Install method (conda, pip, source): conda
@jacobtomlinson jacobtomlinson transferred this issue from dask/distributed Jul 6, 2021
@jacobtomlinson
Member

This seems to be related to how dask-yarn exposes scheduler logging, so I'm moving the issue over there.
