
[BUG Fix] Launching dependent LocalPipelineExecutors with skip_completed=False leads to waiting forever #300

Open
Wants to merge 2 commits into main
Conversation

silverriver (Contributor)
When launching dependent LocalPipelineExecutors, setting skip_completed=False on the earlier executor causes the executor that depends on it to wait forever.

For example:

executor1 = LocalPipelineExecutor(
    pipeline=[
        ...
    ],
    tasks=10,
    logging_dir=f"logs/tokz",
    skip_completed=False,
)

executor2 = LocalPipelineExecutor(
    pipeline=[
        ...
    ],
    tasks=10,
    logging_dir=f"logs/tokz",
    depends=executor1,  # executor2 should only start after executor1 finishes
)

if __name__ == "__main__":
    executor2.run()

The above code snippet will lead to

datatrove.executor.local:run:102 - Dependency job still has 10/10 tasks. Waiting...

even after executor1 has finished all of its tasks.
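The loop never exits because of how the incomplete ranks are computed. Below is a minimal, self-contained sketch of the behavior described in this report (the class and its fields are hypothetical stand-ins, not datatrove's actual source):

```python
class FakeExecutor:
    """Toy model of the reported behavior; names are hypothetical."""

    def __init__(self, tasks, skip_completed=True, completed=None):
        self.tasks = tasks
        self.skip_completed = skip_completed
        self.completed = completed or set()

    def get_incomplete_ranks(self):
        # With skip_completed=False, every rank is reported as incomplete,
        # even when all of them have actually finished.
        if not self.skip_completed:
            return list(range(self.tasks))
        return [r for r in range(self.tasks) if r not in self.completed]


# executor1 finished all 10 tasks but was created with skip_completed=False,
# so a dependent executor polling it sees 10/10 incomplete forever.
executor1 = FakeExecutor(tasks=10, skip_completed=False, completed=set(range(10)))
print(len(executor1.get_incomplete_ranks()))  # 10, so the dependent keeps waiting
```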

Commits:
- add a new param for get_incomplete_ranks to override the default behaviour
- Get the *real* incomplete task count when launching dependent executors
silverriver (Contributor, Author)

cc @hynky1999 to get more attention

guipenedo (Collaborator) commented Oct 30, 2024

Thanks for the PR!
The idea of skip_completed is for when you want to re-run already completed tasks, so I believe the behavior in the second block is working as expected. If you don't want it to wait at all, just launch without the dependency.

Also, there's no need to ping maintainers; we look into PRs and issues when we have time.

silverriver (Contributor, Author)

The second block will wait forever.

I think the correct/intended behavior of skip_completed is: if we set skip_completed=False for the first block, the first block is re-run, and the second block is launched once that re-run finishes.

However, the current behavior is: if we set skip_completed=False for the first block, the first block is re-run, but the second block gets stuck in a waiting loop and is never launched. The reason is that executor1.get_incomplete_ranks() always returns 10 tasks, every time it is called.

