
[BUG Fix] Launching dependent LocalPipelineExecutors with skip_completed=False leads to waiting forever #300

Open
Wants to merge 2 commits into main
Conversation

silverriver (Contributor)
When launching dependent LocalPipelineExecutors, setting skip_completed=False on the earlier executor causes the executor that depends on it to wait forever.

For example:

executor1 = LocalPipelineExecutor(
    pipeline=[
        ...
    ],
    tasks=10,
    logging_dir=f"logs/tokz",
    skip_completed=False,
)

executor2 = LocalPipelineExecutor(
    pipeline=[
        ...
    ],
    tasks=10,
    logging_dir=f"logs/tokz",
    depends=executor1,  # executor2 should only start after executor1 finishes
)

if __name__ == "__main__":
    executor2.run()

The above code snippet will lead to

datatrove.executor.local:run:102 - Dependency job still has 10/10 tasks. Waiting...

even after executor1 has finished all of its tasks.
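The loop never exits because of how the incomplete ranks are computed. Below is a minimal, self-contained sketch of the behavior described in this report (the class and its fields are hypothetical stand-ins, not datatrove's actual source):

```python
class FakeExecutor:
    """Toy model of the reported behavior; names are hypothetical."""

    def __init__(self, tasks, skip_completed=True, completed=None):
        self.tasks = tasks
        self.skip_completed = skip_completed
        self.completed = completed or set()

    def get_incomplete_ranks(self):
        # With skip_completed=False, every rank is reported as incomplete,
        # even when all of them have actually finished.
        if not self.skip_completed:
            return list(range(self.tasks))
        return [r for r in range(self.tasks) if r not in self.completed]


# executor1 finished all 10 tasks but was created with skip_completed=False,
# so a dependent executor polling it sees 10/10 incomplete forever.
executor1 = FakeExecutor(tasks=10, skip_completed=False, completed=set(range(10)))
print(len(executor1.get_incomplete_ranks()))  # 10, so the dependent keeps waiting
```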

Commits:
- add a new param for get_incomplete_ranks to override the default behaviour
- Get the *real* incomplete task count when launching dependent executors
silverriver (Contributor, Author)

cc @hynky1999 to get more attention

guipenedo (Collaborator) commented Oct 30, 2024

Thanks for the PR!
The idea of skip_completed is for when you want to re-run already completed tasks, so I believe the behavior in the second block is working as expected. If you don't want it to wait at all, just launch without the dependency.

Also, there's no need to ping maintainers; we look into PRs and issues when we have time.

silverriver (Contributor, Author)

The second block will wait forever.

I think the correct/intended behavior of skip_completed is: if we set skip_completed=False for the first block, the first block is re-run, and the second block is launched once that re-run finishes.

However, the current behavior is: if we set skip_completed=False for the first block, the first block is re-run, but the second block gets stuck in a waiting loop and is never launched. The reason is that executor1.get_incomplete_ranks() always returns 10 tasks, every time it is called.

