Thread pool dies and does not recover #183

Closed
0xtavian opened this issue Apr 16, 2025 · 2 comments

@0xtavian
Contributor

0xtavian commented Apr 16, 2025

Thread pool dies and does not recover

I’ve encountered an issue where Interlace’s thread pool gradually dies off and never recovers. Execution eventually slows to a crawl, with only a couple of threads left running, even when commands do not crash.

Reproduction Steps

  1. Create a large list of targets:

    seq 1 1000000 | sed 's/^/example-/' > targets.txt
  2. Create a minimal script called crashy.sh (no crash, just sleep):

    #!/bin/bash
    sleep 2
  3. Make it executable:

    chmod +x crashy.sh
  4. In one terminal, sleep for 2 secs and then start monitoring thread activity:

    sleep 2 ; top -H -p $(pgrep -fa interlace | cut -d ' ' -f1)
  5. In a second terminal, quickly run:

    interlace -tL targets.txt -c './crashy.sh _target_' -threads 15
  6. Optionally, monitor active subprocesses:

    ps aux | grep '[c]rashy.sh' | wc -l

Observed Behavior

  • Interlace starts with the expected number of threads (-threads 15).
  • Over time, the number of active threads drops.
  • Threads are not restarted, even though the commands are completing successfully.
  • Not all threads die off, but many of them do, which slows progress to a crawl.

Expected Behavior

  • Interlace should maintain the requested concurrency level until all tasks are completed.
  • Completed or idle threads should be reused or restarted as needed.

Let me know if I can help test any fix or branch!

@codingo @prodigysml

@0xtavian
Contributor Author

The issue is visible in the terminal output: each batch of commands gets progressively smaller.

$ interlace -tL targets.txt -c './crashy.sh _target_' -threads 15
=====================================================
Interlace v1.9.8	by Michael Skelton (@codingo_)
                  	& Sajeeb Lohani (@sml555_)
=====================================================
  0%|                                                                                                                                                       | 0/1000000 [00:00<?, ?it/s][20:12:41] [THREAD] [./crashy.sh example-779739] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-932682] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-960067] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-242130] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-442080] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-790320] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-45603] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-290797] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-63239] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-903309] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-339516] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-403778] Added to Queue 
[20:12:41] [THREAD] [./crashy.sh example-437373] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-312922] Added to Queue 
  0%|                                                                                                                                           | 14/1000000 [00:02<39:46:49,  6.98it/s][20:12:43] [THREAD] [./crashy.sh example-720396] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-963604] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-814407] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-18763] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-91205] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-38593] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-90458] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-152917] Added to Queue 
[20:12:43] [THREAD] [./crashy.sh example-38394] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-290602] Added to Queue 
  0%|                                                                                                                                           | 24/1000000 [00:04<47:53:17,  5.80it/s][20:12:45] [THREAD] [./crashy.sh example-358735] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-360058] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-443152] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-543509] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-691332] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-901985] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-549403] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-916358] Added to Queue 
[20:12:45] [THREAD] [./crashy.sh example-404391] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-729729] Added to Queue 
  0%|                                                                                                                                           | 34/1000000 [00:06<51:12:18,  5.42it/s][20:12:47] [THREAD] [./crashy.sh example-953625] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-210618] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-668775] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-950157] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-545760] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-615597] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-760674] Added to Queue 
[20:12:47] [THREAD] [./crashy.sh example-368038] Added to Queue 
[20:12:49] [THREAD] [./crashy.sh example-516235] Added to Queue 
  0%|                                                                                                                                           | 43/1000000 [00:08<54:59:43,  5.05it/s][20:12:49] [THREAD] [./crashy.sh example-326239] Added to Queue 
[20:12:49] [THREAD] [./crashy.sh example-65245] Added to Queue 
[20:12:49] [THREAD] [./crashy.sh example-173247] Added to Queue 
[20:12:49] [THREAD] [./crashy.sh example-543505] Added to Queue 
[20:12:49] [THREAD] [./crashy.sh example-856399] Added to Queue 
[20:12:49] [THREAD] [./crashy.sh example-172647] Added to Queue 
[20:12:51] [THREAD] [./crashy.sh example-287329] Added to Queue 
  0%|                                                                                                                                           | 50/1000000 [00:10<61:57:38,  4.48it/s][20:12:51] [THREAD] [./crashy.sh example-737493] Added to Queue 
[20:12:51] [THREAD] [./crashy.sh example-822885] Added to Queue 
[20:12:51] [THREAD] [./crashy.sh example-375414] Added to Queue 
[20:12:51] [THREAD] [./crashy.sh example-518535] Added to Queue 
[20:12:51] [THREAD] [./crashy.sh example-410068] Added to Queue 
[20:12:51] [THREAD] [./crashy.sh example-793489] Added to Queue 
[20:12:53] [THREAD] [./crashy.sh example-176067] Added to Queue 
  0%|                                                                                                                                           | 57/1000000 [00:12<67:04:35,  4.14it/s][20:12:53] [THREAD] [./crashy.sh example-167622] Added to Queue 
[20:12:53] [THREAD] [./crashy.sh example-76189] Added to Queue 
[20:12:53] [THREAD] [./crashy.sh example-609092] Added to Queue 
[20:12:53] [THREAD] [./crashy.sh example-300403] Added to Queue 
[20:12:53] [THREAD] [./crashy.sh example-267769] Added to Queue 
[20:12:53] [THREAD] [./crashy.sh example-574668] Added to Queue 
[20:12:55] [THREAD] [./crashy.sh example-287193] Added to Queue 
  0%|                                                                                                                                           | 64/1000000 [00:14<70:47:04,  3.92it/s][20:12:55] [THREAD] [./crashy.sh example-580299] Added to Queue 
[20:12:55] [THREAD] [./crashy.sh example-40423] Added to Queue 
[20:12:55] [THREAD] [./crashy.sh example-194611] Added to Queue 
[20:12:55] [THREAD] [./crashy.sh example-751680] Added to Queue 
[20:12:55] [THREAD] [./crashy.sh example-75713] Added to Queue 
[20:12:55] [THREAD] [./crashy.sh example-157867] Added to Queue 
[20:12:56] [THREAD] [./crashy.sh example-207364] Added to Queue 
  0%|                                                                                                                                           | 71/1000000 [00:14<60:48:43,  4.57it/s]
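To put numbers on the shrinking batches, here is a minimal sketch that takes the counters from the progress-bar updates above and prints how far the counter moved between consecutive updates. The `batch_sizes` helper is hypothetical and only for illustration. For reference, 15 workers each running a 2-second command should ideally advance the counter by about 15 per 2-second batch.

```python
import re

# Progress-bar fragments copied from the terminal output above.
progress_lines = [
    "0/1000000 [00:00<?, ?it/s]",
    "14/1000000 [00:02<39:46:49,  6.98it/s]",
    "24/1000000 [00:04<47:53:17,  5.80it/s]",
    "34/1000000 [00:06<51:12:18,  5.42it/s]",
    "43/1000000 [00:08<54:59:43,  5.05it/s]",
    "50/1000000 [00:10<61:57:38,  4.48it/s]",
    "57/1000000 [00:12<67:04:35,  4.14it/s]",
    "64/1000000 [00:14<70:47:04,  3.92it/s]",
    "71/1000000 [00:14<60:48:43,  4.57it/s]",
]


def batch_sizes(lines):
    """Return how far the counter advanced between consecutive updates."""
    counts = [int(re.match(r"\s*(\d+)/\d+", line).group(1)) for line in lines]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


sizes = batch_sizes(progress_lines)
print(sizes)  # [14, 10, 10, 9, 7, 7, 7, 7]
```

The batches drop from 14 to 7 within the first 14 seconds, which is consistent with roughly half of the workers no longer picking up commands.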

🧵 The Core Problem: Threads Were Dying Silently Over Time


🔍 Root Cause #1: Use of a Shared Generator (next(...)) Between Threads

  • The original Pool handed each Worker an iterator.
  • Each thread called next(task_queue) to get its task.
  • But Python generators are not thread-safe: calling next() while another thread is still executing the generator raises ValueError ("generator already executing").
  • A worker that hit this error without handling it would stop pulling tasks.
  • Workers hitting StopIteration would also exit, assuming no tasks remain.

🧨 Result: Threads exited early, even if work was still available.
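One concrete way a shared generator can misbehave in CPython is sketched below. This is not Interlace's code: `tasks` and `worker` are hypothetical stand-ins, and the two events exist only to force the overlap deterministically. It shows that a second `next()` call made while another thread is still inside the generator raises `ValueError` instead of returning a task.

```python
import threading

entered = threading.Event()  # set once the generator body is running
release = threading.Event()  # lets the generator body continue


def tasks():
    """Hypothetical task source that is slow to produce its first item."""
    entered.set()
    release.wait(timeout=5)
    yield "task-1"
    yield "task-2"


shared = tasks()
results = []


def worker():
    # First consumer: blocks inside next() until `release` is set.
    results.append(next(shared))


first = threading.Thread(target=worker)
first.start()
entered.wait(timeout=5)  # the first worker is now inside next(shared)

try:
    next(shared)  # a second consumer calls next() at the same time
    error = None
except ValueError as exc:
    error = str(exc)

release.set()
first.join()

print(error)    # generator already executing
print(results)  # ['task-1']
```

The generator still has `task-2` left, yet the second consumer received an exception rather than a task. A worker loop that treats any error from `next()` as fatal would stop here while work remains.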


🔥 Root Cause #2: Uncaught Exceptions in task.run()

  • If a task raised an exception (e.g. subprocess.run(..., check=True) on a non-zero exit) and nothing caught it:
    • The worker running it would stop.
  • Python’s ThreadPoolExecutor does not restart a worker once it has stopped; the exception is only stored on the worker's Future.

🧨 Result: A single failed task could permanently kill a worker thread.
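A minimal sketch of this failure mode, assuming each worker is a long-running loop submitted once to a `ThreadPoolExecutor`. The `worker_loop` function and the task names are hypothetical. A single worker is used so the outcome is deterministic.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def worker_loop(tasks, done):
    """Hypothetical long-running worker: pull tasks until none are left."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        if task == "bad":
            raise RuntimeError("task failed")  # nothing catches this
        done.append(task)


tasks = queue.Queue()
for item in ["a", "bad", "b", "c"]:
    tasks.put(item)
done = []

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(worker_loop, tasks, done)

# The exception is stored on the Future and the loop is never resubmitted.
print(type(future.exception()).__name__)  # RuntimeError
print(done)                               # ['a']
print(tasks.qsize())                      # 2
```

Two tasks are left in the queue with nobody to run them. No traceback is printed unless something calls `future.result()` or `future.exception()`, which is why the loss of a worker can go unnoticed.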


⚠️ Combo Effect

  • A few threads crash due to task failures.
  • Others exit because the generator was exhausted.
  • Over time, fewer and fewer threads are alive.
  • Eventually, only 1–2 threads remain to do all the work.

🚨 Result: Execution becomes slow, unbalanced, and frustratingly inefficient.


✅ The Fix

I’ve submitted a pull request (#184) that addresses both of these issues:

  • Replaces the shared generator with a thread-safe queue.Queue() to ensure consistent task distribution
  • Catches exceptions in Worker.__call__() to prevent threads from crashing
  • Moves [THREAD] Added to Queue log into the thread logic, so it reflects actual execution time

This change maintains compatibility with the existing CLI and preserves the original structure with minimal code changes.
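The general shape of such a fix can be sketched as follows. This is an illustration of the technique only, not the code from the pull request: `run_pool`, `flaky`, and the choice to load every command into the queue up front are assumptions made for the example.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_pool(commands, run, max_workers):
    """Run every command with a fixed number of workers; return failures."""
    tasks = queue.Queue()  # thread-safe, unlike a shared generator
    for command in commands:
        tasks.put(command)
    failures = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                command = tasks.get_nowait()
            except queue.Empty:
                return  # no work left: the only way a worker exits
            try:
                run(command)
            except Exception as exc:
                # Record the failure and keep the worker alive.
                with lock:
                    failures.append((command, exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(max_workers):
            pool.submit(worker)
    return failures


# Usage: one command out of ten fails, and the other nine still complete.
completed = []


def flaky(command):
    if command.endswith("3"):
        raise RuntimeError("boom")
    completed.append(command)


failures = run_pool([f"example-{i}" for i in range(1, 11)], flaky, max_workers=4)
print(len(completed), len(failures))  # 9 1
```

Because a worker only returns when the queue is empty, a failing command costs one task rather than one worker, and the requested concurrency holds until the work runs out.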

@0xtavian
Contributor Author

Thanks for merging!
