
Implementing retry mechanism for failed worker connection #1158

Open · david1542 opened this issue Jan 6, 2025 · 1 comment

@david1542
Hey, I'm trying to implement a simple retry mechanism with tenacity in Python that will retry a failed connection to Hatchet Cloud.

The reason I need retries is that we run our workers on Google Cloud Run, and when we deploy a new version, Cloud Run creates a new revision with new workers before killing the old ones. As a result, more workers are alive during deployments than usual, which can cause unexpected behaviour (for example, exceeding the worker limit in Hatchet Cloud).

I've tried to do the following:

from tenacity import retry, stop_after_delay, wait_exponential

# the hatchet client and the workflow classes are defined elsewhere in our codebase
@retry(
    stop=stop_after_delay(300),  # keep retrying for up to 5 minutes in total
    wait=wait_exponential(multiplier=1, min=4, max=10),  # exponential backoff between attempts
)
def run_worker():
    worker = hatchet.worker("data-miner")
    worker.register_workflow(ExtractDataWorkflow())
    worker.register_workflow(MineDataWorkflow())
    worker.start()  # blocks; only an exception raised here would trigger a retry

However, it seems that worker.start() doesn't raise an exception when the connection fails; it just blocks, so the decorated function hangs and the retry never fires.
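
For what it's worth, tenacity only re-invokes the decorated callable when it raises, so a call that blocks instead of raising never triggers a retry. Here's a minimal, self-contained sketch of that behaviour, using a simulated flaky_connect() stand-in rather than the Hatchet SDK:

from tenacity import retry, stop_after_delay, wait_exponential

attempts = {"count": 0}

def flaky_connect():
    # Simulated connection that fails twice before succeeding (stand-in only,
    # not the Hatchet SDK).
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("worker slot unavailable")
    return "connected"

@retry(
    stop=stop_after_delay(300),
    wait=wait_exponential(multiplier=1, min=4, max=10),
)
def connect_with_retry():
    # tenacity re-runs this function only because flaky_connect() raises;
    # a call that blocks instead of raising would never be retried.
    return flaky_connect()

print(connect_with_retry())  # prints "connected" after two failed attempts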

Do you know the recommended way to achieve something like this?

@monkfromearth

Did you figure out a way to achieve this?
