
Implementing retry mechanism for failed worker connection #1158

Open · david1542 opened this issue Jan 6, 2025 · 1 comment

@david1542
Hey, I'm trying to implement a simple retry mechanism with tenacity in Python that will retry a failed connection to Hatchet Cloud.

The reason I need retries is that we run our workers on Google Cloud Run, and when we deploy a new version, Cloud Run creates a new revision with new workers before killing the old ones. As a result, more workers are alive during deployments than usual, which can cause unexpected behaviour (for example, exceeding the worker limit in Hatchet Cloud).

I've tried to do the following:

from tenacity import retry, stop_after_delay, wait_exponential

# the hatchet client and the workflow classes are defined elsewhere in our codebase
@retry(
    stop=stop_after_delay(300),  # keep retrying for up to 5 minutes in total
    wait=wait_exponential(multiplier=1, min=4, max=10),  # exponential backoff between attempts
)
def run_worker():
    worker = hatchet.worker("data-miner")
    worker.register_workflow(ExtractDataWorkflow())
    worker.register_workflow(MineDataWorkflow())
    worker.start()  # blocks; only an exception raised here would trigger a retry

However, it seems that worker.start() doesn't raise an exception when the connection fails; it just blocks, so the decorated function hangs and the retry never fires.
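
For what it's worth, tenacity only re-invokes the decorated callable when it raises, so a call that blocks instead of raising never triggers a retry. Here's a minimal, self-contained sketch of that behaviour, using a simulated flaky_connect() stand-in rather than the Hatchet SDK:

from tenacity import retry, stop_after_delay, wait_exponential

attempts = {"count": 0}

def flaky_connect():
    # Simulated connection that fails twice before succeeding (stand-in only,
    # not the Hatchet SDK).
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("worker slot unavailable")
    return "connected"

@retry(
    stop=stop_after_delay(300),
    wait=wait_exponential(multiplier=1, min=4, max=10),
)
def connect_with_retry():
    # tenacity re-runs this function only because flaky_connect() raises;
    # a call that blocks instead of raising would never be retried.
    return flaky_connect()

print(connect_with_retry())  # prints "connected" after two failed attempts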

Do you know the recommended way to achieve something like this?

@monkfromearth

Did you figure out a way to achieve this?
