Ingestion worker quit unexpectedly after many ingestions #285
Comments
I noticed I had put that assertion in place: this was useful when, in the dev cycle, we could pass anything.
Is it still the case?
Actually, yes, I've noticed such behavior just today while testing the PR. I haven't checked the exception, though.
It might be the keyserver then. Can you show the exceptions?
When I catch it once more, I'll post it here. It happens randomly; it's a floating bug.
Here we go:
The connection to the local broker was closed abruptly. Do you know what closed it? We could write the code so that it retries; that would guard us against the broken connection. In my mind, this code is not embedded in the application but placed outside, at the level of the manager, like Kubernetes or Docker.
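For illustration, here is a minimal sketch of what in-application retrying could look like, assuming the worker uses pika's BlockingConnection (pika 1.x API; the host, attempt count and delay are made-up values, not the actual LocalEGA configuration):

```python
import time

import pika


def connect_with_retry(host="localhost", attempts=10, delay=5):
    """Try to open a connection to the broker, sleeping between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters(host=host))
        except pika.exceptions.AMQPConnectionError:
            if attempt == attempts:
                raise  # give up; let the container manager restart the worker
            time.sleep(delay)
```

The alternative floated in this thread is to keep the worker itself simple and rely on the manager's restart policy (for example `restart: on-failure` in docker-compose, or a Kubernetes restartPolicy) to bring up a fresh container whenever the process exits.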
Well, this is one of the scenarios: I shut down the broker and then start it again. In most of the attempts the ingestion worker doesn't exit after that, so it looks like it's able to recover the connection. But apparently not always...
When the broker is killed, the ingestion worker does indeed die, and the container solution is: "start a new one". So no, the old one won't restart and recover; that was not in the design.
Can it possibly be caused by |
Following this pika issue, pika/pika#856, led me to pika/pika#858 ... might be worth a shot here: https://github.com/NBISweden/LocalEGA/blob/dev/lega/utils/amqp.py#L112-L118
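For context, a minimal sketch of a consume loop that re-establishes the connection when the broker closes it abruptly; it assumes pika's 1.x BlockingConnection API, and the queue name, callback and five-second back-off are illustrative rather than the actual code at lega/utils/amqp.py:

```python
import time

import pika


def consume_forever(params, queue, on_message):
    """Consume messages; reconnect if the broker connection is lost."""
    while True:
        try:
            connection = pika.BlockingConnection(params)
            channel = connection.channel()
            channel.basic_consume(queue=queue, on_message_callback=on_message)
            channel.start_consuming()
        except pika.exceptions.ConnectionClosedByBroker:
            time.sleep(5)  # broker shut down or was restarted; try again
        except pika.exceptions.AMQPConnectionError:
            time.sleep(5)  # covers StreamLostError when the socket drops; try again
```

Whether the reconnect lives inside the worker like this, or the worker simply dies and the container layer restarts it, is exactly the design question discussed above.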
I've been running the ingestion scenario many times using the test-suite. After 30+ times, ingestion stopped working. I noticed that the ingestion container died; here's the last part of its logs: