Sporadic iRODS errors in CI after iRODS 4.3 env updates #2029
Example of one of the many identical test failures:
I totally forgot to check out the container output in the CI job shutdown phase. That tells us that iRODS is correctly initialized, but we have a bunch of "too many clients" errors from Postgres:
This would suggest something is trying to open too many connections to Postgres. I've seen something similar when I had bugs in closing python-irodsclient connections and iRODS sessions were left open. However, those bugs have long since been fixed, so this should not be an issue. And if it was, it would show up consistently. I don't think the Django server would end up in any situation where it would flood Postgres with open connections; at least I have never encountered that, even with heavy use in production. It is also possible that this is the symptom, not the cause: something else goes wrong and Postgres ends up being flooded with open connections?
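For illustration, here is a minimal sketch (not code from this repo, with placeholder host and credentials) of the session handling pattern that avoids the kind of leak described above, where an unclosed iRODSSession keeps its server-side agent and thus a Postgres connection open:

```python
# Illustrative sketch only: closing python-irodsclient sessions deterministically.
# Host, port and credentials below are placeholders.
from irods.session import iRODSSession

def list_subcollections(path):
    # cleanup() is called automatically when the with-block exits,
    # releasing the session's connection pool.
    with iRODSSession(host='localhost', port=1247, user='rods',
                      password='rods', zone='tempZone') as session:
        return [c.name for c in session.collections.get(path).subcollections]

def list_subcollections_explicit(path):
    # Equivalent explicit form for code paths that cannot use a context manager.
    session = iRODSSession(host='localhost', port=1247, user='rods',
                           password='rods', zone='tempZone')
    try:
        return [c.name for c in session.collections.get(path).subcollections]
    finally:
        session.cleanup()  # ensure the iRODS agent and its DB connection are released
```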
Full log of the failed job here: ci_irods_crash.txt I'll have to look into whether some clues can be derived from it. It would appear that none of the taskflow tests succeed. Perhaps they fail in a way which also leaves Postgres connections open. Another thing of note is the following error, which I don't recall encountering before. That could be a symptom and not the problem, though. I'll have to look at this log file in detail later, as I'm currently too busy with other things.
I'm looking into the full crash log in detail, writing down some facts and thoughts here.
Facts and Observations
Error Types Raised
Conclusions
Since I currently have no way of reproducing this, I'll have to wait for it to happen again and compare that log to this one. Another possibility would be to try running this in dev until the problem appears and then look at the
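If it does get reproduced in dev, one hypothetical way to watch for a slow connection leak from the Django side would be something like the sketch below; pg_stat_activity is standard Postgres, but the helper itself is only an illustration and not part of this project:

```python
# Hypothetical helper: poll Postgres for the number of open backend connections
# per user, so a leak becomes visible before the "too many clients" limit is hit.
# Assumes it is run inside the project's Django environment.
from django.db import connection

def count_backend_connections():
    with connection.cursor() as cursor:
        cursor.execute(
            'SELECT usename, count(*) FROM pg_stat_activity '
            'GROUP BY usename ORDER BY count(*) DESC'
        )
        return cursor.fetchall()  # e.g. [('irods', 95), ('postgres', 4)]
```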
Possibly related: my latest local test run resulted in a bunch of timeout messages logged while running tests in
However, all the tests executed successfully. After running the tests, the expected number of sessions were open in
I should probably set the log level on the test server to
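As a rough sketch only, raising the logging verbosity in the test settings could look something like the following; the logger names and the DEBUG level are assumptions here, not this project's actual configuration:

```python
# Assumed example of a Django test-settings LOGGING override to capture more
# context around the timeout messages. Logger names are placeholders.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'irods': {'handlers': ['console'], 'level': 'DEBUG'},
        'taskflowbackend': {'handlers': ['console'], 'level': 'DEBUG'},
    },
}
```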
I'm setting this as lower priority for now. If similar server freezeups are observed in staging or production, this will of course get re-prioritized higher.
I have not noticed these for a while with the current build of our image. Still monitoring the situation, but I'm hopeful.
Earlier today I upgraded to python-irodsclient==2.2.0 (#2023), mainly because the release was supposed to fix the redirect problem I was previously experiencing. The initial test run worked fine. However, a subsequent commit which only contained documentation changes had a large number (all?) of iRODS tests failing with connection errors. Example in comments below.
My guesses:
- The SyntaxError is probably not intentional, at least

I'll try to gather more information and possibly contact iRODS support.