-
-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keepalived occasionally fails SSL_CHECK #2351
Comments
The man page for SSL_get_error() list a number of functions to which it applies, which includes SSL_shutdown(), and so I think ERR_clear_error() should also be called before the two calls of SSL_shutdown(). keepalived does not use any of the other functions listed. However, I think the patch will not work as a solution; in fact I think we have a severe problem. The SSL_get_error(3) man page states:
The last sentence says we need to ensure that the thread's error queue is empty before we call SSL_connect(), SSL_read(), SSL_write() or SSL_shutdown(), so either we must call ERR_clear_error() before any of the SSL functions, or after an error we need to ensure we read all the errors, which is a solution I prefer since we ought to process all the errors and not just throw them away. The problem is the second sentence above There are two possible solutions that I can see (but only 1 can work for keepalived):
The problem with option 2 is that (except for DBus) we do not use pthreads, and this would require a not insignificant architecture change for keepalived. I will think on about this and see if I can work out the best solution. It might need reading the openSSL code to really understand how it works, but I would rather avoid that. |
Describe the bug
I'm upgrading our servers from Debian Bullseye to Debian Bookworm. Some of them act as load balancers using keepalived.
Right now I have one Bullseye and one Bookworm with the same configuration checking the same services.
Several of our services are running on HTTPS therefore I'm using SSL_CHECK.
I can see that the Bookworm one occasionally fails SSL_CHECK for several seconds on one service while the
Bullseye does not report any problem at all.
I've seen there was RST packet sent from keepalived to the service when the check failed.
To Reproduce
That's hard to tell.
It's quite rare - not even once per hour with 1s loop delay with tens of real servers.
I would say services with longer certificate chain are more likely to fail.
Expected behavior
Don't fail SSL_CHECK when the service has no problems.
Keepalived version
Distro (please complete the following information):
Configuration file:
Notify and track scripts
System Log entries
Did keepalived coredump?
no
Additional context
I've reported Debian bug but the maintainer asked me to fill upstream issue first.
I was looking for possible reason and I've found
openssl/openssl#20365
pjsip/pjproject#3632
https://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
They are all basically saying that you can have multiple SSL errors left in error queue and you are supposed to
run ERR_get_error() before calling SSL_* functions.
I was able to solve this issue with simple patch.
What do you think about it?
The text was updated successfully, but these errors were encountered: