-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OF-2966: Fix race condition in BOSH (noticable during logout) #2681
base: main
Are you sure you want to change the base?
Conversation
BOSH related events (mostly: processing of received stanzas) are executed by a worker pool of threads. Events for each individual session must be processed in the same order in which they are received. Prior to this change, every event was potentially executed by a different worker thread. This can cause some events to be processed (or finish processing) before an earlier event finishes execution in another worker thread. This commit adds FIFO-based queuing of events, that ensures that events that belong to the same session are processed: - by only one thread from the worker pool; - in the same order in which they're received. The technique applied here is inspired (well, just bluntly copied) from Ben Manes' work in https://github.com/ben-manes/caffeine/blob/master/jcache/src/main/java/com/github/benmanes/caffeine/jcache/event/EventDispatcher.java
In the previous commit, a mechanism was introduced that ensures per-session FIFO processing of events, using a queuing mechanism based on Futures. This commit adds a periodic check to see if any events are queued for sessions that no longer exist, discarding those events if they're found. The cleanup is intended as a safeguard measure against slow accumulation of non-executable events, which over time could lead to memory starvation.
By having the 'close' routine be executed by the same worker threads that process regular BOSH data, the processing order of all of these will make use of the FIFO mechanism introduced in the last two commits. This helps ensure that a 'close' (terminate) event is processed only after stanzas that were received earlier have been processed - assuming that this processing is synchronous.
Removed redundant 'if is-log-level-is-enabled' checks. Used parameterized logging statements when composing a log message.
2686f99
to
0952fc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this makes sense to me, and all looks like it should work.
How can we prove it though?
Looks like you've taken care of performance in terms of memory usage - any concerns about speed?
The JIRA ticket lists a manual modification to Openfire that I used to reproduce a problem that's fixed by this PR. I can reproduce the issue 100% without this change, and not at all with this change. I don't have a specific concern about speed. This surely adds some overhead, but I expect that to be negligible. |
👍 |
The first three commits in this PR build towards a fix for a race condition, in which terminated BOSH clients can leave Openfire in an inconsistent state (and, in certain circumstances, lead to memory starvation). See https://igniterealtime.atlassian.net/browse/OF-2966 for details.