DB-702 Added release hook to RecvByteBufAllocator#Handle. #9
Conversation
As discussed offline @sbtourist, we can merge this, but it will require more work in order to remove the Netty fork later on.

We will probably also have to move `InboundByteBufAllocator` and call `release` from the various unsafe implementations when the channel is closed. This assumes Netty will accept the change: they may argue that the same can be achieved by their pooled allocator, and that if there is any performance improvement to be made it should be done in the allocator, which is not something we want to tackle, I think. The change also relies on the downstream consumers not keeping references to this large buffer, which is true in our case but maybe not generally true.
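For context, the hook under discussion is roughly along these lines — a minimal sketch, assuming the fork adds a no-op default method named `release()` to `RecvByteBufAllocator.Handle` (the exact name, placement and default may differ):

```java
// Sketch only: a stand-in for the forked RecvByteBufAllocator.Handle, not the real interface.
public interface RecvByteBufAllocatorHandleSketch {

    // ... the existing Handle contract (allocate, guess, reset, lastBytesRead,
    // continueReading, readComplete, ...) would live here ...

    /**
     * Releases any resources retained by this handle, e.g. a cached inbound ByteBuf.
     * A no-op default keeps existing Handle implementations source-compatible.
     */
    default void release() {
        // no-op by default
    }
}
```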
These are the alternatives I could think of:

- Wire a new channel-aware allocator in our `InboundInitializer`. This has the advantage of being entirely Apollo-side but, as discussed, it is risky since we cannot guarantee that the unsafe hasn't already created the handle. It shouldn't have at the moment, but this might change in the future.
- Pass the channel to `newHandle()`. There are currently extra invocations with no access to the channel, but I think these only exist to check that the handle is an extended handle; they may disappear in Netty 5, and we could just pass null for the time being. The real invocations in the unsafe implementations should have access to the channel when they create the handle. We could offer to clean up the extended-handle code if it is no longer required in Netty 5 and add the channel to `newHandle()`. This would be a reasonably clean change, unrelated to performance, so it would be easier to contribute.
- Expose the channel in `ChannelConfig`. This is a trivial change in Netty, and therefore more likely to be accepted. It simply requires adding a channel getter to `DefaultChannelConfig` and `ChannelConfig`, since all concrete config classes extend `DefaultChannelConfig` and that class already owns a channel. We can then register a listener with the channel close future the first time that `reset` is called (see the sketch after this list). This still leaves a slight implementation dependency, but a relatively safe one, since `reset` is an integral part of the handle logic.
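For illustration, a minimal sketch of that third alternative, assuming the proposed getter is named `channel()` and that the handle hooks the close future on its first `reset`; apart from `Channel#closeFuture()`, none of this is existing Netty API:

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelConfig;

// Hypothetical: the proposed getter, which DefaultChannelConfig could trivially implement
// since it already owns the channel.
interface ChannelAwareConfig extends ChannelConfig {
    Channel channel();
}

// Hypothetical handle-side logic: register the release listener the first time reset() runs.
final class ReleasingHandleSketch {

    private boolean closeListenerRegistered;

    void reset(ChannelAwareConfig config) {
        if (!closeListenerRegistered) {
            closeListenerRegistered = true;
            // Release whatever the handle retains once the channel is closed.
            config.channel().closeFuture().addListener(f -> release());
        }
        // ... existing reset bookkeeping would go here ...
    }

    void release() {
        // free the retained inbound buffer, if any (no-op in this sketch)
    }
}
```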
Thanks for your analysis @stef1927. I spent some time looking at the alternatives you mentioned, and I don't think they make things better, specifically:

OTOH, the proposed patch is just about adding a new interface method, and I think there are good chances it will be accepted even without contributing ... Thoughts?
I am probably negatively biased because of my experience in OS Cassandra, but why would an open source project add a method to an interface just to benefit a single company, if that method doesn't make any sense otherwise? Especially a method called `release`.

Our current implementation of releasing the handle from ...

As I said offline, it is OK to merge for the time being, but later we'll have to do more work on this. The hope is that by the time the AIO reads have been merged into Netty 5, we'll have a better idea of what Netty 5 will look like, and hence it should be easier to work out the best approach.

If we do keep this approach, then every time a handle is created by Netty, it should be released.
Netty is different from Cassandra: the former is a library, and as such it's common (in my experience) to provide extension points. Regarding the method name, that's a good point: what about renaming it as ...?
The following list of concerns is definitely valid, but these are implementation concerns unrelated to the interface itself:
Correct, and totally my fault, but this can be fixed easily, either as proposed above on the Netty side or by adding a channel close listener on our side (see the sketch after this list).
I do not think there is much we can do about such changes: the best thing is probably writing a test to verify assumptions are not violated.
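As a rough illustration of the "on our side" option mentioned above, a hypothetical initializer (standing in for Apollo's `InboundInitializer`) that hooks the channel's close future so per-channel inbound resources are released without any Netty change; `ReleasableInboundResources` is an invented stand-in:

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;

abstract class CloseAwareInboundInitializer extends ChannelInitializer<Channel> {

    @Override
    protected void initChannel(Channel ch) {
        ReleasableInboundResources resources = createResourcesFor(ch);
        // Release whatever we retain for this channel once it is closed.
        ch.closeFuture().addListener(f -> resources.release());
        // ... the rest of the pipeline setup would go here ...
    }

    abstract ReleasableInboundResources createResourcesFor(Channel ch);

    interface ReleasableInboundResources {
        void release();
    }
}
```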
I do not want to rush this if we don't agree on a decent solution: you're voicing valid concerns; would the change proposed above help address them?
It would definitely help to release the handle in the unsafe. I think you're right, it should be called where it calls ...

Regarding the name, either ...

What you mentioned about libraries accommodating extension points gave me another idea. The reason why we cannot just do an ...
The problem with that is ...

I'd rather stick with the previously proposed approach, if that sounds OK to you.
OK
@stef1927, I've amended the PR as discussed. I wanted to add a test, but for some reason it fails to compile due to deprecation warnings ... it almost drove me crazy, so I gave up and manually verified it (we can talk about it more offline).
Thanks for the changes. They LGTM with only one small point.
Manual verification is sufficient, but if you still have the test and want me to try and see if I can understand why the build is complaining, I'm happy to.
@@ -801,6 +801,10 @@ private void deregister(final ChannelPromise promise, final boolean fireChannelInactive) {
            return;
        }

        if (recvHandle != null) {
I would also check on `fireChannelInactive`, because in theory it is possible to deregister a channel without closing it.
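For illustration, a standalone sketch of the guarded release with that suggestion applied; in the fork this logic lives inside `AbstractChannel.AbstractUnsafe#deregister`, and the field and stub names here are approximations, not the exact patch:

```java
import io.netty.channel.ChannelPromise;
import io.netty.channel.RecvByteBufAllocator;

abstract class UnsafeDeregisterSketch {

    private RecvByteBufAllocator.Handle recvHandle;

    void deregister(ChannelPromise promise, boolean fireChannelInactive) {
        // ... existing deregistration logic elided ...

        // A channel can be deregistered without being closed (e.g. when migrating it to
        // another event loop), so only drop the handle when it is really going inactive.
        if (fireChannelInactive && recvHandle != null) {
            releaseHandle(recvHandle); // stands in for the forked Handle#release() hook
            recvHandle = null;
        }

        // ... fire channelInactive/channelUnregistered and complete the promise ...
    }

    // The real hook is the release method added to the forked Handle; here it is a stub.
    abstract void releaseHandle(RecvByteBufAllocator.Handle handle);
}
```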
…tor (netty#10890)

Motivation:

A race detector discovered a data race in GlobalEventExecutor present in netty 4.1.51.Final:

```
Write of size 4 at 0x0000cea08774 by thread T103:
  #0 io.netty.util.internal.DefaultPriorityQueue.poll()Lio/netty/util/internal/PriorityQueueNode; DefaultPriorityQueue.java:113
  #1 io.netty.util.internal.DefaultPriorityQueue.poll()Ljava/lang/Object; DefaultPriorityQueue.java:31
  #2 java.util.AbstractQueue.remove()Ljava/lang/Object; AbstractQueue.java:113
  #3 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(J)Ljava/lang/Runnable; AbstractScheduledEventExecutor.java:133
  #4 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue()V GlobalEventExecutor.java:119
  #5 io.netty.util.concurrent.GlobalEventExecutor.takeTask()Ljava/lang/Runnable; GlobalEventExecutor.java:106
  #6 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:240
  #7 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
  #8 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
  #9 java.lang.Thread.run()V Thread.java:835
  #10 (Generated Stub) <null>

Previous read of size 4 at 0x0000cea08774 by thread T110:
  #0 io.netty.util.internal.DefaultPriorityQueue.size()I DefaultPriorityQueue.java:46
  #1 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:263
  #2 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
  #3 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
  #4 java.lang.Thread.run()V Thread.java:835
  #5 (Generated Stub) <null>
```

The race is legit, but benign. To trigger it requires a TaskRunner to begin exiting and set 'started' to false, more work to be scheduled which starts a new TaskRunner, that work then needs to schedule additional work which modifies 'scheduledTaskQueue', and then the original TaskRunner checks 'scheduledTaskQueue'. But there is no danger to this race, as it can only produce a false negative in the condition which causes the code to CAS 'started', which is thread-safe.

Modifications:

Delete the problematic references to scheduledTaskQueue. The only way scheduledTaskQueue could have been modified since the last check is if another TaskRunner is running, in which case the current TaskRunner doesn't care.

Result:

Data-race free code, and a bit less code to boot.
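To make the "benign race" concrete, here is a simplified, hypothetical restart-on-demand worker (not Netty's actual `GlobalEventExecutor`): the exiting worker flips `started` off and re-checks for pending work, and the worst outcome of racing with a submitter is a redundant CAS attempt, never a lost task:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class OnDemandTaskRunner {

    private final Queue<Runnable> taskQueue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean started = new AtomicBoolean(false);

    void execute(Runnable task) {
        taskQueue.add(task);
        // Start (or restart) the single worker thread if none is running.
        if (started.compareAndSet(false, true)) {
            new Thread(this::runTasks, "on-demand-task-runner").start();
        }
    }

    private void runTasks() {
        for (;;) {
            Runnable task = taskQueue.poll();
            if (task != null) {
                task.run();
                continue;
            }
            // The queue looked empty: tentatively mark the worker as stopped ...
            started.set(false);
            // ... then re-check. If new work raced in, try to take ownership back; if the
            // CAS fails, execute() has already started a replacement runner and we can exit.
            if (taskQueue.isEmpty() || !started.compareAndSet(false, true)) {
                return;
            }
        }
    }
}
```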