
DB-702 Added release hook to RecvByteBufAllocator#Handle. #9

Merged
merged 2 commits into dse-netty-4.1.25.Final from DB-702
Jun 6, 2018

Conversation

@sbtourist

No description provided.

@sbtourist sbtourist requested a review from stef1927 June 1, 2018 18:48
@stef1927 left a comment

As discussed offline @sbtourist, we can merge this, but it will require more work later on to remove the Netty fork.

We will probably also have to move InboundByteBufAllocator and call release from the various unsafe implementations when the channel is closed. This assumes upstream will accept the change at all: they may argue that the same result can be achieved with their pooled allocator, and that any performance improvement should be made in the allocator itself, which is not something we want to tackle, I think. The change also relies on downstream consumers not keeping references to this large buffer, which is true in our case but may not be true in general.

These are the alternatives I could think of:

  • Wire a new channel-aware allocator in our InboundInitializer. This has the advantage of being entirely Apollo-side but, as discussed, it is risky since we cannot guarantee that the unsafe hasn't already created the handle. It shouldn't have at the moment, but this might change in the future.
  • Pass the channel to newHandle(). There are currently extra invocations with no access to the channel, but I think these exist only to check that the handle is an extended handle; they may disappear in Netty 5, and we could just pass null for the time being. The real invocations in the unsafe implementations should have access to the channel when they create the handle. We could offer to clean up the extended-handle code if it is no longer required in Netty 5 and add the channel to newHandle(). This would be a reasonably clean change, unrelated to performance, so easier to contribute.
  • Expose the channel in ChannelConfig (see the sketch after this list). This is a trivial change in Netty, and therefore more likely to be accepted: it simply requires adding a channel getter to ChannelConfig and DefaultChannelConfig, since all concrete config classes extend DefaultChannelConfig and that class already owns a channel. We can then register a listener with the channel's close future the first time reset is called. This is still a slight implementation dependency, but a relatively safe one, since reset is an integral part of the handle logic.
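A minimal sketch of that third alternative, assuming a hypothetical channel() getter on ChannelConfig (it does not exist upstream) and a release() hook like the one this PR proposes:

```java
import io.netty.channel.Channel;

// Hypothetical handle fragment: hook the channel's close future the first
// time reset runs. Obtaining the channel via config.channel() and the
// release() hook are assumptions, not upstream Netty API.
final class ChannelAwareHandleSketch {
    private boolean closeListenerRegistered;

    void reset(Channel channel /* obtained via the hypothetical config.channel() */) {
        if (!closeListenerRegistered) {
            closeListenerRegistered = true;
            // Channel.closeFuture() is real Netty API; release the cached
            // buffer once the channel is definitely gone.
            channel.closeFuture().addListener(future -> release());
        }
        // ... normal per-read-loop reset logic would go here ...
    }

    void release() {
        // free any buffer cached by this handle
    }
}
```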

@sbtourist (Author)

Thanks for your analysis @stef1927.

I spent some time looking at the alternatives you mentioned, and I don't think they make things better; specifically:

  1. Wire a new channel-aware allocator in our InboundInitializer: as discussed, this is race-prone, so not worth discussing further.
  2. Pass the channel to newHandle(): this is not too bad, but as you mentioned, it requires passing null from the places that do not have access to a Channel, which is confusing, and the channel argument would never be used in any current implementation; so if the worry is introducing an interface change that is not actually used, this alternative has the same problem.
  3. Expose the channel in ChannelConfig: I don't like this at all, to be honest; the config is used to create the channel, so it can exist without one, which means coupling the two would be bad design.

OTOH, the proposed patch just adds a new interface method (sketched below), and I think there is a good chance it will be accepted even without contributing InboundByteBufAllocator, particularly as Netty 5 will target Java 8 and we will be able to add a default implementation, reducing the code impact to a minimum. We might need to call the new method when closing the channel in the Unsafe, but that is a pretty small change.
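A minimal sketch of that idea, assuming the hook is named release() as in this PR's title; HandleSketch is a stand-in for RecvByteBufAllocator.Handle, and the actual patch may differ:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

// Sketch only: a release() hook with a no-op Java 8 default, so every
// existing implementation compiles unchanged.
interface HandleSketch {
    // Existing handle method (real signature on RecvByteBufAllocator.Handle).
    ByteBuf allocate(ByteBufAllocator alloc);

    // Hypothetical hook: implementations that cache a buffer override this
    // to free it when the channel goes away.
    default void release() {
        // no-op by default
    }
}
```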

Thoughts?

@stef1927 commented Jun 5, 2018

I am probably negatively biased by my experience in OS Cassandra, but why would an open-source project add a method to an interface just to benefit a single company, if that method doesn't make any sense otherwise? Especially a method called release, which implies that every handle that is created should also be released. At the very least, we should take care of doing that ourselves.

Our current implementation of releasing the handle from InboundMessageHandler is in fact a bit brittle. What about during the handshake: isn't the buffer leaked if the handshake fails? We must remember to release it from other handlers too, but from at most one handler in the pipeline. Also, what if there were an implementation of the unsafe that doesn't cache the handle? That would leak buffers. And what if, in Apollo, we moved the allocation of the buffer to the constructor? Then all the extra invocations of newHandle() would leak the buffer.

As I said offline, it is OK to merge for the time being but later we'll have to do more work on this. The hope is that by the time the AIO reads have been merged into Netty 5, we'll have a better idea of what Netty 5 will look like, and hence it should be easier to work out the best approach. If we do keep this approach, then every time a handle is created by Netty, it should be released.

@sbtourist (Author)

> why would an open source project add a method to an interface just to benefit a single company if that method doesn't make any sense otherwise? Especially a method called release, which implies that every time a new handle is created, it should be released. At least we should take care of doing this.

Netty is different from Cassandra: the former is a library, and as such it's common (in my experience) for it to provide extension points. Regarding the method name, that's a good point: what about renaming it to onClose() and calling it in the Unsafe implementation when the close listener is notified (and adding a test that the method is actually called, on the same handle instance; a sketch of such a test follows)?
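A hedged sketch of that test, assuming the hypothetical onClose() hook and a recording handle; neither exists in upstream Netty, so the final assertion is indicative only:

```java
import io.netty.channel.RecvByteBufAllocator;
import io.netty.channel.embedded.EmbeddedChannel;
import org.junit.Test;

// Sketch: verify the exact handle instance created by the unsafe receives
// the proposed onClose() callback when the channel closes.
public class RecvHandleOnCloseTest {
    @Test
    public void onCloseFiresOnTheSameHandleInstance() {
        EmbeddedChannel channel = new EmbeddedChannel();
        // recvBufAllocHandle() is real Netty 4.1 API on Channel.Unsafe.
        RecvByteBufAllocator.Handle created = channel.unsafe().recvBufAllocHandle();
        channel.close().syncUninterruptibly();
        // With the proposed hook and a recording handle implementation, we
        // would assert here that onClose() fired exactly once on 'created'.
    }
}
```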

> Our current implementation of releasing the handle from InboundMessageHandler is in fact a bit brittle.

The concerns below are definitely valid, but they are implementation concerns unrelated to the interface:

> What about during the handshake, isn't the buffer leaked if the handshake fails? We must remember to release it from other handlers too, but only from at most one handler in the pipeline.

Correct, and totally my fault, but this can be fixed easily, either on the Netty side as proposed above or by adding a channel close listener on our side.

> what if there were an implementation of the unsafe that doesn't cache the handle? What if in Apollo we moved the allocation of the buffer to the constructor?

I do not think there is much we can do about such changes: the best option is probably a test verifying that the assumptions are not violated.

> As I said offline, it is OK to merge for the time being but later we'll have to do more work on this.

I do not want to rush this if we don't agree on a decent solution: you're voicing valid concerns; would the change proposed above address them?

@stef1927 commented Jun 5, 2018

It would definitely help to release the handle in the unsafe. I think you're right: it should be called where fireChannelInactiveAndDeregister() is invoked in the abstract unsafe, or thereabouts.

Regarding the name: onChannelClose(), onChannelInactive(), onClose(channel), or onInactive(channel), i.e. something that implies the channel is being closed. A bare onClose() is a bit unclear, and I'd rather reserve release or close for closing the handle itself. The unsafe should then set the handle to null.

What you mentioned about libraries accommodating extension points gave me another idea. The reason we cannot simply do an instanceof check and use a channel close listener is EpollRecvByteAllocatorHandle and the other delegating implementations. So perhaps the correct extension method to add is one that gives access to the real implementation of the handle: a method that by default returns this, but returns the delegate in the wrapper handles (sketched below). This is perhaps easier, and a more generic approach that would help with other scenarios too, not just those related to releasing resources.
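A minimal sketch of that extension point; unwrap() is a hypothetical name, and HandleSketch stands in for RecvByteBufAllocator.Handle:

```java
// Sketch only: a default method exposing the real handle behind wrappers.
interface HandleSketch {
    default HandleSketch unwrap() {
        return this; // plain handles are their own real implementation
    }
}

// A delegating wrapper, in the spirit of EpollRecvByteAllocatorHandle,
// surfaces the handle it wraps so instanceof checks can see through it.
final class DelegatingHandleSketch implements HandleSketch {
    private final HandleSketch delegate;

    DelegatingHandleSketch(HandleSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    public HandleSketch unwrap() {
        return delegate.unwrap(); // recurse through nested wrappers
    }
}
```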

@sbtourist (Author)

> the correct extension method to add is a method that gives access to the real implementation of the handle

The problem with that is that EpollRecvByteAllocatorHandle doesn't extend DelegatingHandle, so it would require a more invasive change, cleaning up hierarchies just for the sake of making instanceof work, which is probably harder to justify (you're not supposed to rely on instanceof anyway, but you might call me a purist).

I'd rather stick with the previously proposed approach, if that sounds OK to you.

@stef1927 commented Jun 5, 2018

OK

@sbtourist (Author)

@stef1927, I've amended the PR as discussed. I wanted to add a test, but for some reason it fails to compile due to deprecation warnings ... it almost drove me crazy, so I gave up and verified it manually (we can talk about it more offline).

@stef1927 left a comment

Thanks for the changes. They LGTM with only one small point.

Manual verification is sufficient, but if you still have the test and want me to try to understand why the build is complaining, I'm happy to.

```diff
@@ -801,6 +801,10 @@ private void deregister(final ChannelPromise promise, final boolean fireChannelI
         return;
     }

+    if (recvHandle != null) {
```

I would also check on fireChannelInactive because in theory it is possible to deregister a channel without closing it.
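A minimal sketch of that check, using the recvHandle field from the diff above; the hook name here is illustrative, since its final name was still being discussed:

```java
// Only tear down the handle when the channel is actually going inactive;
// a channel can be deregistered without being closed.
if (fireChannelInactive && recvHandle != null) {
    recvHandle.release(); // or onChannelInactive(), per the naming discussion
    recvHandle = null;
}
```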

@sbtourist sbtourist merged commit 14b7bd8 into dse-netty-4.1.25.Final Jun 6, 2018
@sbtourist sbtourist deleted the DB-702 branch June 6, 2018 13:21
thobe pushed a commit that referenced this pull request Jun 29, 2022
…tor (netty#10890)

Motivation:

A race detector discovered a data race in GlobalEventExecutor present in
netty 4.1.51.Final:

```
  Write of size 4 at 0x0000cea08774 by thread T103:
    #0 io.netty.util.internal.DefaultPriorityQueue.poll()Lio/netty/util/internal/PriorityQueueNode; DefaultPriorityQueue.java:113
    #1 io.netty.util.internal.DefaultPriorityQueue.poll()Ljava/lang/Object; DefaultPriorityQueue.java:31
    #2 java.util.AbstractQueue.remove()Ljava/lang/Object; AbstractQueue.java:113
    #3 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(J)Ljava/lang/Runnable; AbstractScheduledEventExecutor.java:133
    #4 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue()V GlobalEventExecutor.java:119
    #5 io.netty.util.concurrent.GlobalEventExecutor.takeTask()Ljava/lang/Runnable; GlobalEventExecutor.java:106
    #6 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:240
    #7 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
    #8 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
    #9 java.lang.Thread.run()V Thread.java:835
    #10 (Generated Stub) <null>

  Previous read of size 4 at 0x0000cea08774 by thread T110:
    #0 io.netty.util.internal.DefaultPriorityQueue.size()I DefaultPriorityQueue.java:46
    #1 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:263
    #2 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
    #3 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
    #4 java.lang.Thread.run()V Thread.java:835
    #5 (Generated Stub) <null>
```

The race is legit, but benign. Triggering it requires a TaskRunner to begin
exiting and set 'started' to false; more work to be scheduled, which starts
a new TaskRunner; that work to schedule additional work, which modifies
'scheduledTaskQueue'; and then the original TaskRunner to check
'scheduledTaskQueue'. But there is no danger in this race: it can only
produce a false negative in the condition that causes the code to CAS
'started', and that CAS is thread-safe.

Modifications:

Delete problematic references to scheduledTaskQueue. The only way
scheduledTaskQueue could be modified since the last check is if another
TaskRunner is running, in which case the current TaskRunner doesn't
care.
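
A hedged illustration of the resulting exit-check pattern (not the actual
Netty diff; 'taskQueue' and 'started' mirror GlobalEventExecutor's fields,
but the surrounding code is simplified):

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified TaskRunner exit check after the fix: only the regular task
// queue is consulted; scheduledTaskQueue is no longer read here.
final class TaskRunnerSketch {
    private final Queue<Runnable> taskQueue;
    private final AtomicBoolean started;

    TaskRunnerSketch(Queue<Runnable> taskQueue, AtomicBoolean started) {
        this.taskQueue = taskQueue;
        this.started = started;
    }

    boolean tryStop() {
        if (taskQueue.isEmpty()) {
            // A concurrent submitter may race this CAS, but it will then
            // CAS 'started' back to true and spawn a fresh TaskRunner,
            // so a stale view here is harmless.
            return started.compareAndSet(true, false);
        }
        return false;
    }
}
```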

Result:

Data-race free code, and a bit less code to boot.