
DB-702 Added release hook to RecvByteBufAllocator#Handle. #9

Merged
merged 2 commits into dse-netty-4.1.25.Final from DB-702
Jun 6, 2018

Conversation

@sbtourist

No description provided.

@sbtourist sbtourist requested a review from stef1927 June 1, 2018 18:48
@stef1927 left a comment

As discussed offline @sbtourist, we can merge this, but it will require more work later on to remove the Netty fork.

We will probably also have to move InboundByteBufAllocator and call release from the various unsafe implementations when the channel is closed. This assumes upstream will accept the change at all: they may argue that the same result can be achieved with their pooled allocator, and that any performance improvement should be made in the allocator itself, which is not something we want to tackle, I think. The change also relies on downstream consumers not keeping references to this large buffer, which is true in our case but may not be true in general.

These are the alternatives I could think of:

  • Wire a new channel-aware allocator in our InboundInitializer. This has the advantage of being entirely Apollo-side but, as discussed, it is risky since we cannot guarantee that the unsafe hasn't already created the handle. It shouldn't have at the moment, but this might change in the future.
  • Pass the channel to newHandle(). There are currently extra invocations with no access to the channel, but I think these exist only to check that the handle is an extended handle; they may disappear in Netty 5, and we could just pass null for the time being. The real invocations in the unsafe implementations should have access to the channel when they create the handle. We could offer to clean up the extended-handle code if it is no longer required in Netty 5 and add the channel to newHandle(). This would be a reasonably clean change, unrelated to performance, so easier to contribute.
  • Expose the channel in ChannelConfig (see the sketch after this list). This is a trivial change in Netty, and therefore more likely to be accepted: it simply requires adding a channel getter to ChannelConfig and DefaultChannelConfig, since all concrete config classes extend DefaultChannelConfig and that class already owns a channel. We can then register a listener with the channel's close future the first time reset is called. This is still a slight implementation dependency, but a relatively safe one, since reset is an integral part of the handle logic.
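A minimal sketch of that third alternative, assuming a hypothetical channel() getter on ChannelConfig (it does not exist upstream) and a release() hook like the one this PR proposes:

```java
import io.netty.channel.Channel;

// Hypothetical handle fragment: hook the channel's close future the first
// time reset runs. Obtaining the channel via config.channel() and the
// release() hook are assumptions, not upstream Netty API.
final class ChannelAwareHandleSketch {
    private boolean closeListenerRegistered;

    void reset(Channel channel /* obtained via the hypothetical config.channel() */) {
        if (!closeListenerRegistered) {
            closeListenerRegistered = true;
            // Channel.closeFuture() is real Netty API; release the cached
            // buffer once the channel is definitely gone.
            channel.closeFuture().addListener(future -> release());
        }
        // ... normal per-read-loop reset logic would go here ...
    }

    void release() {
        // free any buffer cached by this handle
    }
}
```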

@sbtourist (Author)

Thanks for your analysis @stef1927.

I spent some time looking at the alternatives you mentioned, and I don't think they make things better; specifically:

  1. Wire a new channel-aware allocator in our InboundInitializer: as discussed, this is race-prone, so not worth discussing further.
  2. Pass the channel to newHandle(): this is not too bad, but as you mentioned, it requires passing null from the places that do not have access to a Channel, which is confusing, and the channel argument would never be used in any current implementation; so if the worry is introducing an interface change that is not actually used, this alternative has the same problem.
  3. Expose the channel in ChannelConfig: I don't like this at all, to be honest; the config is used to create the channel, so it can exist without one, which means coupling the two would be bad design.

OTOH, the proposed patch just adds a new interface method (sketched below), and I think there is a good chance it will be accepted even without contributing InboundByteBufAllocator, particularly as Netty 5 will target Java 8 and we will be able to add a default implementation, reducing the code impact to a minimum. We might need to call the new method when closing the channel in the Unsafe, but that is a pretty small change.
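A minimal sketch of that idea, assuming the hook is named release() as in this PR's title; HandleSketch is a stand-in for RecvByteBufAllocator.Handle, and the actual patch may differ:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

// Sketch only: a release() hook with a no-op Java 8 default, so every
// existing implementation compiles unchanged.
interface HandleSketch {
    // Existing handle method (real signature on RecvByteBufAllocator.Handle).
    ByteBuf allocate(ByteBufAllocator alloc);

    // Hypothetical hook: implementations that cache a buffer override this
    // to free it when the channel goes away.
    default void release() {
        // no-op by default
    }
}
```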

Thoughts?

@stef1927 commented Jun 5, 2018

I am probably negatively biased by my experience in OS Cassandra, but why would an open-source project add a method to an interface just to benefit a single company, if that method doesn't make any sense otherwise? Especially a method called release, which implies that every handle that is created should also be released. At the very least, we should take care of doing that ourselves.

Our current implementation of releasing the handle from InboundMessageHandler is in fact a bit brittle. What about during the handshake: isn't the buffer leaked if the handshake fails? We must remember to release it from other handlers too, but from at most one handler in the pipeline. Also, what if there were an implementation of the unsafe that doesn't cache the handle? That would leak buffers. And what if, in Apollo, we moved the allocation of the buffer to the constructor? Then all the extra invocations of newHandle() would leak the buffer.

As I said offline, it is OK to merge for the time being but later we'll have to do more work on this. The hope is that by the time the AIO reads have been merged into Netty 5, we'll have a better idea of what Netty 5 will look like, and hence it should be easier to work out the best approach. If we do keep this approach, then every time a handle is created by Netty, it should be released.

@sbtourist (Author)

> why would an open source project add a method to an interface just to benefit a single company if that method doesn't make any sense otherwise? Especially a method called release, which implies that every time a new handle is created, it should be released. At least we should take care of doing this.

Netty is different from Cassandra: the former is a library, and as such it's common (in my experience) for it to provide extension points. Regarding the method name, that's a good point: what about renaming it to onClose() and calling it in the Unsafe implementation when the close listener is notified (and adding a test that the method is actually called, on the same handle instance; a sketch of such a test follows)?
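A hedged sketch of that test, assuming the hypothetical onClose() hook and a recording handle; neither exists in upstream Netty, so the final assertion is indicative only:

```java
import io.netty.channel.RecvByteBufAllocator;
import io.netty.channel.embedded.EmbeddedChannel;
import org.junit.Test;

// Sketch: verify the exact handle instance created by the unsafe receives
// the proposed onClose() callback when the channel closes.
public class RecvHandleOnCloseTest {
    @Test
    public void onCloseFiresOnTheSameHandleInstance() {
        EmbeddedChannel channel = new EmbeddedChannel();
        // recvBufAllocHandle() is real Netty 4.1 API on Channel.Unsafe.
        RecvByteBufAllocator.Handle created = channel.unsafe().recvBufAllocHandle();
        channel.close().syncUninterruptibly();
        // With the proposed hook and a recording handle implementation, we
        // would assert here that onClose() fired exactly once on 'created'.
    }
}
```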

> Our current implementation of releasing the handle from InboundMessageHandler is in fact a bit brittle.

The concerns below are definitely valid, but they are implementation concerns unrelated to the interface:

> What about during the handshake, isn't the buffer leaked if the handshake fails? We must remember to release it from other handlers too, but only from at most one handler in the pipeline.

Correct, and totally my fault, but this can be fixed easily, either on the Netty side as proposed above or by adding a channel close listener on our side.

> what if there were an implementation of the unsafe that doesn't cache the handle? What if in Apollo we moved the allocation of the buffer to the constructor?

I do not think there is much we can do about such changes: the best option is probably a test verifying that the assumptions are not violated.

> As I said offline, it is OK to merge for the time being but later we'll have to do more work on this.

I do not want to rush this if we don't agree on a decent solution: you're voicing valid concerns; would the change proposed above address them?

@stef1927 commented Jun 5, 2018

It would definitely help to release the handle in the unsafe. I think you're right: it should be called where fireChannelInactiveAndDeregister() is invoked in the abstract unsafe, or thereabouts.

Regarding the name: onChannelClose(), onChannelInactive(), onClose(channel), or onInactive(channel), i.e. something that implies the channel is being closed. A bare onClose() is a bit unclear, and I'd rather reserve release or close for closing the handle itself. The unsafe should then set the handle to null.

What you mentioned about libraries accommodating extension points gave me another idea. The reason we cannot simply do an instanceof check and use a channel close listener is EpollRecvByteAllocatorHandle and the other delegating implementations. So perhaps the correct extension method to add is one that gives access to the real implementation of the handle: a method that by default returns this, but returns the delegate in the wrapper handles (sketched below). This is perhaps easier, and a more generic approach that would help with other scenarios too, not just those related to releasing resources.
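A minimal sketch of that extension point; unwrap() is a hypothetical name, and HandleSketch stands in for RecvByteBufAllocator.Handle:

```java
// Sketch only: a default method exposing the real handle behind wrappers.
interface HandleSketch {
    default HandleSketch unwrap() {
        return this; // plain handles are their own real implementation
    }
}

// A delegating wrapper, in the spirit of EpollRecvByteAllocatorHandle,
// surfaces the handle it wraps so instanceof checks can see through it.
final class DelegatingHandleSketch implements HandleSketch {
    private final HandleSketch delegate;

    DelegatingHandleSketch(HandleSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    public HandleSketch unwrap() {
        return delegate.unwrap(); // recurse through nested wrappers
    }
}
```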

@sbtourist (Author)

> the correct extension method to add is a method that gives access to the real implementation of the handle

The problem with that is that EpollRecvByteAllocatorHandle doesn't extend DelegatingHandle, so it would require a more invasive change, cleaning up hierarchies just for the sake of making instanceof work, which is probably harder to justify (you're not supposed to rely on instanceof anyway, but you might call me a purist).

I'd rather stick with the previously proposed approach, if that sounds OK to you.

@stef1927 commented Jun 5, 2018

OK

@sbtourist (Author)

@stef1927, I've amended the PR as discussed. I wanted to add a test, but for some reason it fails to compile due to deprecation warnings ... it almost drove me crazy, so I gave up and verified it manually (we can talk about it more offline).

@stef1927 left a comment

Thanks for the changes. They LGTM with only one small point.

Manual verification is sufficient, but if you still have the test and want me to try to understand why the build is complaining, I'm happy to.

```diff
@@ -801,6 +801,10 @@ private void deregister(final ChannelPromise promise, final boolean fireChannelI
         return;
     }

+    if (recvHandle != null) {
```

I would also check on fireChannelInactive because in theory it is possible to deregister a channel without closing it.
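A minimal sketch of that check, using the recvHandle field from the diff above; the hook name here is illustrative, since its final name was still being discussed:

```java
// Only tear down the handle when the channel is actually going inactive;
// a channel can be deregistered without being closed.
if (fireChannelInactive && recvHandle != null) {
    recvHandle.release(); // or onChannelInactive(), per the naming discussion
    recvHandle = null;
}
```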

@sbtourist sbtourist merged commit 14b7bd8 into dse-netty-4.1.25.Final Jun 6, 2018
@sbtourist sbtourist deleted the DB-702 branch June 6, 2018 13:21
thobe pushed a commit that referenced this pull request Jun 29, 2022
…tor (netty#10890)

Motivation:

A race detector discovered a data race in GlobalEventExecutor present in
netty 4.1.51.Final:

```
  Write of size 4 at 0x0000cea08774 by thread T103:
    #0 io.netty.util.internal.DefaultPriorityQueue.poll()Lio/netty/util/internal/PriorityQueueNode; DefaultPriorityQueue.java:113
    #1 io.netty.util.internal.DefaultPriorityQueue.poll()Ljava/lang/Object; DefaultPriorityQueue.java:31
    #2 java.util.AbstractQueue.remove()Ljava/lang/Object; AbstractQueue.java:113
    #3 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(J)Ljava/lang/Runnable; AbstractScheduledEventExecutor.java:133
    #4 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue()V GlobalEventExecutor.java:119
    #5 io.netty.util.concurrent.GlobalEventExecutor.takeTask()Ljava/lang/Runnable; GlobalEventExecutor.java:106
    #6 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:240
    #7 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
    #8 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
    #9 java.lang.Thread.run()V Thread.java:835
    #10 (Generated Stub) <null>

  Previous read of size 4 at 0x0000cea08774 by thread T110:
    #0 io.netty.util.internal.DefaultPriorityQueue.size()I DefaultPriorityQueue.java:46
    #1 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:263
    #2 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
    #3 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
    #4 java.lang.Thread.run()V Thread.java:835
    #5 (Generated Stub) <null>
```

The race is legit, but benign. Triggering it requires a TaskRunner to begin
exiting and set 'started' to false; more work to be scheduled, which starts
a new TaskRunner; that work to schedule additional work, which modifies
'scheduledTaskQueue'; and then the original TaskRunner to check
'scheduledTaskQueue'. But there is no danger in this race: it can only
produce a false negative in the condition that causes the code to CAS
'started', and that CAS is thread-safe.

Modifications:

Delete problematic references to scheduledTaskQueue. The only way
scheduledTaskQueue could be modified since the last check is if another
TaskRunner is running, in which case the current TaskRunner doesn't
care.
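
A hedged illustration of the resulting exit-check pattern (not the actual
Netty diff; 'taskQueue' and 'started' mirror GlobalEventExecutor's fields,
but the surrounding code is simplified):

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified TaskRunner exit check after the fix: only the regular task
// queue is consulted; scheduledTaskQueue is no longer read here.
final class TaskRunnerSketch {
    private final Queue<Runnable> taskQueue;
    private final AtomicBoolean started;

    TaskRunnerSketch(Queue<Runnable> taskQueue, AtomicBoolean started) {
        this.taskQueue = taskQueue;
        this.started = started;
    }

    boolean tryStop() {
        if (taskQueue.isEmpty()) {
            // A concurrent submitter may race this CAS, but it will then
            // CAS 'started' back to true and spawn a fresh TaskRunner,
            // so a stale view here is harmless.
            return started.compareAndSet(true, false);
        }
        return false;
    }
}
```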

Result:

Data-race free code, and a bit less code to boot.