
When serving large payloads under heavy load, the "OutOfMemoryError: direct memory" error is raised #5443

Closed
jigga opened this issue Jan 13, 2025 · 21 comments
@jigga

jigga commented Jan 13, 2025

Version

Which version(s) did you encounter this bug?

4.5.11

Context

I encounter the below exception pretty quickly when serving large payloads via HTTP GET under heavy load.

java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (allocated: 6438264840, limit: 6442450944)
        at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
        at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:111)
        at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
        at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
        at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:693)
        at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:213)
        at io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:178)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:135)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:127)
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:403)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
        at io.vertx.core.buffer.impl.PartialPooledByteBufAllocator.directBuffer(PartialPooledByteBufAllocator.java:84)
        at io.netty.handler.ssl.SslHandler$SslEngineType$1.allocateWrapBuffer(SslHandler.java:233)
        at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:2416)
        at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:873)
        at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:828)
        at io.netty.handler.ssl.SslHandler.handleUnwrapThrowable(SslHandler.java:1397)
        at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1379)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1417)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:1583)

Do you have a reproducer?

I have a simple reproducer repo listed below. It's a Gradle project consisting of the client and server modules. There are dedicated Gradle tasks to start the server and client. Once both are started, it takes several seconds to reproduce the issue.

Steps to reproduce

  1. Check out the reproducer project
  2. Run the server via ./gradlew :server:run
  3. Run the client via ./gradlew :client:run
  4. Wait a few seconds, then check the server log, where you will see the OutOfMemoryError(s).

Extra

This is roughly the whole server logic.

    final var payload = new byte[70 * 1024 * 1024];
    final var router = Router.router(vertx);
    final var apiRoute = router
      .route("/api")
      .handler(ctx -> {
        final var response = ctx.response();
        response.exceptionHandler(throwable -> {
          System.out.println("Request failed - " + throwable.getMessage());
        });
        response.putHeader(HttpHeaders.CONTENT_TYPE, HttpHeaderValues.APPLICATION_OCTET_STREAM);
        response.putHeader(HttpHeaders.CONTENT_LENGTH, String.valueOf(payload.length));
        response.send(Buffer.buffer(Unpooled.wrappedBuffer(payload)));
      });

The code uses the send(Buffer) method to send the response. My expectation was that this method would apply backpressure if necessary, but that's not the case.

I have another branch in the reproducer project that applies backpressure using Java's Semaphore; however, I'd like a Vert.x-native solution.

@jigga jigga added the bug label Jan 13, 2025
@tsegismont
Contributor

This stack trace is from the client side, correct? Since your client verticle seems to be implemented with the Vert.x Web Client, it is expected that the client buffers the whole server response.

Either use the pipe body codec or switch to the Vert.x core HttpClient for more flexibility.

https://vertx.io/docs/vertx-web-client/java/#_decoding_responses
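
For reference, a minimal sketch of the pipe body codec approach (host, port and the names client and file are illustrative; file is an AsyncFile opened for writing, but any WriteStream<Buffer> works):

    // Pipe the response body straight into a WriteStream instead of buffering it in memory.
    client.get(8443, "localhost", "/api")
      .as(BodyCodec.pipe(file))
      .send(ar -> {
        if (ar.succeeded()) {
          System.out.println("Response streamed, status: " + ar.result().statusCode());
        } else {
          ar.cause().printStackTrace();
        }
      });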

@jigga
Author

jigga commented Jan 13, 2025

Hey, thanks for the quick response. The stack trace is actually from the server component. The server essentially stalls at some point under heavy load and just reports these exceptions and drops the client connections.

@tsegismont
Contributor

Then you now have some information about how many requests the server can handle with this implementation before it fails with an OOME. Vert.x won't refuse connections based on how much memory is available.

You have two options:

  • keep the code unchanged and deploy as many instances as needed to handle this load
  • change the code to send the content without loading the full response payload in memory (see the sketch below)
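
For the second option, a minimal sketch, assuming the payload lives on disk (the payload.bin path is illustrative); sendFile streams the file in chunks and honours the socket's back-pressure:

    // Stream the payload from disk instead of holding it all in memory.
    // sendFile writes the file chunk by chunk, pausing when the write queue is full.
    router.route("/api").handler(ctx -> {
      final var response = ctx.response();
      response.putHeader(HttpHeaders.CONTENT_TYPE, HttpHeaderValues.APPLICATION_OCTET_STREAM);
      response.sendFile("payload.bin"); // illustrative path
    });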

@tsegismont tsegismont added question and removed bug labels Jan 13, 2025
@jigga
Author

jigga commented Jan 13, 2025

I guess my intent here is not for Vert.x to refuse connections based on memory usage, but rather not to allocate too much direct memory when sending the data back to the clients. So something like "oh, I tried allocating the direct memory for the buffer that I'm trying to send back as a response to the client and got an OutOfMemoryError - I think I need to signal that upstream so that it slows down the rate at which it produces the data".

Below is the backpressure implementation that I have in the semaphore branch of the reproducer project. There is a Java Semaphore with a number of permits equivalent to 1 GiB expressed in bytes. Before sending the response back to the client (in this case I use response.end(Buffer), as response.send(Buffer) did not work for some reason), I try to acquire as many permits as there are bytes in the payload. If I'm successful, I call response.end(Buffer) and release the permits in the AsyncResult handler; if I can't acquire the permits, I schedule a one-off timer to retry with an exponential delay. It works and I don't get any OOMEs, but the mean response time obviously increases. The thing is that this solution doesn't feel like a Vert.x-native solution (perhaps that's just my subjective feeling).

I also tried implementing a custom ReadStream that wraps the byte[] I'm sending as the response body and delivers it in chunks, but as the chunk size grows, I start observing the OoME errors again.

If there are no other hints on how I could tackle this problem, I'll stick with the Semaphore-based implementation, but honestly I was hoping for at least some hints regarding the direction I could follow to solve the problem in a Vert.x-native way.

Thanks

  private void sendResponse(final RoutingContext context, final byte[] payload) {
    final var response = context.response();

    if (context.get("id") == null) {
      context.put("id", UUID.randomUUID().toString());
    }
    final String id = context.get("id");

    if (context.get("attempt") == null) {
      context.put("attempt", 0);
    }
    final int attempt = context.get("attempt");
    context.put("attempt", attempt + 1);

    if (semaphore.tryAcquire(payload.length)) {
      System.out.println(id + " - permits acquired, available permits: " + semaphore.availablePermits());
      response.putHeader(HttpHeaders.CONTENT_TYPE, HttpHeaderValues.APPLICATION_OCTET_STREAM);
      response.putHeader(HttpHeaders.CONTENT_LENGTH, String.valueOf(payload.length));

      // Flag to track if permits have been released
      final var permitsReleased = new AtomicBoolean(false);
      response.end(Buffer.buffer(Unpooled.wrappedBuffer(payload)), endResult -> {
        if (permitsReleased.compareAndSet(false, true)) {
          semaphore.release(payload.length);
          final var message = "%s - response sent, permits released, attempt: %d, available permits: %d"
            .formatted(id, attempt, semaphore.availablePermits());
          System.out.println(message);
        }
      });
    } else {
      System.out.println(id + " - semaphore not acquired, available permits: " + semaphore.availablePermits());
      // exponential backoff
      final var baseDelay = 10;
      final var maxDelay = 1000;
      final var delay = Math.min(baseDelay * (1L << attempt), maxDelay);
      context.vertx().setTimer(delay, timerId -> sendResponse(context, payload));
    }
  }

@NavidMitchell

We just started seeing this as well after upgrading to Vert.x 4.5. We had been running on 3.9 for a couple of years; we just pushed our 4.5 upgrade to prod and immediately started having this same problem.

@vietj vietj self-assigned this Jan 25, 2025
@vietj vietj added this to the 4.5.13 milestone Jan 25, 2025
@vietj
Member

vietj commented Jan 25, 2025

@franz1981 does it ring a bell ?

@franz1981
Contributor

franz1981 commented Jan 25, 2025

Have you checked whether, on the server side, you actually release the received buffers?
It is as easy as running the server with leak detection enabled.
I see that you have about 6 GB of overall direct memory capacity, and SSL is usually more heap intensive than off-heap (I will check the stack trace you provided against Netty to understand what's going on), so if that is not enough it usually means:

  • there is some leak that prevents memory from being reused
  • the server should be more aggressive in propagating back pressure; that is, the server-side pipeline should stop auto-reading if it cannot keep up with the client requests
  • the configured direct memory is simply not enough for the expected load

For the first, just enable leak detection and see what happens. For the second, I have no idea what the server does (I suppose it uses Vert.x), and probably @vietj can help more, knowing how it is configured.

The third can still be the reason, since the full request is not sent chunked (it isn't, but I could be wrong) - and even without SSL, if you send tons of big payloads to the server, you risk overloading it.
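
For reference, a minimal way to enable Netty's leak detection (a sketch; the same can be done with the -Dio.netty.leakDetection.level=paranoid JVM flag, and it must run before any buffer is allocated):

    // Paranoid is the most thorough (and most expensive) level; use it only while debugging.
    io.netty.util.ResourceLeakDetector.setLevel(io.netty.util.ResourceLeakDetector.Level.PARANOID);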

@NavidMitchell

Just FYI, we are running our app in Kubernetes using an image built with Paketo. With this setup the memory available for off-heap is limited to 10 MB, as outlined here: https://paketo.io/docs/reference/java-reference/ We decided to set -XX:MaxDirectMemorySize to ensure the process had adequate memory for direct allocations. We are testing now to see if this solves our problem. I am curious, though, whether anyone on the Vert.x team knows if anything has changed between 3.9 and 4.5 that would make Netty allocate off-heap instead of on-heap memory?

@NavidMitchell

NavidMitchell commented Jan 25, 2025

Also we are not using vertx with SSL enabled. Here is the stack from our error.

java.lang.OutOfMemoryError: Cannot reserve 2097152 bytes of direct buffer memory (allocated: 8521715, limit: 10485760)
        at java.base/java.nio.Bits.reserveMemory(Unknown Source) ~[na:na]
        at java.base/java.nio.DirectByteBuffer.<init>(Unknown Source) ~[na:na]
        at java.base/java.nio.ByteBuffer.allocateDirect(Unknown Source) ~[na:na]
        at io.netty.buffer.UnpooledDirectByteBuf.allocateDirect(UnpooledDirectByteBuf.java:104) ~[netty-buffer-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.buffer.UnpooledDirectByteBuf.capacity(UnpooledDirectByteBuf.java:156) ~[netty-buffer-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305) ~[netty-buffer-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280) ~[netty-buffer-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103) ~[netty-buffer-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:105) ~[netty-codec-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:288) ~[netty-codec-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.116.Final.jar:4.1.116.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.116.Final.jar:4.1.116.Final]
        at java.base/java.lang.Thread.run(Unknown Source) ~[na:na]

@franz1981
Contributor

franz1981 commented Jan 25, 2025

if anything has changed between 3.9 and 4.5 that would make netty allocate off heap vs on heap memory?

I myself pushed #5316 and #5292, but they are disabled by default - and the former should actually do the opposite.

Also we are not using vertx with SSL enabled. Here is the stack from our error.

I'm a bit confused here, since the stack in #5443 (comment) shows SSL-related frames.
Which stack is valid?

If I trust this last stack you sent, something weird is going on:

io.netty.buffer.UnpooledDirectByteBuf.allocateDirect(UnpooledDirectByteBuf.java:104)

If it goes unpooled, that is clearly where the problem is.
The capacity requested is pretty small, i.e. 2097152 bytes, so I believe what you got is due to the memory limit on the server, which has not changed (it is still 10 MB!).

We decided to set the -XX:MaxDirectMemorySize to ensure the process had adequate memory for direct allocations

This doesn't seem to have worked, as the OOM message reports (it basically goes OOM beyond 10 MB).
I see indeed that https://github.com/netty/netty/blob/d0406d77f77c9b5da6279afdb4c801244fef12d9/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L112 Netty disables pooled off-heap buffers if the max memory available is too small.
So, I would fix that.

@franz1981
Contributor

Returning to the original stack trace from @jigga, it looks like it is using TCNATIVE, and I'm not aware of changes in Netty or Vert.x which could affect this SSL engine.
Once the prior comments at #5443 (comment) are sorted out, I'd like to know whether the version upgrade has affected both client and server, etc.
e.g.

  • what happens with client 3.9 and server 4.x?
  • what happens with client 4.x and server 3.9?

In short, I would like to understand whether something we do in the server is misbehaving, or whether something different we do in the client causes the server to be overloaded (as it seems) regardless of its version.

@NavidMitchell

@franz1981 I am not the same person that originally created this issue. I just found it when researching since I am seeing the same problem.

I do not have SSL enabled but according to their examples they do.

As for the setting change to increase direct memory: we have just rolled that out. The stack I provided is from before we made the change to the -XX:MaxDirectMemorySize setting. I am sorry I was not clearer about this.

As for Netty disabling off-heap buffers, maybe that calculation is not working correctly when running inside a container.

So far it does look like increasing -XX:MaxDirectMemorySize has solved our problem. But we probably won't know for sure until Monday.

@franz1981
Contributor

franz1981 commented Jan 25, 2025

So far it does look like increasing -XX:MaxDirectMemorySize has solved our problem

That's the part I've mentioned: https://github.com/netty/netty/blob/d0406d77f77c9b5da6279afdb4c801244fef12d9/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L112

Netty changed its chunk size relatively recently (see netty/netty@f650303), which probably caused the number of direct arenas to become 0 instead of 1 (or more?). And this is likely what made it happen.
Although your OOM and the one in this issue look related, the causes are different; hope that helped!

@NavidMitchell

@franz1981 Thank you for your help. It looks like we were using netty 4.1.113.Final with vertx 3.9.16 before our upgrade to vertx 4.5. It looks like the change you mentioned happened long before that. I am going to dig into the code some more to see if I can find anything else that may have changed.

@franz1981
Contributor

@NavidMitchell you can enable debug logging and report the number of direct arenas Netty decides to allocate; that would reveal whether this is what happened. Even if you increase the max direct memory, you need to make sure you are not using unpooled direct memory, since with the standard glibc malloc the application can die after running for a long time due to memory fragmentation.
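
A quick way to check this without parsing debug logs is to read the default pooled allocator's metrics (a sketch; numDirectArenas being 0 means pooled direct buffers are effectively disabled):

    // Print the allocator configuration Netty picked at startup.
    final var metric = io.netty.buffer.PooledByteBufAllocator.DEFAULT.metric();
    System.out.println("numDirectArenas = " + metric.numDirectArenas());
    System.out.println("chunkSize = " + metric.chunkSize());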

@NavidMitchell

@franz1981 @vietj We can enable debug logging as you suggested. Additionally, should I consider downgrading my app to Vert.x 3.9? We have not changed anything Netty-related and your last comment is concerning. This is a production application serving 10k+ users daily. We did not see this problem until upgrading our app to Vert.x 4.5.

@franz1981
Contributor

should I consider downgrading my app to Vertx 3.9?

I wouldn't suggest that; running with more direct memory is the way to go regardless, since most Netty operations really need it (especially I/O).
IDK how 3.9 was working there tbh, but if we want to explore it, I suggest you open a different issue from this one, since you are not using SSL and you are not sending big payloads, which is the culprit here.
And we can help you understand better what's going on 🙏

@NavidMitchell

@franz1981 I was able to confirm this is not related to the Vert.x 3.9 -> 4.5 upgrade. It looks like our data volumes increased around the same time we rolled out Vert.x 4.5 last week. On both the old and new versions of our app we were seeing numDirectArenas 0. Once we changed our settings we now have numDirectArenas 32. I was also able to recreate the error on Vert.x 3.9.16 by sending really large payloads to our app. Thank you for all of your help!

@vietj
Member

vietj commented Jan 27, 2025

As far as I can see, Vert.x and Netty are not capable of applying back pressure in this case. When you send a large buffer like this, the original buffer should be fragmented into chunks (I think you figured that out) and sent separately; this way Vert.x/Netty would only use a number of direct buffers related to the amount of data the TCP connection can ingest.

In other words, from the Vert.x/Netty perspective, writing a very large buffer is a single Netty pipeline write that makes Netty's outbound buffer (on which back-pressure is based) grow from 0 to the full size of your payload.

I see you tried using a read stream (which I think is a decent way to go) but it still failed; did you try various smaller chunk sizes?

Perhaps we could use a Netty handler for such a case that would submit slices of buffers (something like ChunkedWriteHandler but for large buffers).
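
A sketch of that idea on the Vert.x side, writing fixed-size slices and using the write queue as the back-pressure signal (the 64 KiB chunk size is arbitrary, and the response is assumed to have Content-Length set or chunked transfer enabled beforehand):

    private void writeChunked(final HttpServerResponse response, final Buffer payload) {
      final var chunkSize = 64 * 1024; // arbitrary slice size
      final var offset = new AtomicInteger(0);
      final Runnable writer = new Runnable() {
        @Override
        public void run() {
          // Write slices until either the payload is exhausted or the write queue is full.
          while (offset.get() < payload.length() && !response.writeQueueFull()) {
            final var start = offset.get();
            final var end = Math.min(start + chunkSize, payload.length());
            response.write(payload.slice(start, end));
            offset.set(end);
          }
          if (offset.get() < payload.length()) {
            // Resume once Netty's outbound buffer has drained.
            response.drainHandler(v -> run());
          } else {
            response.end();
          }
        }
      };
      writer.run();
    }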

@jigga
Author

jigga commented Jan 27, 2025

Wow, that thread exploded unexpectedly, but I'm very glad about it, and thanks all for your comments.

My current solution is indeed to chunk the response, but there's a chunk size threshold beyond which I immediately start seeing the issue again, hence I consider this solution a bit fragile. I'm also not sure whether the issue would reappear if the load on my API increased further.

I'd like a solution where, even if there's a sudden, unexpected spike in traffic to the API, the server remains stable and responsive and, most importantly, does not stall.

I'm leaning towards the Semaphore-based solution, as that way I have full control over the amount of data flowing through the server at any given time, but then again I feel there must be a more Vert.x-native approach; however, I'm not sure I'm fluent enough with the Vert.x internals to implement one correctly. I will share the final solution when I have it.

@vietj
Member

vietj commented Jan 27, 2025

Did you look at the Vert.x traffic shaping options support?
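
A minimal sketch of that option (assuming Vert.x >= 4.5; the bandwidth value is illustrative):

    // Cap outbound bandwidth at the server level so write buffers cannot
    // grow unbounded during a traffic spike.
    final var options = new HttpServerOptions();
    options.setTrafficShapingOptions(new TrafficShapingOptions()
      .setOutboundGlobalBandwidth(100L * 1024 * 1024)); // bytes per second, illustrative
    vertx.createHttpServer(options).requestHandler(router).listen(8443);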

@vietj vietj modified the milestones: 4.5.13, 4.5.14 Feb 10, 2025
@vietj vietj closed this as not planned Mar 4, 2025
@vietj vietj removed this from the 4.5.14 milestone Mar 4, 2025
@vietj vietj removed the question label Mar 4, 2025