
HttpEventHandler.body should avoid being resized #5444

Open
franz1981 opened this issue Jan 13, 2025 · 13 comments

@franz1981
Contributor

franz1981 commented Jan 13, 2025

Handling chunks requires appending Buffer(s) to the body stored in HttpEventHandler.
See

This can both create an excessive heap footprint and perform useless copies.
It would be ideal if we could retain the chunks, accumulating them in some list or queue, and eventually either allocate a single buffer to copy them into or transfer them to a composite buffer (without copying anything).
I have used a similar strategy in https://github.com/quarkusio/quarkus/blob/b870c90569a4c88467313587565205818034f4df/independent-projects/vertx-utils/src/main/java/io/quarkus/vertx/utils/AppendBuffer.java#L19
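A minimal sketch of the retain-then-copy-once strategy described above, in plain Java rather than the actual Vert.x Buffer API (the class and method names here are illustrative, not from the linked AppendBuffer):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: retain incoming chunks without resizing any buffer,
// then allocate a single exact-size array and copy each chunk once at the end.
final class ChunkAccumulator {
    private final List<byte[]> chunks = new ArrayList<>();
    private int totalLength;

    void append(byte[] chunk) {
        chunks.add(chunk);           // retain the chunk; no copy, no resize
        totalLength += chunk.length;
    }

    byte[] toByteArray() {
        byte[] out = new byte[totalLength]; // one exact-size allocation
        int offset = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, offset, c.length);
            offset += c.length;
        }
        chunks.clear();              // drop chunk references promptly
        return out;
    }
}
```

The composite-buffer variant would skip even the final copy by wrapping the retained chunks, at the cost of non-contiguous storage.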

@geoand
Contributor

geoand commented Jan 13, 2025

This would be very useful in Quarkus when a large body is returned as part of the REST Client call

@franz1981
Contributor Author

franz1981 commented Jan 13, 2025

In addition, without this we could probably suffer from GC nepotism (see jbossas/jboss-threads#74): the body, while being resized, may sit in the young generation, while the HTTP event handler, since it collects all chunks, can be promoted to the old generation; it then keeps referencing the body, causing it to be promoted as well, even when not required.
Nulling out the body field once it is no longer needed should help.
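A tiny sketch of the nulling-out suggestion, using a hypothetical stand-in class (not the real HttpEventHandler, and a StringBuilder in place of a Buffer):

```java
// Hypothetical sketch: once the accumulated body is handed off, drop the
// handler's reference so a possibly-tenured handler does not keep the
// body alive across generations (the GC-nepotism scenario above).
final class BodyHandlerSketch {
    private StringBuilder body = new StringBuilder(); // stand-in for Buffer

    void handleChunk(String chunk) {
        body.append(chunk);
    }

    String complete() {
        String result = body.toString();
        body = null; // break the old-gen -> young-gen link
        return result;
    }
}
```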

@gsmet

gsmet commented Jan 14, 2025

To give an idea of the problem: to get a 365 MB file in chunks with the REST Client in Quarkus, we end up allocating 16 GB of byte[] through Buffer operations.

@vietj
Member

vietj commented Jan 14, 2025

This method should only be used for small payloads. That does not mean we cannot optimize the current implementation, but I am not advocating using it for large payloads; the javadoc says: "Don't use this if your request body is large - you could potentially run out of RAM."

@geoand
Contributor

geoand commented Jan 14, 2025

this method should only be used for small payloads

This is of course true, but:

  • A large payload like the one @gsmet mentioned is clearly a pathological case, but it should not result in an almost two-orders-of-magnitude increase in the required memory. If I am trying to grab a 300 MB payload and I have, say, 600 MB of heap, shouldn't I be able to get it?
  • In the REST Client (where @gsmet found the issue), it's often impossible to know how large the retrieved payload will be. Users should use one of the other ways we provide, but we can't know how large the payload is actually going to be - it can even differ during the lifetime of the application.

@gsmet

gsmet commented Jan 14, 2025

Especially when dealing with chunks, as you have no idea of the size of the response.

@vietj
Member

vietj commented Jan 14, 2025

I did say "that does not mean that we cannot optimize the current one though", so of course we can improve the implementation, and we will :-) - I was just giving my opinion.

When dealing with an unknown size, I believe it should be conservative.

@geoand
Contributor

geoand commented Jan 14, 2025

🙏🏽

@gsmet

gsmet commented Jan 14, 2025

Don't use this if your request body is large - you could potentially run out of RAM

To clarify, @vietj, I'm perfectly aware that loading 365 MB in RAM is a bad idea (and it was even worse as we were copying the buffer so it was more like 750 MB of RAM).
But my main concern here is the memory allocation storm we end up having.

I will do some tests with various sizes to give you an idea of how it goes with 1 MB/2 MB/10 MB/100 MB so that we have a baseline. I have a Byteman script that can handle this.

@gsmet

gsmet commented Jan 14, 2025

Here is what I came up with.
The amplification factor is allocations / size of payload.

It's relatively stable up to 20 MB and then gets a lot worse. Even so, we allocate 4 to 5 times the size of the payload. Note that this might not be due only to the Vert.x side.
After a certain size, it grows linearly from what I can see, and not for the better: for a 400 MB file, we allocate more than 20 GB of memory.

[Screenshot from 2025-01-14: table of allocation amplification factors per payload size]
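One way to see why the measured amplification points at more than the resize itself: under a pure doubling policy, the total bytes ever allocated for the backing array stay within a small constant factor of the payload. A hypothetical model (GrowthModel is an illustrative name, not Vert.x code):

```java
// Hypothetical model: total bytes allocated when a backing array is grown
// by doubling while fixed-size chunks are appended, versus the payload size.
final class GrowthModel {
    static long totalAllocated(long payload, int chunkSize, int initialCapacity) {
        long allocated = initialCapacity; // the initial array
        long capacity = initialCapacity;
        long written = 0;
        while (written < payload) {
            written += chunkSize;
            while (capacity < written) {
                capacity *= 2;          // each resize allocates a whole new array
                allocated += capacity;  // ...and the old one becomes garbage
            }
        }
        return allocated;
    }
}
```

For a 1 MiB payload this model stays under 4x the payload size, so amplification factors of 40-50x as reported above would have to come from additional copies (e.g. per-chunk appends and decoding layers), not from doubling alone.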

@franz1981
Contributor Author

Yep, the enlargement of the buffer happens based on what Netty decides - power-of-2 increases while the buffer is small, then fixed linear steps past a threshold.
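A simplified sketch of that growth policy, modeled on Netty's AbstractByteBufAllocator.calculateNewCapacity (the maxCapacity clamping of the real method is omitted here for brevity):

```java
// Simplified model of Netty's buffer growth policy: round up to the next
// power of two below a 4 MiB threshold, then grow linearly in 4 MiB steps.
final class NettyGrowthSketch {
    static final int THRESHOLD = 4 * 1024 * 1024; // 4 MiB

    static int calculateNewCapacity(int minNewCapacity) {
        if (minNewCapacity == THRESHOLD) {
            return THRESHOLD;
        }
        if (minNewCapacity > THRESHOLD) {
            // Past the threshold: grow linearly, one 4 MiB step at a time.
            return minNewCapacity / THRESHOLD * THRESHOLD + THRESHOLD;
        }
        // Below the threshold: round up to the next power of two, from 64.
        int newCapacity = 64;
        while (newCapacity < minNewCapacity) {
            newCapacity <<= 1;
        }
        return newCapacity;
    }
}
```

The linear regime above 4 MiB is what makes repeated appends of a large body so expensive: each small overflow triggers a full reallocation and copy of an already multi-megabyte buffer.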

@EmadAlblueshi
Contributor

@franz1981
Contributor Author

Yep @EmadAlblueshi it is similar; appending can be dangerous unless it happens very few times.
