-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup aiohttp web server #2779
Comments
Also, the compression and the chunk features that are mixed between Allowing a clean happy path for some ad-hoc cases - that surely are the most used - as I'm aligned with you, but I would try first to speed up the |
I think it's OK to ignore any content-length and transfer-encoding settings for these responses, and always use the content-type specified by user if there is any. But I don't quite understand the speedup, I thought building the headers are always slower... |
the check for headers is quite expensive <- I don’t understand |
P.S. |
Also |
How much do they cost comparing to a simple json encode? |
How is json encoding related to the issue? |
If it costs far less than json encode, it may not worth it. |
Say again, why should I care about JSON encoding cost at all? |
Take it easy... When optimising, we begin from the slowest part. I have tested the performance of aiohttp before, It is about 4000qps/CPU core, that is 250us per request. We must reduce 50us or more to make a difference. And if I remember It correctly, encoding of a JSON response takes about 10us. So I was wondering if doing checks are really a bottleneck now. I don't have a testing environment with me until next week. I will look at your gist after that. |
@hubo1016 |
@kxpal understood. I am just using it as a benchmark baseline, since different hardware produces very different numbers. |
4000 RPS? |
8200 RPS on my laptop |
@thomaszdxsn python debugger is off, access log is off, uvloop is used? |
https://gist.github.com/asvetlov/13295efdc1301bea241d7d35496e6f81 I take this gist, and not use cProfile.
what mean? I know python have a debugger call |
Sorry, my mistake -- should be "profiler". |
yes, this is my command:
wrk command:
output:
another wrk:
output:
my computer:
maybe reason is MacOS |
@asvetlov ah, I didn't turn off the access log. I just use the default settings. It makes sense if you are targeting 20k+qps. Shall we rewrite the whole class with Cython first? That might be easier. |
from my experience, biggest win in term of performance is related to ability to run async code in place |
Good point |
interesting @fafhrd91! Will be nice to see if the naive code path that has a completed response without needing a loop iteration exists right now in the current implementation, plenty of awaits that might be breaking the execution. |
It will be nice to implement http/2 , that will give a really good performance boost on real world apps. As http/2 has a HPACK header compression, it is possible to use caching for header parsing, validation, and all other stuff. What will give blasting performance boost and top limit of uvloop tcp performance (100k rps) can be reached nearly, as http/2 is a binary protocol. And that's not all - almost all real world apps uses sessions, no matter what kind they are (kv, rdms, jwt) they all need some work for cpu, this work can bee cached to. And there is a ready implementation of http/2 with python binding , what would not be hard to integrate. |
Are you volunteering for the task? |
Maybe. If it will be requirement for my current project (we have some performance issues against heavy application level ddos). This task is interesting for me, but I cant give any promises about my part in it because there are always some factors that may break promises. |
Without profiler With profiler |
I'm guessing that's UrlMappingMatchInfo.add_app(), which would make sense, because you get a new match info for each request and you want the app available on there. It just sets it as the current_app and adds it to an empty list (when routing through sub apps, then I think you end up with a list representing the chain of apps). |
diff --git a/aiohttp/web_response.py b/aiohttp/web_response.py
index 73063890f..a7c37c121 100644
--- a/aiohttp/web_response.py
+++ b/aiohttp/web_response.py
@@ -93,7 +93,7 @@ class StreamResponse(BaseClass, HeadersMixin):
self._compression = False
self._compression_strategy: int = zlib.Z_DEFAULT_STRATEGY
self._compression_force: Optional[ContentCoding] = None
- self._cookies = SimpleCookie()
+ self._cookies: Optional[SimpleCookie] = None
self._req: Optional[BaseRequest] = None
self._payload_writer: Optional[AbstractStreamWriter] = None
@@ -209,6 +209,8 @@ class StreamResponse(BaseClass, HeadersMixin):
@property
def cookies(self) -> SimpleCookie:
+ if self._cookies is None:
+ self._cookies = SimpleCookie()
return self._cookies
def set_cookie(
@@ -230,6 +232,9 @@ class StreamResponse(BaseClass, HeadersMixin):
Sets new cookie or updates existent with new value.
Also updates only those params which are not None.
"""
+ if self._cookies is None:
+ self._cookies = SimpleCookie()
+ else:
old = self._cookies.get(name)
if old is not None and old.coded_value == "":
# deleted cookie
@@ -277,6 +282,7 @@ class StreamResponse(BaseClass, HeadersMixin):
Creates new empty expired cookie.
"""
# TODO: do we need domain/path here?
+ if self._cookies:
self._cookies.pop(name, None)
self.set_cookie(
name, #2779 (comment) should have skipped some of the pop operations if nothing set. Can get another ~2% out of it with the above. That will also have some small per request memory benefit to not create a |
related issue #2779 We create these objects even if there are no cookies
My only question is whether the proposal in the original issue (i.e. basically replacing direct web.Response() instantiation with helper functions) is something we want to look at implementing, or will it not make a difference? |
Maybe |
Profile above shows more time is spent in the tuple compares. #2779 (comment) #9896 is a test to see if that makes a difference to unpack them |
I don't see a way to speed any of those up The extra super() property call in aiohttp/aiohttp/web_response.py Line 214 in 5ab2d47
|
even if we clean that all up a bit more its likely still less than everything we would get from #2779 (comment) |
I can't come up with anything to make it faster that doesn't significantly hurt readability or make the code much less dry, and definitely nothing worth that kind of trade off. Could benefit from a second set of eyes looking at the codspeed flame graphs |
Resource.resolve shows up in the profile.. #9899 attempt to make it faster |
Another interesting optimization would be in using https://pypi.org/project/hyperscan/ for route path search. |
See #9907 for hyperscan POC |
Interesting. I would note that while you were away we did a big refactor in #7829. For an application with 5000 routes, this reduced route resolution from ~5s to ~100ms. So, we've already seen a substantial improvement on that front, though if there's more to be had, then great. |
I think if @asvetlov agrees that the original proposal is not worth doing, then we can probably close this issue. If there are any more specific suggestions for optimisations anyone has, then a new issue can be opened for each suggestion (or just a PR). |
I see a lot of improvements here, thank you guys. |
related issue #2779 We create these objects even if there are no cookies
Everything looks good now. |
The typical web handler looks like:
The text was updated successfully, but these errors were encountered: