I was recently migrating some code from requests to httpx and ran into some pitfalls with streaming large request bodies. I need to send BinaryIO objects raw over a connection, and critically I need to set a Content-Length on the requests I send.
In requests, this can be hacked by (ab)using the super_len function, which tries to infer the length of the stream by probing len, fileno, etc. httpx does something similar in its peek_filelike_length function, except that it only probes fileno, not len.
Like requests, httpx falls back on tell and seek if the fileno probe fails, which is where my issue arises. Not all streams are harmlessly seekable, so if you naively pass a stream that isn't an actual file handle to the files argument, you risk loading the entire stream, which in my case could be hundreds of gigabytes of data.
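To make the pitfall concrete, here is a minimal sketch of a length probe in the style described above: try fileno first, then fall back on tell and seek. It is an illustration only, not the actual code of httpx or requests, and guess_length and SpoolingStream are hypothetical names. SpoolingStream stands in for a stream that can only answer a seek to the end by consuming its whole source.

```python
import io
import os


def guess_length(stream):
    """Guess a stream's length: fileno first, then tell/seek as a fallback."""
    try:
        return os.fstat(stream.fileno()).st_size
    except (AttributeError, OSError):
        pass  # not backed by a real file descriptor
    try:
        offset = stream.tell()
        length = stream.seek(0, os.SEEK_END)  # may be very expensive
        stream.seek(offset)
        return length
    except (AttributeError, OSError):
        return None


class SpoolingStream:
    """Hypothetical non-file stream that must pull all data to find its end."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = bytearray()
        self._pos = 0
        self.bytes_pulled = 0  # how much of the source has been consumed

    def tell(self):
        return self._pos

    def seek(self, offset, whence=os.SEEK_SET):
        if whence == os.SEEK_END:
            # The only way to locate the end is to buffer every chunk.
            for chunk in self._chunks:
                self._buffer += chunk
                self.bytes_pulled += len(chunk)
            self._pos = len(self._buffer) + offset
        elif whence == os.SEEK_SET:
            self._pos = offset
        else:
            self._pos += offset
        return self._pos


stream = SpoolingStream([b"x" * 1024] * 4)
print(guess_length(stream), stream.bytes_pulled)
```

Merely asking for the length here drains the whole source into the buffer, before a single byte has been sent.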
I ended up creating a Request manually and using its stream argument, which isn't exposed anywhere else as far as I can tell. The ergonomics of this are also not great, since I then have to set all headers manually.
Personally, I really dislike the whole "try to guess the file length" business, but it is what it is. There are a few things that could be done to greatly improve the ergonomics here:
Do not set or try to guess Content-Length or Transfer-Encoding if either is specified by the user. This one seems like a no-brainer to me, especially for the AsyncIterable case in encode_content. If the user knows the content length, there is no real reason to set Transfer-Encoding: chunked or to try to guess the length.
Create a protocol type, or something similar, that encapsulates a "stream with length" and can be passed as RequestContent or in some other place.
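The two suggestions above could look something like the following sketch. Everything in it is hypothetical: SizedStream, SizedReader and framing_headers are names made up for illustration and are not part of httpx. The idea is that a body carrying an explicit length never triggers any guessing, and that framing headers the user has already set are left alone.

```python
import io
from typing import BinaryIO, Dict, Iterator, Mapping, Protocol, runtime_checkable


@runtime_checkable
class SizedStream(Protocol):
    """Hypothetical "stream with length": iterable chunks plus a known size."""

    def __iter__(self) -> Iterator[bytes]: ...

    def __len__(self) -> int: ...


class SizedReader:
    """Wraps a binary file-like object whose length the caller already knows."""

    def __init__(self, fileobj: BinaryIO, length: int, chunk_size: int = 65536) -> None:
        self._fileobj = fileobj
        self._length = length
        self._chunk_size = chunk_size

    def __iter__(self) -> Iterator[bytes]:
        # Only read() is used; the stream is never seeked or probed.
        while True:
            chunk = self._fileobj.read(self._chunk_size)
            if not chunk:
                return
            yield chunk

    def __len__(self) -> int:
        return self._length


def framing_headers(user_headers: Mapping[str, str], body: object) -> Dict[str, str]:
    """Return the framing headers to add, never overriding the user's choice."""
    names = {name.lower() for name in user_headers}
    if "content-length" in names or "transfer-encoding" in names:
        return {}  # the user decided; do not set or guess anything
    if isinstance(body, SizedStream):
        return {"Content-Length": str(len(body))}
    return {"Transfer-Encoding": "chunked"}


body = SizedReader(io.BytesIO(b"hello world"), length=11)
print(framing_headers({}, body))
```

A plain generator has no length, so under this sketch it would still get Transfer-Encoding: chunked unless the caller supplies Content-Length explicitly.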