You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Thank you very much for this project.
I've been using it for years to store uploaded files on S3.
Please apologize for the long text. I'm not used to this format, but I wanted to include all the relevant information so you don't have to do this research again. I also wanted to include some workarounds until a definite solution can be implemented in the library.
Description of the problem
In the past, I used querystring_auth to let users download private files from S3, but nowadays, I prefer using a Django view as a proxy.
As you can see, it's a straightforward view that verifies user permission and sends the file's content. Of course, it slightly increases the server load compared to querystring_auth, but it allows me to offer nice, non-expiring URLs.
I've been using this solution for almost three years and had no problem until I tried to enable the gzip compression (using AWS_IS_GZIPPED).
Indeed, when compression is enabled, the HTTP response body contains the compressed file, but the Content-Length is set to the uncompressed file size, and the Content-Encoding header is not set.
This problem only occurs when using a WSGI server and not with Django's runserver command.
How is this possible?
The documentation of FileResponse mentions wsgi.file_wrapper, an optimization that allows the WSGI server to send the file directly. Let's see how this feature ends up with the compressed file sent in the response.
Now let's see what happens in the WSGI server; I'll take gunicorn as an example, but it's safe to assume that all WSGI implementations will have the same problem.
Ideally, we would need to remove the fileno property from S3File when the file is compressed. I don't think one can dynamically remove a property from a class, but we could remove fileno from S3File and add it only when the file is not compressed. However, doing so would require breaking the inheritance from File, and I don't know the implications.
Cons: The WSGI server can no longer use sendfile, even for uncompressed files.
You could also use FileResponse by default and switch to StreamingHttpResponse for compressed files, but it's a bit more complicated and strongly couples your code with the storage backend.
Workaround 3: Disable sendfile on the WSGI server
Pass --no-sendfile to gunicorn, or --disable-sendfile to uwsgi.
Cons: The WSGI server can no longer use sendfile.
Conclusion
As a conclusion to this extremely long issue (or should I say blog post), let me tell you how I fixed my existing apps:
For the applications where the gzip compression is essential (CSV files, for example), I used workaround 2: StreamingHttpResponse.
For the applications where gzip compression is not essential (images, for example), I use the workaround 1: disabling the compression.
That's all I had to say about that.
I hope it will benefit other users of django-storages, and who knows, maybe inspire a proper fix in the library.
Best regards,
Benoit
The text was updated successfully, but these errors were encountered:
Hi,
Thank you very much for this project.
I've been using it for years to store uploaded files on S3.
Please apologize for the long text. I'm not used to this format, but I wanted to include all the relevant information so you don't have to do this research again. I also wanted to include some workarounds until a definite solution can be implemented in the library.
Description of the problem
In the past, I used
querystring_auth
to let users download private files from S3, but nowadays, I prefer using a Django view as a proxy.As you can see, it's a straightforward view that verifies user permission and sends the file's content. Of course, it slightly increases the server load compared to
querystring_auth
, but it allows me to offer nice, non-expiring URLs.I've been using this solution for almost three years and had no problem until I tried to enable the gzip compression (using
AWS_IS_GZIPPED
).Indeed, when compression is enabled, the HTTP response body contains the compressed file, but the
Content-Length
is set to the uncompressed file size, and theContent-Encoding
header is not set.This problem only occurs when using a WSGI server and not with Django's
runserver
command.How is this possible?
The documentation of
FileResponse
mentionswsgi.file_wrapper
, an optimization that allows the WSGI server to send the file directly. Let's see how this feature ends up with the compressed file sent in the response.It all begins when
FileResponse
sets an attributefile_to_stream
pointing to theS3File
. Django's WSGI layer detects this attribute and wraps theS3File
with the class "wsgi.file_wrapper" provided by the WSGI server.Now let's see what happens in the WSGI server; I'll take
gunicorn
as an example, but it's safe to assume that all WSGI implementations will have the same problem.gunicorn
detects that the response contains awsgi.file_wrapper
and checks that the provided file-like object supportsfileno
.gunicorn
then happily sends the file usingsendfile()
. As we'll see in a moment, thisfileno
corresponds to the temporary file downloaded from S3. This file is still compressed, which explains why the server returns a compressed file.So how could
gunicorn
get thefileno
of the compressed file?As we saw, it received the
S3File
wrapped withwsgi.file_wrapper
.S3File
derives fromFile
.File
derives fromFileProxyMixin
.FileProxyMixin
forwards all calls to the underlying file object, includingfileno
.But what is the underlying file object?
If the S3 file is not compressed, the underlying object is a
SpooledTemporaryFile
, and everything works fine.But if the file is compressed, the underlying object is a
GzipFile
wrapping theSpooledTemporaryFile
.When asked for a
fileno
, theGzipFile
returns the one of the underlying object.In other words, it returns the
fileno
of the uncompressed file, which explains howgunicorn
got it.How to fix this issue?
Ideally, we would need to remove the
fileno
property fromS3File
when the file is compressed. I don't think one can dynamically remove a property from a class, but we could removefileno
fromS3File
and add it only when the file is not compressed. However, doing so would require breaking the inheritance fromFile
, and I don't know the implications.Looking at the code of
gunicorn
, you could be tempted to throw an exception fromfileno
. Unfortunately, this behavior isn't mentioned in the specification and isn't supported by other WSGI implementations.How to work around this issue?
Until there is a proper fix, here are a few ways to work around this problem.
Workaround 1: Disable GZIP compression
Either set the
gzip
storage option orAWS_IS_GZIPPED
toFalse
.Cons: Files aren't compressed anymore
Workaround 2: Use
StreamingHttpResponse
instead ofFileResponse
.Cons: The WSGI server can no longer use
sendfile
, even for uncompressed files.You could also use
FileResponse
by default and switch toStreamingHttpResponse
for compressed files, but it's a bit more complicated and strongly couples your code with the storage backend.Workaround 3: Disable
sendfile
on the WSGI serverPass
--no-sendfile
togunicorn
, or--disable-sendfile
touwsgi
.Cons: The WSGI server can no longer use
sendfile
.Conclusion
As a conclusion to this extremely long issue (or should I say blog post), let me tell you how I fixed my existing apps:
StreamingHttpResponse
.That's all I had to say about that.
I hope it will benefit other users of django-storages, and who knows, maybe inspire a proper fix in the library.
Best regards,
Benoit
The text was updated successfully, but these errors were encountered: