Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileResponse + S3File + gzip + WSGI = 💥 #1348

Open
bblanchon opened this issue Jan 11, 2024 · 0 comments
Open

FileResponse + S3File + gzip + WSGI = 💥 #1348

bblanchon opened this issue Jan 11, 2024 · 0 comments

Comments

@bblanchon
Copy link

Hi,

Thank you very much for this project.
I've been using it for years to store uploaded files on S3.

Please apologize for the long text. I'm not used to this format, but I wanted to include all the relevant information so you don't have to do this research again. I also wanted to include some workarounds until a definite solution can be implemented in the library.

Description of the problem

In the past, I used querystring_auth to let users download private files from S3, but nowadays, I prefer using a Django view as a proxy.

@login_required
def user_biography(request):
    return FileResponse(request.user.biography.open())

As you can see, it's a straightforward view that verifies user permission and sends the file's content. Of course, it slightly increases the server load compared to querystring_auth, but it allows me to offer nice, non-expiring URLs.

I've been using this solution for almost three years and had no problem until I tried to enable the gzip compression (using AWS_IS_GZIPPED).

Indeed, when compression is enabled, the HTTP response body contains the compressed file, but the Content-Length is set to the uncompressed file size, and the Content-Encoding header is not set.

This problem only occurs when using a WSGI server and not with Django's runserver command.

How is this possible?

The documentation of FileResponse mentions wsgi.file_wrapper, an optimization that allows the WSGI server to send the file directly. Let's see how this feature ends up with the compressed file sent in the response.

It all begins when FileResponse sets an attribute file_to_stream pointing to the S3File. Django's WSGI layer detects this attribute and wraps the S3File with the class "wsgi.file_wrapper" provided by the WSGI server.

Now let's see what happens in the WSGI server; I'll take gunicorn as an example, but it's safe to assume that all WSGI implementations will have the same problem.

gunicorn detects that the response contains a wsgi.file_wrapper and checks that the provided file-like object supports fileno. gunicorn then happily sends the file using sendfile(). As we'll see in a moment, this fileno corresponds to the temporary file downloaded from S3. This file is still compressed, which explains why the server returns a compressed file.

So how could gunicorn get the fileno of the compressed file?
As we saw, it received the S3File wrapped with wsgi.file_wrapper.
S3File derives from File.
File derives from FileProxyMixin.
FileProxyMixin forwards all calls to the underlying file object, including fileno.

But what is the underlying file object?
If the S3 file is not compressed, the underlying object is a SpooledTemporaryFile, and everything works fine.
But if the file is compressed, the underlying object is a GzipFile wrapping the SpooledTemporaryFile.
When asked for a fileno, the GzipFile returns the one of the underlying object.
In other words, it returns the fileno of the uncompressed file, which explains how gunicorn got it.

How to fix this issue?

Ideally, we would need to remove the fileno property from S3File when the file is compressed. I don't think one can dynamically remove a property from a class, but we could remove fileno from S3File and add it only when the file is not compressed. However, doing so would require breaking the inheritance from File, and I don't know the implications.

Looking at the code of gunicorn, you could be tempted to throw an exception from fileno. Unfortunately, this behavior isn't mentioned in the specification and isn't supported by other WSGI implementations.

How to work around this issue?

Until there is a proper fix, here are a few ways to work around this problem.

Workaround 1: Disable GZIP compression

Either set the gzip storage option or AWS_IS_GZIPPED to False.

Cons: Files aren't compressed anymore

Workaround 2: Use StreamingHttpResponse instead of FileResponse.

@login_required
def user_biography(request):
    return StreamingHttpResponse(request.user.biography.open())

Cons: The WSGI server can no longer use sendfile, even for uncompressed files.

You could also use FileResponse by default and switch to StreamingHttpResponse for compressed files, but it's a bit more complicated and strongly couples your code with the storage backend.

Workaround 3: Disable sendfile on the WSGI server

Pass --no-sendfile to gunicorn, or --disable-sendfile to uwsgi.

Cons: The WSGI server can no longer use sendfile.

Conclusion

As a conclusion to this extremely long issue (or should I say blog post), let me tell you how I fixed my existing apps:

  • For the applications where the gzip compression is essential (CSV files, for example), I used workaround 2: StreamingHttpResponse.
  • For the applications where gzip compression is not essential (images, for example), I use the workaround 1: disabling the compression.

That's all I had to say about that.
I hope it will benefit other users of django-storages, and who knows, maybe inspire a proper fix in the library.

Best regards,
Benoit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant