Can't serve (very) large files #1736
What are the gunicorn logs? If it blocks for more than 30 seconds when using the synchronous worker, it will be killed. To achieve what you want, you will need to use an async worker for now.
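A minimal sketch of that suggestion, assuming the gevent worker class is installed alongside gunicorn; the module path `myapp:app`, worker count, and bind address are placeholders:

```sh
# With an async worker, a long-running response does not trip the worker
# timeout the way it does with the default sync worker.
# "myapp:app" is a placeholder for the actual WSGI module:callable.
gunicorn --worker-class gevent --workers 2 --bind 0.0.0.0:8080 myapp:app
```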
You're right, the gunicorn log contains:

`[2018-04-02 15:40:41 +0200] [1565] [CRITICAL] WORKER TIMEOUT (pid:1568)`

Adding a sufficiently long timeout works around the problem, but is not actually satisfactory to me. The worker isn't idle - it's transferring data. Given that I need to transfer quite large files, no timeout value is really reasonable. If I choose a timeout suitable for my user to transfer a terabyte, they'll next want to transfer ten terabytes, or a petabyte. A very long timeout also makes the timeout feature fairly useless: if a worker is stuck and isn't responding, a timeout of days or weeks isn't very helpful. It seems to me that a timeout that applies only to an idle connection would be more useful. As is, gunicorn doesn't seem useful for my use case.
@larswirzenius Did you see @benoitc's point about using async workers? As an aside, if you're concerned about transferring petabytes of data, maybe a traditional HTTP server is not the best design for this task. I don't know your context, but have you considered a method that's better suited to moving huge amounts of data quickly (e.g. S3)?
You can use the threaded worker.
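A minimal sketch of the threaded-worker option; the worker and thread counts and the bind address are placeholders:

```sh
# Passing --threads switches gunicorn to the threaded (gthread) worker.
# "myapp:app" and the counts are placeholder values.
gunicorn --workers 2 --threads 4 --bind 0.0.0.0:8080 myapp:app
```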
@larswirzenius Either way, use a gevent or eventlet worker; it will do the trick. Also make sure that your framework allows you to use sendfile. Depending on your needs, you can also bypass the supervision like the websocket example does; it will however require that you take care of correctly closing the worker at the end.
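On the sendfile point, a hedged sketch assuming the large data lives in a file on disk: Bottle's `static_file()` hands the server a file object that can be sent via `wsgi.file_wrapper` (and hence sendfile) where the worker supports it. The route path and file locations are placeholders:

```python
# Hedged sketch: serve a large on-disk file through Bottle so the WSGI
# server can use wsgi.file_wrapper / sendfile if available.
from bottle import Bottle, static_file

app = Bottle()

@app.route('/blob')
def blob():
    # '/srv/data' and 'blob.bin' are placeholder paths.
    return static_file('blob.bin', root='/srv/data')
```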
@larswirzenius Did you manage to resolve this? :)
On Thu, 2018-05-31 at 10:50 -0700, Chris Lamb wrote:
> @larswirzenius Did you manage to resolve this? :)

I gave up on gunicorn for that program.
@larswirzenius But did you try the suggestion before "giving up"?
Closing issue. Sounds like we'll never know if the solutions have been tried or not.
:(
@lamby It seems I mixed up the answers. Reopening it to see if something can be done :)
Closing the issue since there has been no activity for a while. Feel free to open a new ticket if needed.
I had this same problem: with a streaming HTTP response, the gunicorn timeout applied even though it was transferring data. (With my network transfer throttled on the client side, it would not happen, though - interesting.)
It works great! I added
I have a similar problem, and unfortunately using threaded workers is not possible in our context: our application segfaults when running with threads due to this bug in cypari2, which I don't know how to fix. From gunicorn's perspective I understand if threaded workers are the only reasonable answer, but if anyone has a workaround to suggest, I'd be happy to hear it.
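For completeness, a hedged sketch of the timeout-based workaround mentioned earlier in the thread, for setups that cannot use threads: keep the sync workers but raise or disable the worker timeout (gunicorn treats 0 as no timeout). The counts, address, and app name are placeholders:

```sh
# Sync workers with the worker timeout disabled (0 = infinite in gunicorn).
# This avoids threads entirely, at the cost of losing stuck-worker detection.
gunicorn --workers 4 --timeout 0 --bind 0.0.0.0:8080 myapp:app
```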
This was originally filed in Debian as: http://bugs.debian.org/894512. Could it be related to #1733?
I'm writing a web application that needs to serve fairly large files, in the terabyte range. I am using python3-bottle for my code, and it works just fine. However, when I run my application with gunicorn3 it doesn't work.
I've distilled this into a small test case. The application code is saved as file `foo.py`:
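A hedged reconstruction of what such a test application might look like; the route path, blob size, and chunk size below are illustrative assumptions, not the original code:

```python
# foo.py - minimal Bottle app that streams a large generated blob.
# BLOB_SIZE and CHUNK are placeholder values for illustration.
import bottle

app = bottle.Bottle()

BLOB_SIZE = 10 * 1024 * 1024 * 1024   # 10 GiB placeholder
CHUNK = 1024 * 1024                   # 1 MiB per chunk

@app.route('/blob')
def blob():
    bottle.response.content_type = 'application/octet-stream'
    bottle.response.set_header('Content-Length', str(BLOB_SIZE))

    def body():
        # Yield the blob in chunks so the response streams instead of
        # materialising terabytes in memory.
        remaining = BLOB_SIZE
        while remaining > 0:
            n = min(CHUNK, remaining)
            yield b'x' * n
            remaining -= n

    return body()
```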
This is the script that starts it, saved as file `start.sh`:
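A hedged reconstruction of the start script, assuming gunicorn3's defaults (sync workers); the bind address and port are placeholders:

```sh
#!/bin/sh
# start.sh - launch foo.py under gunicorn3 with the default sync workers,
# bound to all interfaces so another host can reach it.
gunicorn3 --bind 0.0.0.0:8080 foo:app
```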
To test, run `sh +x start.sh`, and then from another host fetch the blob with curl.
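A hedged sketch of such a curl invocation; the host name, port, and path are placeholders matching the `foo.py` sketch above:

```sh
# Fetch the blob from another machine and discard it; "serverhost" is a
# placeholder for the machine running start.sh.
curl -o /dev/null http://serverhost:8080/blob
```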
This always fails for me: curl complains that the transfer is cut short.
It seems to always work over localhost. It always works when the blob is sufficiently small, such as 1024 bytes, even between hosts.
If I don't use gunicorn and use the Bottle built-in HTTP server instead, it always works. Like this:
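A hedged sketch of running the same app with Bottle's built-in HTTP server instead of gunicorn; the host and port are placeholders matching the sketches above:

```python
# run_builtin.py - serve the same Bottle app with the built-in server.
# "foo" and the host/port are placeholders.
import bottle
from foo import app

bottle.run(app, host='0.0.0.0', port=8080)
```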
I ran that on one machine, and ran curl on a different host and it worked 10 times in a row.