Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streaming response bodies for many=True schemas #180

Open
twosigmajab opened this issue Mar 4, 2020 · 4 comments
Open

streaming response bodies for many=True schemas #180

twosigmajab opened this issue Mar 4, 2020 · 4 comments
Labels
enhancement New feature or request validation Concerns request and response validation

Comments

@twosigmajab
Copy link
Contributor

It looks like request handlers that have response_body_schema=Foo(many=True) needlessly build up the entire list of Foos in memory before even starting to send the first Foo back to the client.

Just quickly searched through the Marshmallow tracker and didn't see anything about this, but then I searched through the code and I think marshmallow.Schema.dump() is the culprit:

if many and is_iterable_but_not_string(obj):
    obj = list(obj)

Of course, the request handler can require paging and enforce max page sizes etc., but it'd still be better if response bodies for JSON arrays could be streamed given how common many=True response bodies are. The stdlib's json module is perfectly capable of streaming, and there's no benefit to building these lists up in memory before starting to send the response. Am I missing something, or is this just a gap in Marshmallow? If so, it seems pretty easy for Rebar to work around this as long as the gap exists. Thanks!

@airstandley airstandley added the enhancement New feature or request label Mar 6, 2020
@airstandley
Copy link
Contributor

This would be an amazing enhancement, but it's a doozy.

  1. Looking over Marshmallow it seems like there's a lot of built in assumptions that you are dumping everything at once. I'm not sure how you'd have dump return a generator without breaking existing pre/post dump processors that do things like add envelops. 😞

2)We're using Flask's jsonify to dump the marshalled objects. It does not appear to support generators, so you'd have to implement a version that does handle them, which seems like it's got a lot of gotchas. (I think this might not actually be as easy as passing the Response.stream property into json.dump, but having never tried that 🤷‍♂ )

@jab
Copy link

jab commented Mar 7, 2020

Can Rebar work around (1) by detecting a many=True Schema, converting it to a many=False Schema, and using that to dump() an item at a time, yielding each dumped result from within a generator?

Agreed that (2) is annoying but doesn't seem like a show-stopper.

@jab
Copy link

jab commented Mar 7, 2020

Also re (2), I think the implementation in that post could be simplified a lot. Something along the lines of

from json import dumps

def dump_streaming(iterable):
    yield "["
    it = iter(iterable)
    i = next(it, None)
    while i is not None:
        yield dumps(i)
        i = next(it, None)
        if i is not None:
            yield ","    
    yield "]"


>>> ''.join(dump_streaming(range(0)))
'[]'
>>> ''.join(dump_streaming(range(1)))
'[0]'
>>> ''.join(dump_streaming(range(2)))
'[0,1]'
>>> ''.join(dump_streaming(range(3)))
'[0,1,2]'

@airstandley airstandley added the validation Concerns request and response validation label Jul 1, 2020
@jab
Copy link

jab commented Nov 24, 2020

Opened an issue in Marshmallow for this FWIW: marshmallow-code/marshmallow#1696

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request validation Concerns request and response validation
Projects
None yet
Development

No branches or pull requests

3 participants