
Question/feature request: ability to consume one result at a time from a cursor #727

Open
fluffy-critter opened this issue Oct 3, 2024 · 0 comments

In Pony, is there a way to run a select query and consume the results from the database one at a time, without the query fetching every row into a list of entity objects up front?

My use case is that I frequently run queries where I want to cap the output at a certain maximum size, but some complex filtering can't be done (efficiently, at least) purely within the database query, so I need to scan rows until my output set is full. Unfortunately, something like for item in Items.select(...) fetches the entire table into row objects before I can apply the further filtering. This is roughly what I was doing before:

query = Items.select(...)
results = []
max_count = 100

for item in query:
    if len(results) >= max_count:
        break
    if filter_item(item): 
        results.append(item)

However, for item in query triggers retrieval of the entire result set and creates a Python entity object for every matching database row, which is extremely inefficient in both time and memory.

Currently what I'm doing is retrieving batches of items using code similar to:

query = Items.select(...)
results = []
max_count = 100

start = 0

while len(results) < max_count:
    chunk_size = max(16, max_count - len(results)) # lots of tuning can happen here of course
    chunk = query[start:start + chunk_size]
    start += chunk_size

    if not chunk:
        break

    for item in chunk:
        if filter_item(item):
            results.append(item)
        if len(results) >= max_count:
            break

This works and is much more efficient in both time and memory, but for my retrieval pattern it is still slower overall than executing the query once and consuming rows until the result set is full, especially since LIMIT/OFFSET pagination can become "accidentally quadratic". It's also much more code to maintain on my end.
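For comparison, keyset pagination avoids the quadratic OFFSET cost by filtering on the last-seen primary key instead of skipping rows. A minimal sketch at the raw DBAPI level with stdlib sqlite3 (the items table, its columns, and the filter are made up for illustration, not part of Pony's API):

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [(n,) for n in range(1000)])

def filter_item(row):
    # Stand-in for the complex filtering that can't run inside SQL.
    return row[1] % 7 == 0

results = []
max_count = 100
last_id = 0  # keyset cursor: the largest id seen so far

while len(results) < max_count:
    # WHERE id > ? with ORDER BY id replaces OFFSET, so each chunk
    # starts where the previous one ended instead of re-skipping rows.
    chunk = conn.execute(
        "SELECT id, value FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, 64)).fetchall()
    if not chunk:
        break
    last_id = chunk[-1][0]
    for row in chunk:
        if filter_item(row):
            results.append(row)
            if len(results) >= max_count:
                break
```

This keeps each chunk query O(chunk size) regardless of how deep into the table the scan has progressed, at the cost of requiring a stable ordering key.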

If I could instantiate entity objects from a DBAPI cursor, I could implement this myself, but I don't see anything in the public Pony API that allows it. Alternatively, it would be helpful to have a way to get an iterator from the query that realizes entities as it goes, instead of fetching the entire result set up front.
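For context, the kind of lazy consumption described above is straightforward at the raw DBAPI level: wrap the cursor in a generator that calls fetchmany() and stops pulling rows as soon as the consumer breaks. A sketch with stdlib sqlite3 (hypothetical items table; the missing piece in Pony is turning each yielded row into an entity):

```python
import sqlite3

def iter_rows(cursor, batch_size=64):
    """Yield rows lazily from a DBAPI cursor, fetching in small batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [(n,) for n in range(1000)])

cur = conn.execute("SELECT id, value FROM items ORDER BY id")
results = []
for row in iter_rows(cur):
    if row[1] % 7 == 0:          # stand-in for filter_item()
        results.append(row)
    if len(results) >= 100:      # stop early; later rows are never fetched
        break
```

The generator only ever materializes one batch at a time, so breaking out of the loop leaves the rest of the result set untouched on the database side.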

Thanks!
