In Pony, is there a way to do a select query where I am able to consume the results from the database one-at-a-time, without having the query fetch every single row into a list of objects up-front?
My use case is that I frequently have to do a query where I want to have a certain maximum output size, but there is some complex filtering that can't be done purely within a database query (at least, not efficiently), and so I need to scan entries until my output result set is full. Unfortunately, doing something like for item in Items.select(...) ends up fetching the entire table into row objects before I can do the further filtering. This is what I was doing before (roughly):
```python
query = Items.select(...)
results = []
max_count = 100
for item in query:
    if len(results) >= max_count:
        break
    if filter_item(item):
        results.append(item)
```
However, `for item in query` triggers retrieval of the entire result set and creates a Python entity object for every database row, which is extremely inefficient in both time and memory.
Currently what I'm doing is retrieving batches of items using code similar to:
```python
query = Items.select(...)
results = []
max_count = 100
start = 0
while len(results) < max_count:
    chunk_size = max(16, max_count - len(results))  # lots of tuning can happen here, of course
    chunk = query[start:start + chunk_size]
    start += chunk_size
    if not chunk:
        break
    for item in chunk:
        if filter_item(item):
            results.append(item)
        if len(results) >= max_count:
            break
```
This works and is a lot more efficient (in both time and memory), but for my particular retrieval case it's still less efficient overall than executing the query once and retrieving items until the result set is full, especially since `LIMIT`/`OFFSET` queries can exhibit "accidentally quadratic" behavior. It's also much more code to maintain on my end.
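For what it's worth, the quadratic `OFFSET` cost can be avoided with keyset pagination: filtering on the primary key of the last row seen instead of skipping rows. I haven't found a clean way to express this in Pony without rebuilding the query each iteration, so here's a rough sketch of the idea against plain `sqlite3`; the table, columns, and `filter_item` are placeholder assumptions, not anything from my real schema:

```python
import sqlite3

# Throwaway in-memory table standing in for Items.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [(i % 7,) for i in range(1000)])

def filter_item(row):
    # Placeholder for the complex Python-side filtering.
    return row[1] == 0

results = []
max_count = 100
chunk_size = 64
last_id = 0  # keyset cursor: primary key of the last row seen
while len(results) < max_count:
    # "WHERE id > ?" replaces OFFSET, so each chunk is an index seek
    # rather than a scan past every previously skipped row.
    chunk = conn.execute(
        "SELECT id, value FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, chunk_size),
    ).fetchall()
    if not chunk:
        break
    last_id = chunk[-1][0]
    for row in chunk:
        if filter_item(row):
            results.append(row)
        if len(results) >= max_count:
            break
```

Each chunk is cheap regardless of how deep into the table the scan has gone, which fixes the quadratic part, though it still issues one query per chunk.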
If I could instantiate entity objects from a DBAPI cursor, I could implement this myself, but I'm not seeing anything in the public Pony API that allows it. Alternatively, it would be helpful to have a way of getting an iterator from the query that realizes entities as it goes, instead of doing a full fetch of the entire query up-front.
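To illustrate the shape of what I'm after, here's a mock-up over a raw `sqlite3` cursor (which already fetches incrementally); the `stream` helper and the table are invented for the example and are not real Pony API:

```python
import sqlite3
from itertools import islice

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [(i % 7,) for i in range(1000)])

def stream(cursor):
    # Yield rows one at a time; the DBAPI cursor fetches incrementally,
    # so the full result set is never materialized up front. In Pony this
    # is where entity objects would be realized lazily.
    while True:
        row = cursor.fetchone()
        if row is None:
            return
        yield row

def filter_item(row):
    return row[1] == 0  # placeholder for the complex Python-side filter

cursor = conn.execute("SELECT id, value FROM items")
# Take matching rows until the result set is full, then stop reading.
results = list(islice(filter(filter_item, stream(cursor)), 100))
```

With an API like this, the single query runs once and row consumption stops as soon as `results` is full, with no batching bookkeeping on my side.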
Thanks!