
decrypt_file_iter, a generator yielding chunks that *would* be passed to on_data? Concise alternative to using on_data for streaming #246

Open
vergenzt opened this issue Jan 14, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@vergenzt commented Jan 14, 2025

Is your feature request related to a problem? Please describe.
It'd be nice to be able to iterate over streamed chunks in gpg.decrypt_file instead of having to set an on_data callback.

Describe the solution you'd like
Use one of the functions from https://stackoverflow.com/questions/9968592/turn-functions-with-a-callback-into-python-generators to wrap decrypt_file, yielding each chunk of the file as it streams.

Then, once the iterator terminates, perhaps a separate method could retrieve the result from the GPG object? I'm not sure of the best way to handle this. Or maybe just raise an exception on failure and ignore the result object otherwise?
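For reference, the callback-to-generator pattern from that Stack Overflow thread usually bridges the callback and the consumer with a queue. A minimal sketch (with `fake_decrypt` as a hypothetical stand-in for a `decrypt_file` call wired up to `on_data`; the exact wiring to python-gnupg is an assumption here):

```python
import queue
import threading

_DONE = object()  # sentinel marking end of stream


def callback_to_generator(start):
    """Run `start(on_data)` in a background thread and yield each chunk
    that it passes to the callback."""
    q = queue.Queue(maxsize=16)  # bounded, so the producer can't race far ahead

    def on_data(chunk):
        q.put(chunk)

    def runner():
        try:
            start(on_data)
        finally:
            q.put(_DONE)  # always unblock the consumer, even on error

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    while (item := q.get()) is not _DONE:
        yield item
    worker.join()


# Hypothetical stand-in for a decrypt_file call that streams via on_data:
def fake_decrypt(on_data):
    for chunk in (b"one", b"two", b"three"):
        on_data(chunk)


chunks = list(callback_to_generator(fake_decrypt))
```

The consumer iterates on the caller's thread, which is the whole point of the feature request: the chunks arrive where the caller's thread-local state lives.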

Describe alternatives you've considered
Just using on_data and doing this myself. 🙂

@vergenzt vergenzt changed the title Iterator yielding chunks on decrypt_file? decrypt_file_iter, a generator yielding chunks that *would* be passed to on_data? Concise alternative to using on_data for streaming Jan 14, 2025
@vsajip (Owner) commented Jan 14, 2025

I'm sure it might be aesthetically pleasing from a "design purity" point of view, but does it give you anything you can't do with on_data? I imagine the users of on_data are a small subset of the users of this package - the functionality was only added relatively recently, because before that no one asked for it! So it would add complexity to the code beyond what is there now, for an as-yet unquantified (and perhaps unquantifiable) benefit.

@vergenzt (Author) commented Jan 27, 2025

Okay, I've just figured out what it would give me that I can't do with on_data: handling chunks of the file from the same thread I initiated the decryption from.

E.g. right now I'm trying to decrypt a large file with decrypt_file, process its lines into SQLAlchemy records, and then commit the result. I've got on_data set up to handle the chunks... but because on_data gets called from a background thread, my database session, which is thread-local by default, is throwing sqlalchemy.exc.InvalidRequestError: Object '<MyObject at 0x123456789>' is already attached to session '3' (this is '5'). 🙁

Would you be open to merging it if I add this, to simplify this use case?

vergenzt added a commit to vergenzt/python-gnupg that referenced this issue Jan 27, 2025
@vsajip (Owner) commented Jan 28, 2025

> Would you be open to merging it if I add this, to simplify this use case?

It depends on how the proposed changes look. After all, I would have to provide ongoing support indefinitely for an uncommon use case. Your particular use case could be addressed with the current setup by having the on_data callable send the chunks to a queue, which the session-owning thread can read from as a consumer.
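The queue hand-off suggested above can be sketched roughly as follows. To keep the example self-contained and runnable, `run_decrypt` is a hypothetical stand-in for the real python-gnupg call (roughly `gpg.on_data = on_data; gpg.decrypt_file(stream)`; the exact wiring is an assumption, not the library's confirmed API):

```python
import queue
import threading

chunk_queue = queue.Queue()  # bridges the reader thread and the main thread
_EOF = None  # sentinel signalling that decryption has finished


def on_data(chunk):
    # Runs on the background reader thread: do nothing here except
    # hand the chunk off; touch no thread-local state (e.g. DB sessions).
    chunk_queue.put(chunk)


def run_decrypt():
    # Hypothetical stand-in for the real decrypt_file call; here we just
    # feed chunks straight to the callback, then signal completion.
    for chunk in (b"record 1\n", b"record 2\n"):
        on_data(chunk)
    chunk_queue.put(_EOF)


worker = threading.Thread(target=run_decrypt)
worker.start()

received = []
while (chunk := chunk_queue.get()) is not _EOF:
    # Main (session-owning) thread: a safe place for SQLAlchemy session work,
    # since everything here runs on the thread that owns the session.
    received.append(chunk)
worker.join()
```

Because all consumption happens on the main thread, the thread-local session error described earlier cannot occur; the background thread only ever touches the queue.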

@vsajip (Owner) commented Jan 29, 2025

I've updated the documentation to talk about threading constraints when processing data.

@vsajip vsajip added the enhancement New feature or request label Feb 17, 2025