support file-like protocol #9

Open
shantanuo opened this issue Mar 17, 2017 · 11 comments

Comments

@shantanuo

It would be great to pickle an object directly to S3. Something like this would be helpful:

import bucketstore
bucket = bucketstore.get('bucketstore-playground', create=True)

import pickle
a = {'hello': 'world'}

with open(bucket['foo11'], 'wb') as handle:
    pickle.dump(a, handle, protocol=pickle.HIGHEST_PROTOCOL)

@kennethreitz
Collaborator

i'm going to rename this issue

@kennethreitz changed the title from "bucket store with pickle" to "support file-like protocol" on Apr 19, 2017
@eligundry
Contributor

@kennethreitz I am currently working on this. I think I will do one of the following:

  1. Make S3Key inherit from something like io.IOBase. Not entirely sure how that will work, but I'll play around with it.
  2. Add some limited __enter__/__exit__ methods that create an in-memory stream, which is written to S3 on exit.

I think option 2 is the best bet, but an in-memory stream might not be a good idea for huge files.
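
To make option 2 concrete, here is a minimal sketch under stated assumptions. `BufferedKey` and its `upload` callback are hypothetical names used only for illustration, not bucketstore's actual API; a plain dict stands in for the bucket so the example runs without AWS access.

```python
import io
import pickle


class BufferedKey:
    """Hypothetical stand-in for S3Key; not bucketstore's actual API."""

    def __init__(self, name, upload):
        self.name = name
        # Assumed callable taking (name, data_bytes); in practice it
        # would wrap whatever call sends the bytes to S3.
        self._upload = upload
        self._buffer = None

    def __enter__(self):
        # Collect every write in memory until the block ends.
        self._buffer = io.BytesIO()
        return self._buffer

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            # Upload only if the block finished without an exception.
            if exc_type is None:
                self._upload(self.name, self._buffer.getvalue())
        finally:
            self._buffer.close()
            self._buffer = None
        return False  # never swallow exceptions raised in the block


# A dict stands in for the bucket here.
fake_bucket = {}
with BufferedKey("foo", fake_bucket.__setitem__) as fh:
    pickle.dump({"hello": "world"}, fh, protocol=pickle.HIGHEST_PROTOCOL)
```

The whole payload sits in memory until `__exit__` runs, which is exactly the concern for huge files.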

Do you have any thoughts?

@kennethreitz
Collaborator

kennethreitz commented Jun 7, 2017

@eligundry I'm not familiar with IOBase, but if it's a good fit, let's go for it!

What does boto normally do for large files?

@eligundry
Contributor

@kennethreitz Boto is really flexible with what it will take. In all the Bucketstore examples, we clearly see that strings are just handled. But Boto will work fine with any file-like object that you throw at it.
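
As a rough illustration of that flexibility, the sketch below dispatches on the payload type. `put_object` and `upload_fileobj` are real boto3 S3 client methods, and `upload_fileobj` is a managed transfer that switches to a multipart upload for large objects. The `store` helper and `RecordingClient` are hypothetical, the latter being a stand-in so the example runs without AWS credentials.

```python
import io


class RecordingClient:
    """Stand-in that mimics two boto3 S3 client methods and records calls."""

    def __init__(self):
        self.calls = []

    def put_object(self, Bucket, Key, Body):
        self.calls.append(("put_object", Bucket, Key, Body))

    def upload_fileobj(self, Fileobj, Bucket, Key):
        self.calls.append(("upload_fileobj", Bucket, Key, Fileobj.read()))


def store(client, bucket_name, key, payload):
    """Upload str, bytes, or a binary file-like object (hypothetical helper).

    `client` is assumed to behave like boto3.client("s3").
    """
    if isinstance(payload, str):
        payload = payload.encode("utf-8")
    if isinstance(payload, (bytes, bytearray)):
        # Small in-memory payloads go up in a single request.
        client.put_object(Bucket=bucket_name, Key=key, Body=payload)
    else:
        # File-like objects are streamed rather than loaded up front.
        client.upload_fileobj(payload, bucket_name, key)


client = RecordingClient()
store(client, "bucketstore-playground", "greeting", "hello")
store(client, "bucketstore-playground", "blob", io.BytesIO(b"\x00\x01"))
```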

I have been working on this a bit and have deviated slightly from the proposed syntax in the OP issue. What this is going to look like is:

import pickle

key = bucket.key('foo')
data = {'hello': 'world'}

with key as fh:
    pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)

In this example, the key is automatically uploaded when it exits the with block. I have this somewhat working, I'm just trying to quash some Python 3 related Unicode issues.

@kennethreitz
Collaborator

i like it!

@eligundry
Contributor

@kennethreitz I have this super close, but would love some feedback on which datatypes this library should assume. I noticed that Boto3 always returns bytes for fetch operations, even if you set the value with a unicode string. Ideally, I would love for file-like operations to work similarly (i.e. you give me a string to write in a with block, cool; you give me bytes, that still works). This works perfectly with io.BytesIO in Python 2, but in Python 3 it is very much not an option.

This test with json.dump is holding this feature up. Because json.dump only works with str and will never produce bytes, when it tries to write to io.BytesIO, it'll error out every time.
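
The failure can be reproduced without S3 at all. The sketch below shows the `TypeError` on Python 3 and one possible workaround, wrapping the binary buffer in `io.TextIOWrapper`; the UTF-8 encoding is an assumption, and this is not necessarily the fix the feature should adopt.

```python
import io
import json

data = {"hello": "world"}

# json.dump writes str chunks, which a binary buffer rejects on Python 3.
binary = io.BytesIO()
try:
    json.dump(data, binary)
    failed = False
except TypeError:
    failed = True

# Wrapping the binary buffer in a text layer encodes str to bytes on write.
binary = io.BytesIO()
text = io.TextIOWrapper(binary, encoding="utf-8")
json.dump(data, text)
text.flush()  # push buffered text down into the BytesIO
payload = binary.getvalue()
```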

At this point, I have a few questions/ideas I'm gonna dump out here:

  1. Maybe streams aren't a good idea? I think switching to tempfile.TemporaryFile is a hacky way to solve this, but it would work (though flushing could get messy).
  2. Have you run into an issue like this in any other of your many projects?
  3. Is assuming bytes for all file inputs a fair thing to enforce for Python 3?
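
On the tempfile idea in point 1, one possible middle ground is `tempfile.SpooledTemporaryFile`, which stays in memory up to a size threshold and then rolls over to disk. This is only a sketch: `buffer_for_upload` and the 1 MiB default are hypothetical, and the example assumes bytes input as in point 3.

```python
import tempfile


def buffer_for_upload(chunks, max_in_memory=1024 * 1024):
    """Collect byte chunks into a rewound spooled file (hypothetical helper)."""
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode="w+b")
    for chunk in chunks:
        spool.write(chunk)
    spool.seek(0)  # rewind so an uploader reads from the start
    return spool


# The tiny threshold forces a rollover to disk; reads behave the same.
with buffer_for_upload([b"hello ", b"world"], max_in_memory=8) as spool:
    contents = spool.read()
```

The rewound file object could then be handed to any uploader that accepts binary file-like objects.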

@kennethreitz
Collaborator

Hmmmmm

@kennethreitz
Collaborator

Requests dealt with a similar issue, and there's a lot of code in place to compensate for it.

@kennethreitz
Collaborator

assuming bytes for Python3 is sane.

@inishchith

closing this due to inactivity

@ParthS007

@inishchith We can keep open the issues that have some discussion; otherwise we will be left with no open issues. What do you say?

@inishchith reopened this on Feb 14, 2019