
Enhancement Request: List keys (files) in S3 bucket using paginators #9

Open
adrianyorke opened this issue Dec 12, 2019 · 20 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@adrianyorke
Collaborator

I would like to connect to an existing S3 bucket and list the keys (files). This will then drive further test cases on a key-by-key basis as we fetch files in order and perform data quality checks such as row counts, hash values, averages or totals of numerical columns, etc.

There is a good write-up here which explains how this could be implemented using paginators. Paginators simplify paging complexities for buckets that contain large numbers of files.
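
For readers who have not used paginators, here is a minimal sketch of the idea, not the library's implementation. It assumes a boto3 S3 client is passed in; the function name `list_keys` and its parameters are illustrative only. The calls `get_paginator("list_objects_v2")` and `paginate(Bucket=..., Prefix=...)` are standard boto3.

```python
from typing import Any, Iterator


def list_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key in `bucket`, optionally limited to keys starting with `prefix`.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    params = {"Bucket": bucket}
    if prefix:
        params["Prefix"] = prefix

    # The paginator follows continuation tokens for us, so buckets with
    # more keys than fit in one response are handled transparently.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**params):
        # "Contents" is missing from a page when no keys match.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With real credentials this could be called as `for key in list_keys(boto3.client("s3"), "my-bucket", "reports/"): ...`, where the bucket and prefix names are placeholders. Yielding keys one at a time keeps memory use flat for very large buckets.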

@adrianyorke adrianyorke added enhancement New feature or request good first issue Good for newcomers labels Dec 12, 2019
@adrianyorke adrianyorke changed the title Enhancement Request: List keys in S3 bucket using paginators Enhancement Request: List keys (files) in S3 bucket using paginators Dec 12, 2019
@teaglebuilt
Collaborator

this is a good place to start

@teaglebuilt
Collaborator

If no one has claimed this issue, I will take it on.

@adrianyorke
Collaborator Author

adrianyorke commented Dec 14, 2019

Go for it @teaglebuilt. I will review, test, and merge your pull request, if that approach works for you.

@adrianyorke
Collaborator Author

Regarding PRs - would you prefer that we create a branch for each patch, or are you happy just to work on master for now? Bigger projects (like Robot Framework core) normally prefer a separate branch for each patch, but smaller projects can just work on master for simple fixes and enhancements.

@teaglebuilt
Collaborator

I think we should always create pull requests. Setting master as the tracking branch works for now, instead of creating a dev branch. Maybe this will change when we have caught up. As far as documentation is concerned, I don't want the wait for a pull request to hold back fixes to errors or incorrect documentation. I definitely do not want to work off master from this point forward. If you are a collaborator, then you should be able to change incorrect information or documentation without a pull request, such as the markdown documentation, keyword docs, and so on.

@teaglebuilt
Collaborator

I have used Robot Framework's Libdoc for auto documentation and pre-commit for linting, and I will set up Travis CI to deploy to PyPI on tag releases.

@teaglebuilt teaglebuilt self-assigned this Dec 15, 2019
@teaglebuilt
Collaborator

After playing around with paginators, I think we need to think about all the S3 keywords that we want to create using this. Should the keyword for this issue list all keys by page, or by prefix?

preferred keyword name?
other keyword offsets? For example:

List Keys params: bucket, prefix ?

Or

List Keys    bucket
List Keys By Prefix    bucket    prefix

@adrianyorke
Collaborator Author

List Keys is what I had in mind. It should match the boto function. One thing to consider is that there may be many thousands of keys in a single bucket, so a filtering option would be useful.
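
A rough sketch of what a single `List Keys` keyword with an optional filter might look like follows. The class name `S3KeywordsSketch`, the injected `client`, and the `max_items` argument are all hypothetical; the thread does not show how the real library obtains its boto3 client or what arguments were finally chosen. Robot Framework exposes a public method named `list_keys` on a Python library class as the keyword `List Keys`, and boto3's `PaginationConfig={"MaxItems": n}` caps the total number of items returned.

```python
from typing import Any, Dict, List, Optional


class S3KeywordsSketch:
    """Hypothetical keyword library; `list_keys` would appear as `List Keys`."""

    def __init__(self, client: Any) -> None:
        # Assumption: a boto3 S3 client is supplied from outside.
        self.client = client

    def list_keys(
        self, bucket: str, prefix: str = "", max_items: Optional[int] = None
    ) -> List[str]:
        """Return keys in `bucket`, optionally filtered by `prefix` and capped."""
        params: Dict[str, Any] = {"Bucket": bucket}
        if prefix:
            params["Prefix"] = prefix
        if max_items is not None:
            # Robot Framework may pass arguments as strings, so convert.
            params["PaginationConfig"] = {"MaxItems": int(max_items)}

        paginator = self.client.get_paginator("list_objects_v2")
        keys: List[str] = []
        for page in paginator.paginate(**params):
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        return keys
```

Keeping prefix and limit as optional arguments on one keyword mirrors the boto call and avoids a separate keyword per filter.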

@teaglebuilt
Collaborator

alright

@NeoMorfeo

Any update on this? Do you need help developing it?

@adrianyorke
Collaborator Author

@NeoMorfeo: @teaglebuilt commented 15 Dec: "If no one has claimed this issue, I will take it on."

@NeoMorfeo

Thanks @adrianyorke 😄. @teaglebuilt, it would be nice to have; please ask me if help is needed.

Another good point is searching/filtering the bucket by prefix, as indicated in the other comment. For me, it makes more sense to mimic boto and have one List Keys keyword with pagination, prefix, and so on.

Thanks in advance

@adrianyorke
Collaborator Author

@NeoMorfeo: Contributions are most welcome and I am happy to test and review. First take a look at the Contributing guidelines: https://github.com/teaglebuilt/robotframework-aws/blob/master/CONTRIBUTING.md

Let's wait to hear from @teaglebuilt before you put too much effort into this - he may have the solution sitting in some local branch, so let's not waste time fixing it again until we've heard back from him.

@NeoMorfeo

No news about this, @teaglebuilt or @adrianyorke? Then maybe I will implement it myself and open a PR :D because I need to improve this as much as possible :=)

@teaglebuilt
Collaborator

teaglebuilt commented Jun 24, 2020

@NeoMorfeo what are you asking? You are free to contribute, and if you submit a pull request I'll pull it down and test it. This repo needs your help.

@NeoMorfeo

@teaglebuilt no worries, I was just wondering if you guys had spoken about this. Nothing to worry about, sorry to bother you :(

I will make a change to the code and open a PR so you can check whether it fits your standards. Thanks!

@teaglebuilt
Collaborator

@NeoMorfeo great, I am sure it will. The only thing is that we need to write a unit test and a Robot Framework test for each keyword, or modify the existing ones based on the changes made. Under the test directory there should be a folder for unit tests and one for RF/acceptance tests.
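
As an illustration of the unit-test side only, here is a sketch of how a paginator-based keyword could be tested without touching AWS. The `list_keys` function is a hypothetical stand-in for whatever the keyword ends up being, and the repository's actual test layout and tooling are not shown in this thread. The test replaces the boto3 client with `unittest.mock.MagicMock` and feeds it canned pages.

```python
import unittest
from unittest.mock import MagicMock


def list_keys(client, bucket, prefix=""):
    """Hypothetical stand-in for the keyword under test."""
    params = {"Bucket": bucket}
    if prefix:
        params["Prefix"] = prefix
    pages = client.get_paginator("list_objects_v2").paginate(**params)
    return [obj["Key"] for page in pages for obj in page.get("Contents", [])]


class ListKeysTest(unittest.TestCase):
    def test_collects_keys_across_pages(self):
        client = MagicMock()
        # Two canned pages simulate a bucket too large for one response.
        client.get_paginator.return_value.paginate.return_value = [
            {"Contents": [{"Key": "logs/1.csv"}, {"Key": "logs/2.csv"}]},
            {"Contents": [{"Key": "logs/3.csv"}]},
        ]
        self.assertEqual(
            list_keys(client, "my-bucket", "logs/"),
            ["logs/1.csv", "logs/2.csv", "logs/3.csv"],
        )
        client.get_paginator.assert_called_once_with("list_objects_v2")
        client.get_paginator.return_value.paginate.assert_called_once_with(
            Bucket="my-bucket", Prefix="logs/"
        )

    def test_no_matching_keys_returns_empty_list(self):
        client = MagicMock()
        # A page without "Contents" is what an empty listing looks like.
        client.get_paginator.return_value.paginate.return_value = [{"KeyCount": 0}]
        self.assertEqual(list_keys(client, "my-bucket"), [])
```

A Robot Framework acceptance test would still be needed to exercise the keyword against a real or emulated bucket; that part is not sketched here.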

@NeoMorfeo

OK @teaglebuilt, I will also follow the Contributing guidelines as @adrianyorke suggested 😄

Now I need time to implement it :)

@teaglebuilt
Collaborator

@NeoMorfeo how's it coming? Any roadblocks or issues?

@NeoMorfeo

No no, sorry, I have been busy and have had no time to develop this, but I'm on it... Sorry for the delay.

Development

No branches or pull requests

3 participants