Add aws utils #187
base: master
Conversation
…s on version disabled buckets
…ed. Add mocked_boto3_object context manager in qa_utils.py to make some of those tests simpler to write. Fix some error handling in the bucket code to raise ClientError rather than just Exception.
dcicutils/aws_utils.py
Outdated
logger = logging.getLogger(__name__)


def s3_bucket_head(*, bucket_name, s3=None):
I would reorder arguments here so s3 comes first, and rename it s3_client.
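Applied to this function, that suggestion would give a signature along these lines (a sketch of the proposed shape only, not the merged code):

```python
def s3_bucket_head(*, s3_client=None, bucket_name):
    """Return the head response for bucket_name, or None."""
    ...  # body unchanged; only the argument order and name differ
```
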
dcicutils/aws_utils.py
Outdated
    :return: dict: head response or None
    """
    try:
        s3 = s3 or s3Utils().s3
I would not do it this way, as it will implicitly pull in a bunch of other things when calling s3Utils() - I would just do boto3.client('s3'). This comment applies to many functions throughout here.
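A minimal sketch of that suggestion (the error handling and docstring are assumptions, not the PR's actual code); the boto3 import is deferred into the function so that a caller-supplied client is used without any extra setup:

```python
def s3_bucket_head(*, bucket_name, s3=None):
    """Return the head_bucket response for bucket_name, or None on failure."""
    try:
        if s3 is None:
            import boto3  # deferred: only created when no client is passed in
            s3 = boto3.client('s3')
        return s3.head_bucket(Bucket=bucket_name)
    except Exception:
        return None
```
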
dcicutils/aws_utils.py
Outdated
def s3_bucket_exists(*, bucket_name, s3=None):
    """ Does a bucket exist?

    :param bucket_name: name of the bucket - string
    :param s3: AWS s3 client
    :return: boolean - True if exists, False if not
    """
    return bool(s3_bucket_head(bucket_name=bucket_name, s3=s3))


def s3_bucket_object_count(bucket_name):
You're using the s3 convention inconsistently - I would recommend having it everywhere and writing a small helper that does the client initialization, if necessary.
dcicutils/aws_utils.py
Outdated
""" | ||
bucket = boto3.resource('s3').Bucket(bucket_name) | ||
# get only head of objects so we can count them | ||
return sum(1 for _ in bucket.objects.all()) |
Does this work recursively or will it only count folders at top level? Thinking about how it would count things in our files/wfoutput buckets for example which all have a folder based hierarchy.
Unless I'm missing something, S3 does not store objects in a hierarchy (folders are just a convenience in the console display?), so this would count all objects in the given bucket regardless of folder. I expect that if empty folders were created, they would be included in the count, so checking that 0-size objects are excluded may be warranted. If I am correct about this, then what may indeed be useful is to support passing in a prefix and returning a count of only those objects that share said prefix - for example, in the dcic account's '4dn-dcic-open-data-transfer-logs' bucket, if a parameter '2022-08-01' were provided, then a count of the 32 objects in that folder/with that shared prefix would be returned.
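The suggested prefix-aware count might be sketched as below (parameter names are assumptions; list_objects_v2 returns at most 1000 keys per page, so a paginator is used):

```python
def s3_bucket_object_count(bucket_name, *, prefix=None, skip_empty=True, s3=None):
    """Count objects in a bucket, optionally restricted to a key prefix.

    Zero-byte objects (e.g. console-created "folder" placeholders) are
    skipped by default.
    """
    if s3 is None:
        import boto3  # deferred: only needed when no client is passed in
        s3 = boto3.client('s3')
    kwargs = {'Bucket': bucket_name}
    if prefix:
        kwargs['Prefix'] = prefix
    count = 0
    for page in s3.get_paginator('list_objects_v2').paginate(**kwargs):
        for obj in page.get('Contents', []):
            if skip_empty and obj.get('Size', 0) == 0:
                continue  # skip zero-byte "folder" placeholder objects
            count += 1
    return count
```
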
dcicutils/aws_utils.py
Outdated
    return bool(s3_object_head(object_key=object_key, bucket_name=bucket_name, s3=s3))


def s3_put_object(*, object_key, obj, bucket_name, acl=None, s3=None):
You should handle the kms_key_id case here as well, despite it not being used in Fourfront.
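One way the kms_key_id case might be folded in (a sketch; the parameter name and the lazy-client fallback are assumptions), using the standard SSE-KMS fields of put_object:

```python
def s3_put_object(*, object_key, obj, bucket_name, acl=None, kms_key_id=None, s3=None):
    """Put an object, optionally with a canned ACL and/or SSE-KMS encryption."""
    if s3 is None:
        import boto3  # deferred: only needed when no client is passed in
        s3 = boto3.client('s3')
    kwargs = {'Bucket': bucket_name, 'Key': object_key, 'Body': obj}
    if acl:
        kwargs['ACL'] = acl
    if kms_key_id:
        # request server-side encryption with the given KMS key
        kwargs['ServerSideEncryption'] = 'aws:kms'
        kwargs['SSEKMSKeyId'] = kms_key_id
    return s3.put_object(**kwargs)
```
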
Pull Request Test Coverage Report for Build 3275617373
💛 - Coveralls
Adds some generally useful AWS functions (currently all S3 operations) - haven't had time to complete unit tests for all functions, though several are covered.
Perhaps similar functions may have been added to other utils packages by others?
I'd like to be able to use these functions for open data operations as I refactor some of that code, so I would love to get this merged in.