Update integrated test baseline storage #14
Conversation
@TotoGaz - any thoughts on this approach for managing test baselines? Reading the docs, these google-cloud python API calls can accept credentials from the user's environment, which is consistent with how you are currently uploading test artefacts on the CI. Also, we'll need to work out access details for manual updates, testing, etc.

I'm currently relying on reading/writing a yaml file to set the latest version of the baselines. An example of one is in the linked PR, here: https://github.com/GEOS-DEV/GEOS/pull/3044/files#diff-6daa8e7f57a28d97ee7ea9fe3b51a6cbbcb58a59633488c885e402ded0445c2f
```python
client = storage.Client()
bucket = client.bucket( bucket_name )
blob = bucket.blob( blob_name )
blob.download_to_filename( archive_name )
```
These methods are currently specific to GCP, but looking at the Azure docs, their API is very similar: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-upload-python
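To make the similarity concrete, here is a minimal sketch of how the two SDKs' download calls line up. The function names and the Azure connection string are hypothetical placeholders, and the SDK imports are deferred into the function bodies so neither package is required until a function is actually called.

```python
def download_baseline_gcp( bucket_name, blob_name, archive_name ):
    # Google Cloud Storage download (google-cloud-storage package).
    from google.cloud import storage
    client = storage.Client()
    blob = client.bucket( bucket_name ).blob( blob_name )
    blob.download_to_filename( archive_name )


def download_baseline_azure( container_name, blob_name, archive_name ):
    # Azure Blob Storage download (azure-storage-blob package);
    # nearly the same shape as the GCP call above.
    from azure.storage.blob import BlobClient
    blob = BlobClient.from_connection_string(
        conn_str="<connection-string>",  # placeholder credential
        container_name=container_name,
        blob_name=blob_name )
    with open( archive_name, "wb" ) as f:
        f.write( blob.download_blob().readall() )
```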
BTW, there also exists a direct link you can use as a standard URL.
My thoughts were that we could benefit from some of the error checking / progress messages baked into this package, rather than rolling our own. Also, I'm noticing that we should set `use_auth_w_custom_endpoint=False`, which allows an anonymous user (see https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.client).
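For a fully public bucket there is also a dedicated constructor for unauthenticated access. A minimal sketch (the helper name is hypothetical; the import is deferred so the SDK is only needed when the function is called):

```python
def get_public_bucket( bucket_name ):
    # For a public bucket, no credentials are needed:
    # Client.create_anonymous_client() returns a client that sends
    # unauthenticated requests (google-cloud-storage package).
    from google.cloud import storage
    client = storage.Client.create_anonymous_client()
    return client.bucket( bucket_name )
```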
That's fine. We should test this in proxy-constrained environments.

I think a critical feature will be how complicated (or not) it is to configure a proxy with our downloading package (and documentation on how to do so).
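One hedged note on that point: both the `requests` library and the google-cloud clients (which use `requests`/`urllib3` under the hood) honor the standard proxy environment variables, so for many proxy-constrained sites the configuration may reduce to setting those before the download runs. A sketch with a hypothetical proxy URL:

```python
import os

# requests and the google-cloud HTTP transport both pick up the
# standard proxy environment variables; the URL below is a placeholder.
os.environ[ "HTTPS_PROXY" ] = "http://proxy.example.com:3128"
os.environ[ "HTTP_PROXY" ] = "http://proxy.example.com:3128"

print( os.environ[ "HTTPS_PROXY" ] )
```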
I think we can continue this way.
Maybe we need some time to validate?
```python
exit_flag = True

if exit_flag:
    for option_type, details in zip( [ 'action', 'check' ], [ action_options, check_options ] ):
```
Suggested change:

```diff
-    for option_type, details in zip( [ 'action', 'check' ], [ action_options, check_options ] ):
+    for option_type, details in ( 'action', action_options ), ( 'check', check_options ):
```
Does that work? I am not so sure. I think you need to make it a list of tuples.
```python
>>> for i, j in (0, 1), (2, 3):
...     print(i, j)
...
0 1
2 3
```

I guess there's some auto conversion to an Iterable on the fly.
The code looks fine to me, but we should chat about the usage because something is unclear to me. For example, I don't fully understand the difference between deleting and updating. I guess the delete is in case someone wants to download a newer version? I have also missed where we are storing the hash of the baselines in the GEOS repo. And are we expecting users to upload baselines manually, or is this done automatically by the CI? If done automatically, how are we triggering it upon validation?
```python
path.parent.mkdir( parents=True, exist_ok=True )

r = requests.get( url, stream=True, allow_redirects=True, headers=headers )
if r.status_code != 200:
```
Suggested change:

```diff
-if r.status_code != 200:
+if not r.ok:
```

I'm not sure.
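For context on the difference: in `requests`, `Response.ok` is simply `status_code < 400`, so it also accepts 2xx and 3xx codes other than 200. A tiny stand-in class (hypothetical, just mirroring that rule) shows where the two checks diverge:

```python
class FakeResponse:
    # Minimal stand-in mirroring how requests defines Response.ok:
    # ok is True whenever status_code < 400.
    def __init__( self, status_code ):
        self.status_code = status_code

    @property
    def ok( self ):
        return self.status_code < 400


# 200 passes both checks, but 204 or 302 would pass `ok`
# while failing a strict `status_code != 200` test.
assert FakeResponse( 200 ).ok
assert FakeResponse( 204 ).ok and FakeResponse( 204 ).status_code != 200
assert not FakeResponse( 404 ).ok
```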
was this supposed to be changed?
Forgot about that one. It may need some more testing, so let's save it for the next upgrade.
```python
try:
    r.raw.read = partial( r.raw.read, decode_content=True )
    with tqdm.wrapattr( r.raw, "read", total=file_size, desc=desc ) as r_raw:
        with path.open( "wb" ) as f:
            shutil.copyfileobj( r_raw, f )

except:
    with path.open( "wb" ) as f:
        for chunk in r.iter_content( chunk_size=128 ):
            f.write( chunk )
```
I do not understand the need for this try/except. Could you elaborate?
It seems to be a fallback method in case the tqdm library fails to download the file.
More or less. On some systems, the default method can run into issues, and this is a fall-back that seems to be quite reliable.
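The two copy strategies can be sketched without any network or tqdm at all, using in-memory streams from the standard library; this is only an illustration of the primary path (`shutil.copyfileobj`) versus the chunked fallback, not the script itself:

```python
import io
import shutil

payload = b"baseline-archive-bytes" * 100

# Primary path: stream-to-file copy, the call that tqdm wraps in the script.
src = io.BytesIO( payload )
dst = io.BytesIO()
shutil.copyfileobj( src, dst )
assert dst.getvalue() == payload

# Fallback path: manual chunked copy, mirroring r.iter_content( chunk_size=128 ).
src.seek( 0 )
dst2 = io.BytesIO()
while True:
    chunk = src.read( 128 )
    if not chunk:
        break
    dst2.write( chunk )
assert dst2.getvalue() == payload
```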
```python
if os.path.isdir( baseline_path ):
    logger.info( f'Target baseline directory already exists: {baseline_path}' )
```
Disputable, but maybe

```python
if not os.path.isdir( baseline_path ):
    os.makedirs( os.path.dirname( baseline_path ), exist_ok=True )
else:
    logger.info( f'Target baseline directory already exists: {baseline_path}' )
...
```

would remove the one-line `else` at the very end (which comes as a surprise)?
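A runnable sketch of that proposed structure, using a throwaway temp directory in place of the real paths (the `print` stands in for the logger call):

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
baseline_path = os.path.join( tmp, "baselines", "latest" )

# Create the parent only when the target directory is missing,
# otherwise just log and move on.
if not os.path.isdir( baseline_path ):
    os.makedirs( os.path.dirname( baseline_path ), exist_ok=True )
else:
    print( f'Target baseline directory already exists: {baseline_path}' )

# Parent now exists; repeating the call is a no-op thanks to exist_ok.
parent_created = os.path.isdir( os.path.dirname( baseline_path ) )
```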
```python
if os.path.isfile( archive_name ):
    # Unpack new baselines
    logger.info( f'Unpacking baselines: {archive_name}' )
    try:
        shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
        logger.info( 'Finished fetching baselines!' )
    except Exception as e:
        logger.error( str( e ) )
        raise Exception( f'Failed to unpack baselines: {archive_name}' )

else:
    raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
```

(Note: the original `else` branch also called `logger.error( str( e ) )`, but `e` is undefined there, so that line is dropped.)
Suggested change:

```diff
-if os.path.isfile( archive_name ):
-    # Unpack new baselines
-    logger.info( f'Unpacking baselines: {archive_name}' )
-    try:
-        shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
-        logger.info( 'Finished fetching baselines!' )
-    except Exception as e:
-        logger.error( str( e ) )
-        raise Exception( f'Failed to unpack baselines: {archive_name}' )
-else:
-    raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
+if not os.path.isfile( archive_name ):
+    raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
+# Unpack new baselines
+try:
+    logger.info( f'Unpacking baselines: {archive_name}' )
+    shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
+    logger.info( 'Finished fetching baselines!' )
+except Exception as e:
+    logger.error( str( e ) )
+    raise Exception( f'Failed to unpack baselines: {archive_name}' )
```
Again, disputable.
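The guard-clause version can be exercised end to end with a throwaway gztar archive; this sketch swaps the logger for `print` and uses temp paths in place of the real ones:

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
src_dir = os.path.join( work, "src" )
os.makedirs( src_dir )
with open( os.path.join( src_dir, "baseline.txt" ), "w" ) as f:
    f.write( "data" )

# Throwaway archive standing in for the downloaded baselines.
archive_name = shutil.make_archive( os.path.join( work, "baselines" ), "gztar", src_dir )
baseline_path = os.path.join( work, "unpacked" )

# Guard clause first, as the suggestion proposes: fail fast if the
# archive is missing, then attempt the unpack.
if not os.path.isfile( archive_name ):
    raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
try:
    print( f'Unpacking baselines: {archive_name}' )
    shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
    print( 'Finished fetching baselines!' )
except Exception as e:
    raise Exception( f'Failed to unpack baselines: {archive_name}' ) from e

unpacked_ok = os.path.isfile( os.path.join( baseline_path, "baseline.txt" ) )
```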
GEOS PR: GEOS-DEV/GEOS#3044