
Update integrated test baseline storage #14

Merged (46 commits into main, May 7, 2024)

Conversation

cssherman (Collaborator) commented Mar 15, 2024

cssherman requested a review from TotoGaz, Mar 15, 2024 17:49
cssherman (Collaborator, Author):
@TotoGaz - any thoughts on this approach for managing test baselines? Reading the docs, these google-cloud python API calls can accept credentials from the user's environment, which is consistent with how you are currently uploading test artefacts on the CI.

Also, we'll need to work out access details for manual updates, testing, etc.

cssherman (Collaborator, Author):
I'm currently relying on reading/writing a yaml file to set the latest version of the baselines. An example is in the linked PR, here: https://github.com/GEOS-DEV/GEOS/pull/3044/files#diff-6daa8e7f57a28d97ee7ea9fe3b51a6cbbcb58a59633488c885e402ded0445c2f

Comment on lines 47 to 50
client = storage.Client()
bucket = client.bucket( bucket_name )
blob = bucket.blob( blob_name )
blob.download_to_filename( archive_name )
cssherman (Collaborator, Author):
These methods are currently specific to GCP, but looking at the Azure docs, their API is very similar: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-upload-python

Contributor:

BTW, there is also a direct link you can use as a standard URL.

cssherman (Collaborator, Author):

My thought was that we could benefit from some of the error checking / progress messages baked into this package, rather than rolling our own. Also, I'm noticing that we should set use_auth_w_custom_endpoint=False (allows an anonymous user; see https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.client)

Contributor:

That's fine. We should test this in proxy-constrained environments.
I think a critical feature will be how complicated (or not) it is to define a proxy with our downloading package (and how to document that).
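On the proxy question: requests (and, as far as I know, the google-cloud transport layered on it) honors the standard proxy environment variables, so a minimal sketch of a proxy-constrained setup might be the following. The proxy URL is a placeholder, not a real endpoint.

```python
# Sketch: the standard HTTPS_PROXY env var is picked up by requests
# (and the transport underneath the google-cloud clients).
# The proxy URL below is a placeholder.
import os
import urllib.request

os.environ[ 'HTTPS_PROXY' ] = 'http://proxy.example.com:3128'

# urllib reports the proxies that requests would also use for https traffic
proxies = urllib.request.getproxies()
```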

cssherman requested review from CusiniM and rrsettgast, Mar 15, 2024 17:58
cssherman self-assigned this, Mar 15, 2024
TotoGaz (Contributor) left a comment:

I think we can continue this way.
Maybe we need some time to validate?

geos_ats_package/geos_ats/baseline_io.py (outdated, resolved)
geos_ats_package/geos_ats/baseline_io.py (outdated, resolved)
geos_ats_package/geos_ats/command_line_parsers.py (outdated, resolved)
geos_ats_package/geos_ats/baseline_io.py (resolved)
geos_ats_package/geos_ats/baseline_io.py (outdated, resolved)
geos_ats_package/geos_ats/baseline_io.py (outdated, resolved)
exit_flag = True

if exit_flag:
    for option_type, details in zip( [ 'action', 'check' ], [ action_options, check_options ] ):
Contributor:

Suggested change:
- for option_type, details in zip( [ 'action', 'check' ], [ action_options, check_options ] ):
+ for option_type, details in ('action', action_options), ('check', check_options ):

CusiniM (Collaborator):

Does that work? I'm not so sure; I think you need to make it a list of tuples.

Contributor:

>>> for i, j in (0, 1), (2, 3):
...     print(i, j)
...
0 1
2 3

I guess there's some auto conversion to an Iterable on the fly.

cssherman (Collaborator, Author):

@CusiniM I was surprised by this behavior myself when @TotoGaz suggested it
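A minimal sketch of why both spellings behave the same: a bare tuple of pairs is already an iterable of 2-tuples, so the for statement unpacks it exactly like the zip() version. The option values below are illustrative stand-ins.

```python
# Both loops yield the same (option_type, details) pairs: a tuple of
# tuples is already iterable, so no zip() or list wrapper is required.
action_options = [ 'run' ]      # illustrative stand-ins
check_options = [ 'curve' ]

via_zip = []
for option_type, details in zip( [ 'action', 'check' ], [ action_options, check_options ] ):
    via_zip.append( ( option_type, details ) )

via_tuple = []
for option_type, details in ( 'action', action_options ), ( 'check', check_options ):
    via_tuple.append( ( option_type, details ) )
```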

geos_ats_package/setup.cfg (resolved)
CusiniM (Collaborator) commented Apr 4, 2024

The code looks fine to me, but we should chat about the usage because something is unclear to me. For example, I don't fully understand the difference between deleting and updating. I guess the delete is in case someone wants to download a newer version?

I have also missed where we are storing the hash of the baselines in the GEOS repo... And are we expecting users to upload baselines manually, or is this done automatically by the CI? If done automatically, how are we triggering it upon validation?

path.parent.mkdir( parents=True, exist_ok=True )

r = requests.get( url, stream=True, allow_redirects=True, headers=headers )
if r.status_code != 200:
TotoGaz (Contributor) commented Apr 4, 2024:

Suggested change:
- if r.status_code != 200:
+ if not r.ok:

? I'm not sure.

Collaborator:

was this supposed to be changed?

cssherman (Collaborator, Author):

Forgot about that one. It may need some more testing, so let's save it for the next upgrade.
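One caveat worth keeping in mind for that next upgrade: requests defines Response.ok as "status code below 400", so `not r.ok` is looser than `r.status_code != 200` (a 204 or a redirect status would still count as ok). A small sketch, assuming the requests package is available:

```python
# Response.ok is True for any status below 400, so it accepts more
# than just 200; construct a bare Response to see the difference.
import requests

r = requests.Response()
r.status_code = 204                   # success without a body
not_ok_check = not r.ok               # False: 204 counts as "ok"
exact_check = r.status_code != 200    # True: it is not exactly 200
```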

Comment on lines +39 to +48
try:
    r.raw.read = partial( r.raw.read, decode_content=True )
    with tqdm.wrapattr( r.raw, "read", total=file_size, desc=desc ) as r_raw:
        with path.open( "wb" ) as f:
            shutil.copyfileobj( r_raw, f )

except:
    with path.open( "wb" ) as f:
        for chunk in r.iter_content( chunk_size=128 ):
            f.write( chunk )
Contributor:

I do not understand the need for this try/except. Could you elaborate?

Collaborator:

It seems to be a fallback method in case the tqdm library fails to download the file.

cssherman (Collaborator, Author):

More or less. On some systems, the default method can run into issues, and this is a fall-back that seems to be quite reliable.

Comment on lines 75 to 76
if os.path.isdir( baseline_path ):
    logger.info( f'Target baseline directory already exists: {baseline_path}' )
Contributor:

Disputable, but maybe

if not os.path.isdir( baseline_path ):
    os.makedirs( os.path.dirname( baseline_path ), exist_ok=True )
else:
    logger.info( f'Target baseline directory already exists: {baseline_path}' )
    ...

would remove the one-line else at the very end (which comes as a surprise)?

Comment on lines 142 to 154
if os.path.isfile( archive_name ):
    # Unpack new baselines
    logger.info( f'Unpacking baselines: {archive_name}' )
    try:
        shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
        logger.info( 'Finished fetching baselines!' )
    except Exception as e:
        logger.error( str( e ) )
        raise Exception( f'Failed to unpack baselines: {archive_name}' )

else:
    logger.error( str( e ) )
    raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
Contributor:

Suggested change (note: the original else branch logs `e`, which is undefined there, so the guard clause drops that line):
- if os.path.isfile( archive_name ):
-     # Unpack new baselines
-     logger.info( f'Unpacking baselines: {archive_name}' )
-     try:
-         shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
-         logger.info( 'Finished fetching baselines!' )
-     except Exception as e:
-         logger.error( str( e ) )
-         raise Exception( f'Failed to unpack baselines: {archive_name}' )
- else:
-     logger.error( str( e ) )
-     raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
+ if not os.path.isfile( archive_name ):
+     raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
+
+ # Unpack new baselines
+ try:
+     logger.info( f'Unpacking baselines: {archive_name}' )
+     shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
+     logger.info( 'Finished fetching baselines!' )
+ except Exception as e:
+     logger.error( str( e ) )
+     raise Exception( f'Failed to unpack baselines: {archive_name}' )

again, disputable

CusiniM merged commit 60511cf into main, May 7, 2024. 28 checks passed.