Update integrated test baseline storage #14

Merged 46 commits on May 7, 2024. The diff below shows changes from 14 of the 46 commits.

Commits:
ff14ade  Adding python scripts to manage baseline io (cssherman, Mar 15, 2024)
d8b0e14  Wiring in new baseline management tools (cssherman, Mar 15, 2024)
f3d2860  Fixing typo in geos_ats parser (cssherman, Mar 15, 2024)
93fcf9c  Fixing package import bug (cssherman, Mar 15, 2024)
dccdc28  Updating geos_ats input args, various baseline method updates (cssherman, Mar 18, 2024)
6e4ce12  Updating geos ats command line parsing (cssherman, Mar 21, 2024)
bb68813  Adding yaml path to ats environment setup (cssherman, Mar 29, 2024)
d9b4c11  Splitting baseline archive packing, upload (cssherman, Mar 29, 2024)
97f063e  Fixing bug in baseline packing (cssherman, Apr 2, 2024)
80ae85c  Adding https baseline fetch option (cssherman, Apr 2, 2024)
0bacdce  Adding options to work with baseline cache files (cssherman, Apr 2, 2024)
7325ae0  Fixing baseline archive names (cssherman, Apr 2, 2024)
0b0591c  Fixing baseline archive structure, copying log files (cssherman, Apr 2, 2024)
a7facad  Fixing blob download name (cssherman, Apr 2, 2024)
8a17acc  Handling empty directories for baseline management (cssherman, Apr 10, 2024)
18aa1d9  Allowing integrated tests to run without baselines (cssherman, Apr 10, 2024)
02ee8ab  Adding baseline management error check (cssherman, Apr 10, 2024)
89ff130  Adding additional logging to baseline management code (cssherman, Apr 10, 2024)
6e9ae4e  Adding additional logging to baseline management code (cssherman, Apr 10, 2024)
6c1a49d  Fixing log copying bug (cssherman, Apr 11, 2024)
b35d7f8  Removing test messages from geos_ats (cssherman, Apr 11, 2024)
49af05e  Adding simple log check script (cssherman, Apr 11, 2024)
fead5c9  Updating log checker for geos ats (cssherman, Apr 11, 2024)
2ceda4f  Fixing log checker script (cssherman, Apr 11, 2024)
683f82b  Fixing log check script (cssherman, Apr 12, 2024)
b7da058  Attempting to use an anonymous gcp client for baseline fetch (cssherman, Apr 12, 2024)
070fc12  Fixing log check, allowing baselines to be packed to various folders (cssherman, Apr 12, 2024)
cbed575  Fixing geos ats blob name (cssherman, Apr 12, 2024)
37bcbfb  Adding whitelist for geos ats log check script (cssherman, Apr 15, 2024)
f752546  Adding yaml input option to geos_ats log checker (cssherman, Apr 15, 2024)
a4b40a7  Using relative file paths for geos_ats html logs (cssherman, Apr 15, 2024)
63d9624  Updating ats html table format (cssherman, Apr 16, 2024)
f154225  Updating html report style (cssherman, Apr 17, 2024)
107bebc  Removing auto page refresh from geos ats report (cssherman, Apr 17, 2024)
3e74652  Updating html report style (cssherman, Apr 19, 2024)
0f0e48e  Updating html report style (cssherman, Apr 19, 2024)
8669589  Adding additional assets to html report (cssherman, Apr 19, 2024)
64a5f4c  Fixing report label (cssherman, Apr 20, 2024)
a33c9d5  Fixing lightbox settings (cssherman, Apr 20, 2024)
a2d4194  Grouping lightbox captions (cssherman, Apr 20, 2024)
ab0bc6b  Adding baseline history file (cssherman, Apr 22, 2024)
97f7d41  Fixing the baseline log path (cssherman, Apr 22, 2024)
8d7c3dc  Separating logs from archives by default (cssherman, Apr 29, 2024)
cad1f59  Fixing parsing of geos ats options (cssherman, Apr 30, 2024)
218af63  Skipping baseline management for some test actions (cssherman, Apr 30, 2024)
b82a757  Adding an additional prerequisite to geos_ats (cssherman, May 1, 2024)
geos_ats_package/geos_ats/baseline_io.py (290 additions, 0 deletions)

@@ -0,0 +1,290 @@
import os
import logging
import tempfile
import shutil
import yaml
import time
import requests
import pathlib
from functools import partial
from tqdm.auto import tqdm
from google.cloud import storage

logger = logging.getLogger( 'geos_ats' )
tmpdir = tempfile.TemporaryDirectory()
baseline_temporary_directory = tmpdir.name
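# Note: archives placed in this temporary directory are removed automatically when the interpreter exits and the TemporaryDirectory object is finalized.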


def file_download_progress( headers: dict, url: str, filename: str ):
"""
Download a file from a url in chunks, showing a progress bar

Args:
headers (dict): Request headers
url (str): Target address
filename (str): Download filename
"""

path = pathlib.Path( filename ).expanduser().resolve()
path.parent.mkdir( parents=True, exist_ok=True )

r = requests.get( url, stream=True, allow_redirects=True, headers=headers )
if r.status_code != 200:
TotoGaz (Contributor), Apr 4, 2024:

Suggested change
-    if r.status_code != 200:
+    if not r.ok:

? I'm not sure.

Collaborator: Was this supposed to be changed?

cssherman (Author): Forgot about that one. It may need some more testing, so let's save it for the next upgrade.

r.raise_for_status()
raise RuntimeError( f"Request to {url} returned status code {r.status_code}" )
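
As context for the review thread above, a minimal standalone sketch (not part of this diff) contrasting the two checks; per the requests documentation, r.ok is equivalent to status_code < 400, so it also accepts 3xx responses that the strict comparison rejects:

import requests

r = requests.get( 'https://example.com/some-file', allow_redirects=False )  # placeholder URL
print( r.status_code != 200 )  # True for 3xx redirects as well as 4xx/5xx errors
print( not r.ok )              # True only for 4xx/5xx responses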

file_size = int( r.headers.get( 'Content-Length', 0 ) )
desc = "(Unknown total file size)" if file_size == 0 else ""

try:
r.raw.read = partial( r.raw.read, decode_content=True )
with tqdm.wrapattr( r.raw, "read", total=file_size, desc=desc ) as r_raw:
with path.open( "wb" ) as f:
shutil.copyfileobj( r_raw, f )

except:
with path.open( "wb" ) as f:
for chunk in r.iter_content( chunk_size=128 ):
f.write( chunk )
Comment on lines +39 to +48
Contributor: I do not understand the need of this try/except. Could you elaborate?

Collaborator: It seems to be a fallback method in case the tqdm library fails to download the file.

cssherman (Author): More or less. On some systems, the default method can run into issues, and this is a fall-back that seems to be quite reliable.
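
A hypothetical usage sketch for this helper (the URL and destination below are placeholders, not values from the PR):

file_download_progress( headers={}, url='https://example.com/baselines.tar.gz', filename='~/geos/baselines.tar.gz' )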



def collect_baselines( bucket_name: str,
blob_name: str,
baseline_path: str,
force_redownload: bool = False,
ok_delete_old_baselines: bool = False,
cache_directory: str = '' ):
"""
Collect and unpack test baselines

Args:
bucket_name (str): Name of the GCP bucket
blob_name (str): Name of the baseline blob
baseline_path (str): Path to unpack the baselines
force_redownload (bool): Force re-download baseline files
ok_delete_old_baselines (bool): Automatically delete old baseline files if present
cache_directory (str): Search this directory first for files that are already downloaded
"""
# Setup
baseline_path = os.path.abspath( os.path.expanduser( baseline_path ) )
status_path = os.path.join( baseline_path, '.blob_name' )
cache_directory = os.path.abspath( os.path.expanduser( cache_directory ) )

# Check to see if the baselines are already downloaded
logger.info( 'Checking for existing baseline files...' )
if os.path.isdir( baseline_path ):
logger.info( f'Target baseline directory already exists: {baseline_path}' )
Contributor: Disputable, but maybe

if not os.path.isdir( baseline_path ):
    os.makedirs( os.path.dirname( baseline_path ), exist_ok=True )
else:
    logger.info( f'Target baseline directory already exists: {baseline_path}' )
    ...

would remove the one-line else at the very end (which comes as a surprise)?

if os.path.isfile( status_path ):
last_blob_name = open( status_path, 'r' ).read()
if ( blob_name == last_blob_name ) and not force_redownload:
logger.info( f'Target baselines are already downloaded: {blob_name}' )
logger.info( 'To re-download these files, run with the force_redownload option' )
return

if not ok_delete_old_baselines:
for ii in range( 10 ):
print( f'Existing baseline files found: {baseline_path}' )
user_input = input( 'Delete old baselines? [y/n]' )
user_input = user_input.strip().lower()
if user_input in [ "y", "yes" ]:
logger.debug( 'User chose to delete old baselines' )
break
elif user_input in [ "n", "no" ]:
logger.debug( 'User chose to keep old baselines' )
logger.warning( 'Running with out of date baseline files' )
return
else:
print( f'Unrecognized option: {user_input}' )
raise Exception( 'Failed to parse user options for old baselines' )

logger.info( 'Deleting old baselines...' )
shutil.rmtree( baseline_path )

else:
os.makedirs( os.path.dirname( baseline_path ), exist_ok=True )

# Check for old baselines
archive_name = ''
blob_tar = f'{blob_name}.tar.gz'
if cache_directory and not force_redownload:
logger.info( 'Checking cache directory for existing baseline...' )
f = os.path.join( cache_directory, blob_tar )
if os.path.isfile( f ):
logger.info( 'Baseline found!' )
archive_name = f

# Download new baselines
if not archive_name:
logger.info( 'Downloading baselines...' )
if cache_directory:
archive_name = os.path.join( cache_directory, blob_tar )
else:
archive_name = os.path.join( baseline_temporary_directory, blob_tar )

if 'https://' in bucket_name:
# Download from URL
try:
file_download_progress( {}, f"{bucket_name}/{blob_tar}", archive_name )
except Exception as e:
logger.error( f'Failed to download baseline from URL ({bucket_name}/{blob_tar})' )
logger.error( str( e ) )
else:
# Download from GCP
try:
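# Per the commit history ("Attempting to use an anonymous gcp client for baseline fetch"), disabling auth with the custom endpoint is intended to let public baselines download without local GCP credentials.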
client = storage.Client( use_auth_w_custom_endpoint=False )
bucket = client.bucket( bucket_name )
blob = bucket.blob( blob_tar )
blob.download_to_filename( archive_name )
except Exception as e:
logger.error( f'Failed to download baseline from GCP ({bucket_name}/{blob_tar})' )
logger.error( str( e ) )

if os.path.isfile( archive_name ):
# Unpack new baselines
logger.info( f'Unpacking baselines: {archive_name}' )
try:
shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
logger.info( 'Finished fetching baselines!' )
except Exception as e:
logger.error( str( e ) )
raise Exception( f'Failed to unpack baselines: {archive_name}' )

else:
logger.error( str( e ) )
raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
Contributor:

Suggested change
-    if os.path.isfile( archive_name ):
-        # Unpack new baselines
-        logger.info( f'Unpacking baselines: {archive_name}' )
-        try:
-            shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
-            logger.info( 'Finished fetching baselines!' )
-        except Exception as e:
-            logger.error( str( e ) )
-            raise Exception( f'Failed to unpack baselines: {archive_name}' )
-    else:
-        logger.error( str( e ) )
-        raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
+    if not os.path.isfile( archive_name ):
+        logger.error( str( e ) )
+        raise Exception( f'Could not find baseline files to unpack: expected={archive_name}' )
+    # Unpack new baselines
+    try:
+        logger.info( f'Unpacking baselines: {archive_name}' )
+        shutil.unpack_archive( archive_name, baseline_path, format='gztar' )
+        logger.info( 'Finished fetching baselines!' )
+    except Exception as e:
+        logger.error( str( e ) )
+        raise Exception( f'Failed to unpack baselines: {archive_name}' )

again, disputable



def pack_baselines( archive_name: str, baseline_path: str, log_path: str = '' ):
"""
Pack and upload baselines to GCP

Args:
archive_name (str): Name of the target archive
baseline_path (str): Path to unpack the baselines
log_path (str): Path to log files (optional)
"""
# Setup
archive_name = os.path.abspath( archive_name )
baseline_path = os.path.abspath( os.path.expanduser( baseline_path ) )
status_path = os.path.join( baseline_path, '.blob_name' )

# Check to see if the baselines are already downloaded
logger.info( 'Checking for existing baseline files...' )
if not os.path.isdir( baseline_path ):
logger.error( f'Could not find target baselines: {baseline_path}' )
raise FileNotFoundError( 'Could not find target baseline files' )

# Update the blob name
with open( status_path, 'w' ) as f:
f.write( os.path.basename( archive_name ) )

# Copy the log directory
if log_path:
    log_path = os.path.abspath( os.path.expanduser( log_path ) )
    log_target = os.path.join( baseline_path, 'logs' )
    if os.path.isdir( log_target ):
        shutil.rmtree( log_target )
    shutil.copytree( log_path, log_target )

try:
logger.info( 'Archiving baseline files...' )
shutil.make_archive( archive_name, format='gztar', root_dir=baseline_path )
logger.info( f'Created {archive_name}.tar.gz' )
except Exception as e:
logger.error( 'Failed to create baseline archive' )
logger.error( str( e ) )


def upload_baselines( bucket_name: str, archive_name: str ):
"""
Pack and upload baselines to GCP

Args:
bucket_name (str): Name of the GCP bucket
archive_name (str): Name of the target archive
"""
# Setup
if not os.path.isfile( archive_name ):
logger.error( f'Could not find target archive: {archive_name}' )
return

try:
logger.info( 'Uploading baseline files...' )
client = storage.Client()
bucket = client.bucket( bucket_name )
blob = bucket.blob( os.path.basename( archive_name ) )
blob.upload_from_filename( archive_name, if_generation_match=0 )
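# if_generation_match=0 makes the upload conditional on the blob not already existing, so an existing baseline archive is never overwritten.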
logger.info( 'Finished uploading baselines!' )

except Exception as e:
logger.error( 'Failed to upload baselines!' )
logger.error( str( e ) )


def manage_baselines( options ):
"""
Manage the integrated test baselines
"""
# Check for integrated test yaml file
test_yaml = ''
if options.yaml:
test_yaml = options.yaml
else:
test_yaml = os.path.join( options.geos_bin_dir, '..', '..', '.integrated_tests.yaml' )

if not os.path.isfile( test_yaml ):
raise Exception( f'Could not find the integrated test yaml file: {test_yaml}' )

test_options = {}
with open( test_yaml ) as f:
test_options = yaml.safe_load( f )

baseline_options = test_options.get( 'baselines', {} )
for k in [ 'bucket', 'baseline' ]:
if k not in baseline_options:
raise Exception(
f'Required information (baselines/{k}) missing from integrated test yaml file: {test_yaml}' )

# Manage baselines
if options.action in [ 'pack_baselines', 'upload_baselines' ]:
if os.path.isdir( options.baselineDir ):
# Check the baseline name and open a temporary directory if required
upload_name = options.baselineArchiveName
if upload_name.endswith( '.tar.gz' ):
upload_name = upload_name[ :-7 ]

if not upload_name:
epoch = int( time.time() )
upload_name = os.path.join( baseline_temporary_directory, f'integrated_test_baseline_{epoch}' )
else:
dirname = os.path.dirname( upload_name )
os.makedirs( dirname, exist_ok=True )

pack_baselines( upload_name, options.baselineDir, log_path=options.logs )
if options.action == 'pack_baselines':
quit()

upload_baselines( baseline_options[ 'bucket' ], upload_name )

# Update the test config file
blob_name = os.path.basename( upload_name )
baseline_options[ 'baseline' ] = blob_name
with open( test_yaml, 'w' ) as f:
    yaml.dump( test_options, f )
quit()
else:
raise Exception( f'Could not find the requested baselines to upload: {options.baselineDir}' )

collect_baselines( baseline_options[ 'bucket' ],
baseline_options[ 'baseline' ],
options.baselineDir,
force_redownload=options.update_baselines,
ok_delete_old_baselines=options.delete_old_baselines,
cache_directory=options.baselineCacheDirectory )

# Cleanup
if not os.path.isdir( options.baselineDir ):
raise Exception( f'Could not find the specified baseline directory: {options.baselineDir}' )

if options.action == 'download_baselines':
quit()
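
For reference, a minimal sketch of the .integrated_tests.yaml layout that manage_baselines reads; the key names follow the code above, while the bucket and blob values below are placeholders:

baselines:
  bucket: geos-integrated-test-baselines
  baseline: integrated_test_baseline_1714000000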
geos_ats_package/geos_ats/command_line_parsers.py (28 additions, 24 deletions)

@@ -1,8 +1,5 @@
 import logging
 import argparse
-import os
-import shutil
-from pydoc import locate

 action_options = {
     "run": "execute the test cases that previously did not pass.",
@@ -17,6 +14,9 @@
     "rebaseline": "rebaseline the testcases from a previous run.",
     "rebaselinefailed": "rebaseline only failed testcases from a previous run.",
     "report": "generate a text or html report, see config for the reporting options.",
+    "pack_baselines": "Pack baselines into archive",
+    "upload_baselines": "Upload baselines to bucket",
+    "download_baselines": "Download baselines from bucket",
 }

 check_options = {
@@ -46,6 +46,23 @@ def build_command_line_parser():

     parser.add_argument( "-b", "--baselineDir", type=str, help="Root baseline directory" )

+    parser.add_argument( "-y", "--yaml", type=str, help="Path to YAML config file", default='' )
+
+    parser.add_argument( "--baselineArchiveName", type=str, help="Baseline archive name", default='' )
+    parser.add_argument( "--baselineCacheDirectory", type=str, help="Baseline cache directory", default='' )
+
+    parser.add_argument( "-d",
+                         "--delete-old-baselines",
+                         action="store_true",
+                         default=False,
+                         help="Automatically delete old baselines" )
+
+    parser.add_argument( "-u",
+                         "--update-baselines",
+                         action="store_true",
+                         default=False,
+                         help="Force baseline file update" )
+
     action_names = ','.join( action_options.keys() )
     parser.add_argument( "-a", "--action", type=str, default="run", help=f"Test actions options ({action_names})" )

@@ -59,14 +76,6 @@ def build_command_line_parser():
                          default="info",
                          help=f"Log verbosity options ({verbosity_names})" )

-    parser.add_argument( "-d",
-                         "--detail",
-                         action="store_true",
-                         default=False,
-                         help="Show detailed action/check options" )
-
-    parser.add_argument( "-i", "--info", action="store_true", default=False, help="Info on various topics" )
-
     parser.add_argument( "-r",
                          "--restartCheckOverrides",
                          nargs='+',
@@ -115,17 +124,20 @@ def parse_command_line_arguments( args ):
     # Check action, check, verbosity items
     check = options.check
     if check not in check_options:
-        print(
-            f"Selected check option ({check}) not recognized. Try running with --help/--details for more information" )
+        print( f"Selected check option ({check}) not recognized" )
         exit_flag = True

     action = options.action
     if action not in action_options:
-        print(
-            f"Selected action option ({action}) not recognized. Try running with --help/--details for more information"
-        )
+        print( f"Selected action option ({action}) not recognized" )
         exit_flag = True

+    if exit_flag:
+        for option_type, details in ( 'action', action_options ), ( 'check', check_options ):
+            print( f'\nAvailable {option_type} options:' )
+            for k, v in details.items():
+                print( f' {k}: {v}' )
+
     verbose = options.verbose
     if verbose not in verbose_options:
         print( f"Selected verbose option ({verbose}) not recognized" )
@@ -138,14 +150,6 @@ def parse_command_line_arguments( args ):
     if not options.baselineDir:
         options.baselineDir = options.workingDir

-    # Print detailed information
-    if options.detail:
-        for option_type, details in zip( [ 'action', 'check' ], [ action_options, check_options ] ):
-            print( f'\nAvailable {option_type} options:' )
-            for k, v in details.items():
-                print( f' {k}: {v}' )
-        exit_flag = True
-
     if exit_flag:
         quit()
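
As a usage sketch, the new baseline actions could be invoked roughly as follows (assuming a geos_ats entry point; the paths and archive names are placeholders):

geos_ats -a download_baselines -b /path/to/baselines -y .integrated_tests.yaml
geos_ats -a pack_baselines -b /path/to/baselines --baselineArchiveName /tmp/baselines_new
geos_ats -a upload_baselines -b /path/to/baselines --baselineArchiveName /tmp/baselines_new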
