
Merge from CTuning (More info added for MLPerf results) #1107

Merged on Feb 13, 2024 (39 commits)

Commits
4b390fc
Fixes for intel-gptj
arjunsuresh Feb 10, 2024
0d025ca
Not use cuda for pytorch cpu build
arjunsuresh Feb 10, 2024
18bc3d0
Improvements for intel mlperf docker run
arjunsuresh Feb 10, 2024
aa65e08
improving gui to reproduce benchmarks
gfursin Feb 11, 2024
fc2cc88
started working on test table / matrix
gfursin Feb 11, 2024
a9e96bd
Merge branch 'mlcommons:master' into master
arjunsuresh Feb 11, 2024
9b7abba
adding table with badges for CM tests
gfursin Feb 11, 2024
b4201b3
started cleaning up MLPerf docs
gfursin Feb 12, 2024
a9e3835
added compute visualization
gfursin Feb 12, 2024
c1d4e7e
cleaned platform/playground docs
gfursin Feb 12, 2024
bc49844
Fixes for intel gptj
arjunsuresh Feb 12, 2024
8e84721
fixed get openimages on Windows: https://github.com/mlcommons/ck/issu…
gfursin Feb 12, 2024
fdc12b0
Merge branch 'master' of https://github.com/ctuning/mlcommons-ck
gfursin Feb 12, 2024
9c09bc8
fixed mlperf retinanet example
gfursin Feb 12, 2024
7b1ef52
clean up
gfursin Feb 12, 2024
c7d8dc5
Support version info dump
arjunsuresh Feb 12, 2024
77588f4
Merge branch 'mlcommons:master' into master
gfursin Feb 12, 2024
9e2e0bc
added host info in MLPerf result readmes: https://github.com/mlcommon…
gfursin Feb 12, 2024
578f2bc
Merge branch 'master' of https://github.com/ctuning/mlcommons-ck
gfursin Feb 12, 2024
f054514
Support version dump
arjunsuresh Feb 12, 2024
50208da
Support version_dump for all mlperf inference implementations
arjunsuresh Feb 12, 2024
166b3c2
Added dump pip version script
arjunsuresh Feb 12, 2024
118894f
Dump os,cpu and pip info for mlperf-inference
arjunsuresh Feb 12, 2024
dafca91
Copies version,os,cpu and pip info files to mlperf inference submission
arjunsuresh Feb 12, 2024
69dc582
Fix the mlperf submission generation for log files
arjunsuresh Feb 12, 2024
c239e71
Save performance and accuracy console logs for mlperf inference runs
arjunsuresh Feb 12, 2024
48705e1
added CM repo git hash in auto-generated MLPerf readme
gfursin Feb 12, 2024
6664943
Merge branch 'master' of https://github.com/ctuning/mlcommons-ck
gfursin Feb 12, 2024
5fc7a5c
add clean CM cache instruction
gfursin Feb 12, 2024
4846e3b
Add the missed dump.py script
arjunsuresh Feb 12, 2024
a7b4b32
fixed git revision detection
gfursin Feb 12, 2024
ae97ffe
Merge branch 'master' of https://github.com/ctuning/mlcommons-ck
gfursin Feb 12, 2024
341d320
fixed pip freeze on windows
gfursin Feb 12, 2024
d1e7803
Improve the measurement readme generation
arjunsuresh Feb 12, 2024
1f53939
Fixed cm_version_info dump - added script variations, seperated mlper…
arjunsuresh Feb 13, 2024
bd5590b
added extra notes about submission and info about inference/power Git…
gfursin Feb 13, 2024
2cbf7a7
Added script to dump mlperf-run-state, use cache for mlperf results, …
arjunsuresh Feb 13, 2024
9a1f056
Support git hash export for mlperf inference and power-dev repos
arjunsuresh Feb 13, 2024
0c5ff5b
Removed stale run files
arjunsuresh Feb 13, 2024
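Several of the commits above ("Dump os,cpu and pip info for mlperf-inference", "Added dump pip version script", "Copies version,os,cpu and pip info files to mlperf inference submission") add environment capture to MLPerf runs. A minimal standalone sketch of that idea — the file name and field names here are assumptions, not the CM API:

```python
# Hypothetical sketch of dumping os/cpu/pip info for a benchmark run;
# field names and the output file name are illustrative assumptions.
import json
import platform
import subprocess
import sys

def dump_run_environment(path="run-environment.json"):
    """Collect basic OS, CPU and pip-package info and write it to JSON."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "cpu": platform.processor() or platform.machine(),
        # `pip freeze` output as a list of "package==version" strings
        "pip": subprocess.run(
            [sys.executable, "-m", "pip", "freeze"],
            capture_output=True, text=True, check=False,
        ).stdout.splitlines(),
    }
    with open(path, "w") as f:
        json.dump(info, f, indent=2)
    return info
```

Shipping such a file alongside submission logs makes results easier to reproduce, which matches the PR's stated goal of adding more info to MLPerf results.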
37 changes: 0 additions & 37 deletions cm-mlops/automation/list_of_scripts.md

This file was deleted.

53 changes: 38 additions & 15 deletions cm-mlops/automation/script/module.py
@@ -30,7 +30,7 @@ def __init__(self, cmind, automation_file):
self.run_state['deps'] = []
self.run_state['fake_deps'] = False
self.run_state['parent'] = None
self.run_state['version_info'] = {}
self.run_state['version_info'] = []

self.file_with_cached_state = 'cm-cached-state.json'

@@ -289,8 +289,8 @@ def run(self, i):
if fake_deps: env['CM_TMP_FAKE_DEPS']='yes'

run_state = i.get('run_state', self.run_state)
if run_state.get('version_info', '') == '':
run_state['version_info'] = {}
if not run_state.get('version_info', []):
run_state['version_info'] = []
if run_state.get('parent', '') == '':
run_state['parent'] = None
if fake_deps:
@@ -643,7 +643,9 @@ def run(self, i):
if i.get('help',False):
return utils.call_internal_module(self, __file__, 'module_help', 'print_help', {'meta':meta, 'path':path})


run_state['script_id'] = meta['alias'] + "," + meta['uid']
run_state['script_variation_tags'] = variation_tags

deps = meta.get('deps',[])
post_deps = meta.get('post_deps',[])
prehook_deps = meta.get('prehook_deps',[])
@@ -1314,6 +1316,8 @@ def run(self, i):
utils.merge_dicts({'dict1':env, 'dict2':const, 'append_lists':True, 'append_unique':True})
utils.merge_dicts({'dict1':state, 'dict2':const_state, 'append_lists':True, 'append_unique':True})

run_script_input['run_state'] = run_state

ii = copy.deepcopy(customize_common_input)
ii['env'] = env
ii['state'] = state
@@ -1582,22 +1586,26 @@ def run(self, i):

if not version and detected_version:
version = detected_version

if version:
script_uid = script_artifact.meta.get('uid')
script_alias = script_artifact.meta.get('alias')
script_tags = script_artifact.meta.get('tags')
tags = i.get('tags')
run_state['version_info'][script_uid] = {}
run_state['version_info'][script_uid]['alias'] = script_alias
run_state['version_info'][script_uid]['script_tags'] = script_tags
run_state['version_info'][script_uid]['variation_tags'] = variation_tags
run_state['version_info'][script_uid]['version'] = version

version_info = {}
version_info_tags = ",".join(script_tags + variation_tags)
version_info[version_info_tags] = {}
version_info[version_info_tags]['script_uid'] = script_uid
version_info[version_info_tags]['script_alias'] = script_alias
version_info[version_info_tags]['version'] = version
version_info[version_info_tags]['parent'] = run_state['parent']
run_state['version_info'].append(version_info)
script_versions = detected_versions.get(meta['uid'], [])
if not script_versions:
detected_versions[meta['uid']] = [ version ]
else:
script_versions.append(version)
else:
pass # these scripts don't have versions. Should we use cm mlops version here?

############################# RETURN
elapsed_time = time.time() - start_time
@@ -1617,6 +1625,11 @@ def run(self, i):
with open('readme.md', 'w') as f:
f.write(readme)

if i.get('dump_version_info'):
r = self._dump_version_info_for_script()
if r['return'] > 0:
return r

rr = {'return':0, 'env':env, 'new_env':new_env, 'state':state, 'new_state':new_state, 'deps': run_state['deps']}

if i.get('json', False) or i.get('j', False):
@@ -1631,6 +1644,12 @@ def run(self, i):

return rr

def _dump_version_info_for_script(self, output_dir = os.getcwd()):
import json
with open(os.path.join(output_dir, 'version_info.json'), 'w') as f:
f.write(json.dumps(self.run_state['version_info'], indent=2))
return {'return': 0}

def _update_state_from_variations(self, i, meta, variation_tags, variations, env, state, deps, post_deps, prehook_deps, posthook_deps, new_env_keys_from_meta, new_state_keys_from_meta, add_deps_recursive, run_state, recursion_spaces, verbose):

# Save current explicit variations
@@ -2686,7 +2705,9 @@ def _run_deps(self, deps, clean_env_keys_deps, env, state, const, const_state, a
tmp_run_state_deps = copy.deepcopy(run_state['deps'])
run_state['deps'] = []
tmp_parent = run_state['parent']
run_state['parent'] = self.meta['uid']
run_state['parent'] = run_state['script_id']+":"+",".join(run_state['script_variation_tags'])
tmp_script_id = run_state['script_id']
tmp_script_variation_tags = run_state['script_variation_tags']

# Run collective script via CM API:
# Not very efficient but allows logging - can be optimized later
@@ -2722,12 +2743,13 @@ def _run_deps(self, deps, clean_env_keys_deps, env, state, const, const_state, a

run_state['deps'] = tmp_run_state_deps
run_state['parent'] = tmp_parent
run_state['script_id'] = tmp_script_id
run_state['script_variation_tags'] = tmp_script_variation_tags

# Restore local env
env.update(tmp_env)
update_env_with_values(env)


return {'return': 0}

##############################################################################
@@ -3974,6 +3996,8 @@ def prepare_and_run_script_with_postprocessing(i, postprocess="postprocess"):
verbose = i.get('verbose', False)
if not verbose: verbose = i.get('v', False)

show_time = i.get('time', False)

recursion = i.get('recursion', False)
found_script_tags = i.get('found_script_tags', [])
debug_script_tags = i.get('debug_script_tags', '')
@@ -4143,10 +4167,9 @@ def prepare_and_run_script_with_postprocessing(i, postprocess="postprocess"):
if customize_code is not None:
print (recursion_spaces+' ! call "{}" from {}'.format(postprocess, customize_code.__file__))


if len(posthook_deps)>0 and (postprocess == "postprocess"):
r = script_automation._call_run_deps(posthook_deps, local_env_keys, local_env_keys_from_meta, env, state, const, const_state,
add_deps_recursive, recursion_spaces, remembered_selections, variation_tags_string, found_cached, debug_script_tags, verbose, run_state)
add_deps_recursive, recursion_spaces, remembered_selections, variation_tags_string, found_cached, debug_script_tags, verbose, show_time, ' ', run_state)
if r['return']>0: return r

if (postprocess == "postprocess") and customize_code is not None and 'postprocess' in dir(customize_code):
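The hunks above change `run_state['version_info']` from a dict keyed by script UID into a list of single-key dicts, each keyed by the joined script and variation tags — so the same script can appear more than once and order is preserved. A minimal sketch of the new shape (the UIDs, aliases, and versions below are hypothetical):

```python
import json

# Mirrors what the new module.py code appends: a single-key dict,
# keyed by "script_tags,variation_tags", holding uid/alias/version/parent.
def make_version_entry(script_uid, script_alias, script_tags,
                       variation_tags, version, parent=None):
    key = ",".join(script_tags + variation_tags)
    return {key: {
        "script_uid": script_uid,
        "script_alias": script_alias,
        "version": version,
        "parent": parent,
    }}

version_info = []  # run_state['version_info'] is now a list, not a dict
version_info.append(make_version_entry(
    "123456789abcdef0", "get-python3", ["get", "python3"], [], "3.10.12"))
version_info.append(make_version_entry(
    "fedcba9876543210", "get-cuda", ["get", "cuda"], ["_cudnn"], "12.2",
    parent="app-mlperf-inference,abcd1234:_cuda"))

print(json.dumps(version_info, indent=2))
```

This is the structure that `_dump_version_info_for_script` in the diff serializes to `version_info.json`.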
78 changes: 78 additions & 0 deletions cm-mlops/automation/utils/module.py
@@ -878,3 +878,81 @@ def uid(self, i):

return r


##############################################################################
def system(self, i):
"""
Run system command and redirect output to string.

Args:
(CM input dict):

* cmd (str): command line
* (path) (str): go to this directory and return back to current
* (stdout) (str): stdout file
* (stderr) (str): stderr file

Returns:
(CM return dict):

* return (int): return code == 0 if no error and >0 if error
* (error) (str): error string if return>0

* ret (int): return code
* std (str): stdout + stderr
* stdout (str): stdout
* stderr (str): stderr
"""

cmd = i['cmd']

if cmd == '':
return {'return':1, 'error': 'cmd is empty'}

path = i.get('path','')
if path!='' and os.path.isdir(path):
cur_dir = os.getcwd()
os.chdir(path)

if i.get('stdout','')!='':
fn1=i['stdout']
fn1_delete = False
else:
r = utils.gen_tmp_file({})
if r['return'] > 0: return r
fn1 = r['file_name']
fn1_delete = True

if i.get('stderr','')!='':
fn2=i['stderr']
fn2_delete = False
else:
r = utils.gen_tmp_file({})
if r['return'] > 0: return r
fn2 = r['file_name']
fn2_delete = True

cmd += ' > '+fn1 + ' 2> '+fn2
rx = os.system(cmd)

std = ''
stdout = ''
stderr = ''

if os.path.isfile(fn1):
r = utils.load_txt(file_name = fn1, remove_after_read = fn1_delete)
if r['return'] == 0: stdout = r['string'].strip()

if os.path.isfile(fn2):
r = utils.load_txt(file_name = fn2, remove_after_read = fn2_delete)
if r['return'] == 0: stderr = r['string'].strip()

std = stdout
if stderr!='':
if std!='': std+='\n'
std+=stderr

if path!='' and os.path.isdir(path):
os.chdir(cur_dir)

return {'return':0, 'ret':rx, 'stdout':stdout, 'stderr':stderr, 'std':std}
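The new `utils.system` helper above shells out with stdout and stderr redirected to (temporary) files, reads them back, and returns the pieces separately plus combined. A self-contained sketch of the same pattern without the CM wrappers — using `subprocess` in place of the manual `> file 2> file` redirection:

```python
import os
import subprocess

def system_capture(cmd, path=None):
    """Run a shell command, optionally in directory `path`, capturing output.

    Returns (ret, stdout, stderr, combined), analogous to the CM helper's
    `ret` / `stdout` / `stderr` / `std` fields.
    """
    cwd = path if path and os.path.isdir(path) else None
    # subprocess handles redirection and restores the cwd implicitly
    r = subprocess.run(cmd, shell=True, cwd=cwd,
                       capture_output=True, text=True)
    stdout = r.stdout.strip()
    stderr = r.stderr.strip()
    std = stdout + ("\n" if stdout and stderr else "") + stderr
    return r.returncode, stdout, stderr, std
```

Compared with `os.system` plus temp files, `subprocess.run` avoids shell-quoting surprises in the redirection and never leaves stray files behind; the CM version likely keeps the file-based approach for compatibility and optional user-specified stdout/stderr paths.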
3 changes: 2 additions & 1 deletion cm-mlops/cfg/benchmark-hardware-compute/amd-gpu.json
@@ -1,4 +1,5 @@
{
"uid": "d8f06040f7294319",
"name": "AMD GPU"
"name": "AMD GPU",
"tags": "gpu,amd"
}
@@ -1,4 +1,5 @@
{
"uid":"357a972e79614903",
"name": "Generic CPU - Arm64"
"name": "Generic CPU - Arm64",
"tags": "cpu,arm64,generic"
}
@@ -1,4 +1,5 @@
{
"uid": "cdfd424c32734e38",
"name": "Generic CPU - x64"
"name": "Generic CPU - x64",
"tags": "cpu,x64,generic"
}
3 changes: 2 additions & 1 deletion cm-mlops/cfg/benchmark-hardware-compute/google-tpu.json
@@ -1,4 +1,5 @@
{
"uid": "b3be7ac9ef954f5a",
"name": "Google TPU"
"name": "Google TPU",
"tags": "tpu,google"
}
@@ -1,4 +1,5 @@
{
"uid": "fe379ecd1e054a00",
"name": "Nvidia GPU - Jetson Orin"
"name": "Nvidia GPU - Jetson Orin",
"tags": "gpu,nvidia,jetson,orin"
}
3 changes: 2 additions & 1 deletion cm-mlops/cfg/benchmark-hardware-compute/nvidia-gpu.json
@@ -1,4 +1,5 @@
{
"uid": "fe379ecd1e054a00",
"name": "Nvidia GPU"
"name": "Nvidia GPU",
"tags": "gpu,nvidia"
}
5 changes: 3 additions & 2 deletions cm-mlops/cfg/benchmark-hardware-compute/qualcomm-ai100.json
@@ -1,4 +1,5 @@
{
"uid": "fe379ecd1e054a00",
"name": "Qualcomm - AI 100"
"uid": "d2ae645066664463",
"name": "Qualcomm - AI 100",
"tags": "accelerator,acc,qualcomm,ai,100,ai-100"
}
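The JSON changes above attach a comma-separated `tags` string to each hardware-compute config (and give Qualcomm AI 100 its own `uid`, `d2ae645066664463`, replacing one that collided with the Nvidia GPU entry). One plausible use of these tags — selecting configs that match a query — could be sketched as follows; the matching rule (every query tag must be present) is an assumption:

```python
# Configs taken from the diff above; the selector itself is hypothetical.
COMPUTE_CONFIGS = [
    {"uid": "d8f06040f7294319", "name": "AMD GPU", "tags": "gpu,amd"},
    {"uid": "cdfd424c32734e38", "name": "Generic CPU - x64",
     "tags": "cpu,x64,generic"},
    {"uid": "fe379ecd1e054a00", "name": "Nvidia GPU - Jetson Orin",
     "tags": "gpu,nvidia,jetson,orin"},
    {"uid": "d2ae645066664463", "name": "Qualcomm - AI 100",
     "tags": "accelerator,acc,qualcomm,ai,100,ai-100"},
]

def select_compute(configs, query):
    """Return configs whose comma-separated tags contain every query tag."""
    want = set(query.split(","))
    return [c for c in configs if want <= set(c["tags"].split(","))]
```

Tag-based lookup avoids hard-coding UIDs in benchmark configs, which is consistent with the `compute_uid` / `supported_compute` cleanup elsewhere in this PR.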

This file was deleted.

@@ -0,0 +1,5 @@
uid: 125abafe58dc4473

name: "Any model - x64 - offline"

compute_uid: cdfd424c32734e38
@@ -2,6 +2,4 @@ uid: db45dcd686854602

name: "Any model - offline"

supported_compute:
- cdfd424c32734e38
- 357a972e79614903
compute_uid: cdfd424c32734e38
@@ -2,5 +2,4 @@ uid: "fe379ecd1e054a00"

name: "RetinaNet Reference Python Torch Offline"

supported_compute:
- cdfd424c32734e38
compute_uid: cdfd424c32734e38
9 changes: 9 additions & 0 deletions cm-mlops/cfg/benchmark-run-mlperf-inference-latest/_cm.yaml
@@ -16,6 +16,7 @@ name: "MLPerf inference - latest"
supported_compute:
- 357a972e79614903
- cdfd424c32734e38
- d2ae645066664463

urls:
- name: "Official page"
@@ -24,3 +25,11 @@ urls:
url: "https://github.com/mlcommons/inference"
- name: "MLCommons CM automation (under development)"
url: "https://github.com/mlcommons/ck/tree/master/docs/mlperf/inference"

dimensions:
- - input.model
- "MLPerf model"
- - input.implementation
- "MLPerf implementation"
- - input.framework
- "MLPerf framework"
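The new `dimensions` list pairs a dotted input key (e.g. `input.model`) with a display label, presumably so the playground can build a results matrix over those axes. A sketch of resolving such dotted keys against a run record — the record layout and field values here are hypothetical:

```python
# Dimension pairs taken from the _cm.yaml diff above.
DIMENSIONS = [
    ("input.model", "MLPerf model"),
    ("input.implementation", "MLPerf implementation"),
    ("input.framework", "MLPerf framework"),
]

def get_by_dotted_key(record, dotted_key, default=""):
    """Walk nested dicts following a dotted key like 'input.model'."""
    node = record
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

def result_row(record):
    """Map one run record to {display label: value} for a results matrix."""
    return {label: get_by_dotted_key(record, key)
            for key, label in DIMENSIONS}
```

Keeping the key/label pairs in the config rather than in code means new benchmark axes can be added without touching the GUI.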
26 changes: 26 additions & 0 deletions cm-mlops/cfg/benchmark-run-mlperf-inference-scc23/_cm.yaml
@@ -0,0 +1,26 @@
alias: benchmark-run-mlperf-inference-scc23
uid: 9133e5b1dddc4e4a

automation_alias: cfg
automation_uid: 88dce9c160324c5d

tags:
- benchmark
- run
- mlperf
- inference
- v3.1

name: "MLPerf inference - Student Cluster Competition 2023"

supported_compute:
- fe379ecd1e054a00
- cdfd424c32734e38
- fe379ecd1e054a00
- d2ae645066664463

urls:
- name: "Official page"
url: "https://sc23.supercomputing.org/students/student-cluster-competition/"
- name: "Tutorial to run MLPerf inference benchmark "
url: "https://github.com/mlcommons/ck/blob/master/docs/tutorials/scc23-mlperf-inference-bert.md"
@@ -0,0 +1,3 @@
name: "BASE"

tags: "base"
@@ -0,0 +1 @@
TBD