(shortfin-sd) Adds tooling for performance measurement. #380

Merged · 8 commits merged into main from sdxl-perf · Oct 31, 2024

Conversation

@monorimet (Contributor) commented Oct 30, 2024:

Output example:

[2024-10-30 20:12:46.189] [info] [manager.py:39] Created local system with ['amdgpu:0:0@0'] devices
[2024-10-30 20:12:46.543] [info] [service.py:99] Loading parameter fiber 'model' from: weights/sdxl_clip_fp16.irpa
[2024-10-30 20:12:46.544] [info] [service.py:99] Loading parameter fiber 'model' from: weights/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa
[2024-10-30 20:12:46.545] [info] [service.py:99] Loading parameter fiber 'model' from: weights/stable_diffusion_xl_base_1_0_vae_fp16.safetensors
[2024-10-30 20:12:46.556] [info] [server.py:82] Started server process [1905777]
[2024-10-30 20:12:46.556] [info] [on.py:48] Waiting for application startup.
[2024-10-30 20:12:46.556] [info] [manager.py:47] Starting system manager
[2024-10-30 20:12:46.557] [info] [server.py:44] Initializing service 'sd':
[2024-10-30 20:12:46.557] [info] [server.py:45] ServiceManager(
  INFERENCE DEVICES : 
     [Device(name='amdgpu:0:0@0', ordinal=0:0, node_affinity=0, capabilities=0x0)]

  MODEL PARAMS : 
     base model : SDXL 
     output size (H,W) : [[1024, 1024]] 
     max token sequence length : 64 
     classifier free guidance : True 

  SERVICE PARAMS : 
     fibers per device : 1
     program isolation mode : ProgramIsolation.PER_FIBER

  INFERENCE MODULES : 
     clip : [ProgramModule('compiled_clip', version=0, exports=[encode_prompts$async(0rrrrrr_rr), encode_prompts(0rrrr_rr), __init(0v_v)])]
     unet : [ProgramModule('compiled_punet', version=0, exports=[main$async(0rrrrrrrr_r), main(0rrrrrr_r), __init(0v_v)])]
     scheduler : [ProgramModule('compiled_scheduler', version=0, exports=[run_initialize$async(0rrrr_rrrr), run_initialize(0rr_rrrr), run_scale$async(0rrrrrr_rrrr), run_scale(0rrrr_rrrr), run_step$async(0rrrrrr_r), run_step(0rrrr_r), __init(0v_v)])]
     vae : [ProgramModule('compiled_vae', version=0, exports=[decode$async(0rrr_r), decode(0r_r), __init(0v_v)])]

  INFERENCE PARAMETERS : 
     clip : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7fdb5af946f0>]
     unet : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7fdb5971f930>]
     vae : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7fdb5971fab0>]
)
[2024-10-30 20:12:47.979] [info] [on.py:62] Application startup complete.
[2024-10-30 20:12:47.981] [info] [server.py:214] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
[2024-10-30 20:14:18.253] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1617ms
[2024-10-30 20:14:19.812] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1548ms
[2024-10-30 20:14:21.391] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1540ms
[2024-10-30 20:14:22.978] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1540ms
[2024-10-30 20:14:24.564] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1538ms
[2024-10-30 20:14:26.181] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1570ms
[2024-10-30 20:14:27.786] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1585ms
[2024-10-30 20:14:29.346] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1545ms
[2024-10-30 20:14:29.412] [info] [metrics.py:33] Completed ClientGenerateBatchProcess.run in 12778ms
[2024-10-30 20:14:29.412] [info] [metrics.py:36] SAMPLES PER SECOND = 0.6260551530808658
[2024-10-30 20:14:29.412] [info] [h11_impl.py:476] 127.0.0.1:46188 - "POST /generate HTTP/1.1" 200
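
(Note: the reported throughput is consistent with the log above -- eight InferenceExecutorProcess.run completions over the 12778 ms ClientGenerateBatchProcess.run duration, i.e. 8 / 12.778 s ≈ 0.626 samples per second.)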

@stellaraccident (Contributor) left a comment:

Suggest upgrading that decorator to take kwargs telling it what to do vs sniffing qualname.

    ret = await fn(*args, **kwargs)
    duration_str = get_duration_str(start)
    logger.info(f"Completed {fn.__qualname__} in {duration_str}")
    if fn.__qualname__ == "ClientGenerateBatchProcess.run":

What's going on with this? Seems like you should just have a kwarg on the decorator to tell it what to do. There's a standard (but tricky) idiom for that using functools.partial...
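
For reference, the decorator-with-kwargs shape that idiom produces looks roughly like this -- a minimal sketch, not the PR's code; the measure name, its kwargs, and the attribute lookup are illustrative assumptions:

    import functools
    import logging
    import time

    logger = logging.getLogger(__name__)

    def measure(fn=None, *, log_sps=False, num_samples_attr=None):
        # The functools.partial idiom: when called with kwargs only
        # (as @measure(...)), fn is None, so return a partial of this
        # same decorator that remembers the kwargs and waits for the
        # function to decorate.
        if fn is None:
            return functools.partial(
                measure, log_sps=log_sps, num_samples_attr=num_samples_attr
            )

        @functools.wraps(fn)
        async def wrapped(*args, **kwargs):
            start = time.time()
            ret = await fn(*args, **kwargs)
            duration = time.time() - start
            logger.info(f"Completed {fn.__qualname__} in {round(duration * 1000)}ms")
            if log_sps:
                # Hypothetical: read the sample count off the bound instance
                # (args[0] is `self` for a decorated method) instead of
                # special-casing qualnames inside the decorator.
                samples = getattr(args[0], num_samples_attr)
                logger.info(f"SAMPLES PER SECOND = {samples / duration}")
            return ret

        return wrapped

This supports both bare @measure and configured @measure(log_sps=True, num_samples_attr="batch_size") uses without a separate decorator-factory layer.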

@monorimet (Contributor, Author) replied Oct 31, 2024:

Thanks for the pointer -- I didn't end up using partial here, but adapted the decorator to take a few kwargs. There's still some yuck where we ping attributes of the decorated method's class to figure out batch size, but it's more flexible than before. I may revisit later.
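
A hypothetical usage sketch of that shape, reusing the measure sketch above (the decorator kwargs and the batch_size attribute are illustrative; ClientGenerateBatchProcess is the class name from the log):

    class ClientGenerateBatchProcess:
        def __init__(self, batch_size):
            # Hypothetical attribute the decorator pings for throughput.
            self.batch_size = batch_size

        @measure(log_sps=True, num_samples_attr="batch_size")
        async def run(self):
            ...  # dispatch the batch to the inference executors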

@monorimet merged commit 6c7d4a4 into main on Oct 31, 2024
11 checks passed
@monorimet deleted the sdxl-perf branch on October 31, 2024 at 15:13