(shortfin-sd) Adds tooling for performance measurement. #380

Merged · 8 commits merged into main from sdxl-perf · Oct 31, 2024

Conversation

@monorimet (Contributor) commented Oct 30, 2024:

Output example:

[2024-10-30 20:12:46.189] [info] [manager.py:39] Created local system with ['amdgpu:0:0@0'] devices
[2024-10-30 20:12:46.543] [info] [service.py:99] Loading parameter fiber 'model' from: weights/sdxl_clip_fp16.irpa
[2024-10-30 20:12:46.544] [info] [service.py:99] Loading parameter fiber 'model' from: weights/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa
[2024-10-30 20:12:46.545] [info] [service.py:99] Loading parameter fiber 'model' from: weights/stable_diffusion_xl_base_1_0_vae_fp16.safetensors
[2024-10-30 20:12:46.556] [info] [server.py:82] Started server process [1905777]
[2024-10-30 20:12:46.556] [info] [on.py:48] Waiting for application startup.
[2024-10-30 20:12:46.556] [info] [manager.py:47] Starting system manager
[2024-10-30 20:12:46.557] [info] [server.py:44] Initializing service 'sd':
[2024-10-30 20:12:46.557] [info] [server.py:45] ServiceManager(
  INFERENCE DEVICES : 
     [Device(name='amdgpu:0:0@0', ordinal=0:0, node_affinity=0, capabilities=0x0)]

  MODEL PARAMS : 
     base model : SDXL 
     output size (H,W) : [[1024, 1024]] 
     max token sequence length : 64 
     classifier free guidance : True 

  SERVICE PARAMS : 
     fibers per device : 1
     program isolation mode : ProgramIsolation.PER_FIBER

  INFERENCE MODULES : 
     clip : [ProgramModule('compiled_clip', version=0, exports=[encode_prompts$async(0rrrrrr_rr), encode_prompts(0rrrr_rr), __init(0v_v)])]
     unet : [ProgramModule('compiled_punet', version=0, exports=[main$async(0rrrrrrrr_r), main(0rrrrrr_r), __init(0v_v)])]
     scheduler : [ProgramModule('compiled_scheduler', version=0, exports=[run_initialize$async(0rrrr_rrrr), run_initialize(0rr_rrrr), run_scale$async(0rrrrrr_rrrr), run_scale(0rrrr_rrrr), run_step$async(0rrrrrr_r), run_step(0rrrr_r), __init(0v_v)])]
     vae : [ProgramModule('compiled_vae', version=0, exports=[decode$async(0rrr_r), decode(0r_r), __init(0v_v)])]

  INFERENCE PARAMETERS : 
     clip : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7fdb5af946f0>]
     unet : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7fdb5971f930>]
     vae : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7fdb5971fab0>]
)
[2024-10-30 20:12:47.979] [info] [on.py:62] Application startup complete.
[2024-10-30 20:12:47.981] [info] [server.py:214] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
[2024-10-30 20:14:18.253] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1617ms
[2024-10-30 20:14:19.812] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1548ms
[2024-10-30 20:14:21.391] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1540ms
[2024-10-30 20:14:22.978] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1540ms
[2024-10-30 20:14:24.564] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1538ms
[2024-10-30 20:14:26.181] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1570ms
[2024-10-30 20:14:27.786] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1585ms
[2024-10-30 20:14:29.346] [info] [metrics.py:33] Completed InferenceExecutorProcess.run in 1545ms
[2024-10-30 20:14:29.412] [info] [metrics.py:33] Completed ClientGenerateBatchProcess.run in 12778ms
[2024-10-30 20:14:29.412] [info] [metrics.py:36] SAMPLES PER SECOND = 0.6260551530808658
[2024-10-30 20:14:29.412] [info] [h11_impl.py:476] 127.0.0.1:46188 - "POST /generate HTTP/1.1" 200
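
(Note: the reported throughput is consistent with the log above -- eight InferenceExecutorProcess.run completions over the 12778 ms ClientGenerateBatchProcess.run duration, i.e. 8 / 12.778 s ≈ 0.626 samples per second.)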

@stellaraccident (Contributor) left a comment:

Suggest upgrading that decorator to take kwargs telling it what to do vs sniffing qualname.

    ret = await fn(*args, **kwargs)
    duration_str = get_duration_str(start)
    logger.info(f"Completed {fn.__qualname__} in {duration_str}")
    if fn.__qualname__ == "ClientGenerateBatchProcess.run":

What's going on with this? Seems like you should just have a kwarg on the decorator to tell it what to do. There's a standard (but tricky) idiom for that using functools.partial...
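
For reference, the decorator-with-kwargs shape that idiom produces looks roughly like this -- a minimal sketch, not the PR's code; the measure name, its kwargs, and the attribute lookup are illustrative assumptions:

    import functools
    import logging
    import time

    logger = logging.getLogger(__name__)

    def measure(fn=None, *, log_sps=False, num_samples_attr=None):
        # The functools.partial idiom: when called with kwargs only
        # (as @measure(...)), fn is None, so return a partial of this
        # same decorator that remembers the kwargs and waits for the
        # function to decorate.
        if fn is None:
            return functools.partial(
                measure, log_sps=log_sps, num_samples_attr=num_samples_attr
            )

        @functools.wraps(fn)
        async def wrapped(*args, **kwargs):
            start = time.time()
            ret = await fn(*args, **kwargs)
            duration = time.time() - start
            logger.info(f"Completed {fn.__qualname__} in {round(duration * 1000)}ms")
            if log_sps:
                # Hypothetical: read the sample count off the bound instance
                # (args[0] is `self` for a decorated method) instead of
                # special-casing qualnames inside the decorator.
                samples = getattr(args[0], num_samples_attr)
                logger.info(f"SAMPLES PER SECOND = {samples / duration}")
            return ret

        return wrapped

This supports both bare @measure and configured @measure(log_sps=True, num_samples_attr="batch_size") uses without a separate decorator-factory layer.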

@monorimet (Contributor, Author) replied Oct 31, 2024:

Thanks for the pointer -- I didn't end up using partial here, but adapted the decorator to take a few kwargs. There's still some yuck where we ping attributes of the decorated method's class to figure out batch size, but it's more flexible than before. I may revisit later.
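
A hypothetical usage sketch of that shape, reusing the measure sketch above (the decorator kwargs and the batch_size attribute are illustrative; ClientGenerateBatchProcess is the class name from the log):

    class ClientGenerateBatchProcess:
        def __init__(self, batch_size):
            # Hypothetical attribute the decorator pings for throughput.
            self.batch_size = batch_size

        @measure(log_sps=True, num_samples_attr="batch_size")
        async def run(self):
            ...  # dispatch the batch to the inference executors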

@monorimet merged commit 6c7d4a4 into main on Oct 31, 2024
11 checks passed
@monorimet deleted the sdxl-perf branch on October 31, 2024 at 15:13