The export script in sharktank was built specifically for llama 3.1 models and has some rough edges. It also requires users to chain together CLI commands: `python -m sharktank.examples.export_paged_llm_v1 [--options]`, then `iree-compile [--options]`.
This flow is cumbersome from a user perspective and forces CI runs to invoke the CLI commands via `subprocess`, instead of having a programmatic in-memory alternative.
We should find a more general and easier-to-use solution for generating MLIR for LLM models and compiling those models to `.vmfb` for the shortfin server.
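For reference, the closest thing to a programmatic flow today is wrapping the two commands ourselves. A minimal sketch, assuming the `iree-compiler` Python package is installed for the compile step; the export flags and the `/tmp/model.mlir` / `/tmp/model.vmfb` paths are placeholders, not real defaults:

```python
import subprocess
import sys

import iree.compiler as ireec  # from the iree-compiler pip package

# Placeholder: the model-specific flags normally passed on the CLI
# (parameter file, output .mlir path, and so on).
export_args: list[str] = []

# Step 1: export MLIR by shelling out to the sharktank script, which is
# effectively what CI has to do today via subprocess.
subprocess.run(
    [sys.executable, "-m", "sharktank.examples.export_paged_llm_v1", *export_args],
    check=True,
)

# Step 2: compile the exported MLIR to a .vmfb with the IREE compiler API.
ireec.compile_file(
    "/tmp/model.mlir",             # wherever the export step wrote its MLIR
    target_backends=["llvm-cpu"],  # compilation target; adjust per deployment
    output_file="/tmp/model.vmfb", # artifact consumed by the shortfin server
)
```

Only the compile step has an in-process API here; the export step is still a subprocess, which is exactly the gap this issue is about.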
Below is a starting point recommendation provided by @ScottTodd:

"Users shouldn't need to chain together `python -m sharktank.examples...` and `iree-compile ...` commands. We can aim for something like https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference (that's as minimal as it gets - we'll want to pass options like the compilation target though)"
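For context, the linked vLLM quickstart boils down to roughly the snippet below (paraphrased from that page; a shortfin/sharktank equivalent would presumably also need to accept compilation options such as the target backend):

```python
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# One object owns model loading and execution; no separate export/compile
# steps are exposed to the user.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, output.outputs[0].text)
```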
Ah yes, I was just going to connect those dots too.
For SDXL there are multiple submodels (VAE + UNet + CLIP), so having the build system manage all of them is especially helpful. Ideally we can standardize on a similar set of APIs for llama, SDXL, and future supported models.
Discussion for this in #373 and #284.
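As a purely hypothetical illustration of what a shared export/compile API across llama and SDXL could look like (none of these names exist in sharktank or shortfin today; the SDXL submodel breakdown follows the VAE + UNet + CLIP split mentioned above):

```python
# Hypothetical sketch only; nothing below is an existing API.
from dataclasses import dataclass, field


@dataclass
class CompileOptions:
    target: str = "llvm-cpu"  # e.g. "rocm", "cuda"
    extra_flags: list[str] = field(default_factory=list)


class ModelBuilder:
    """Exports MLIR for each submodel and compiles it to a .vmfb, in process."""

    def __init__(self, submodels: dict[str, str], options: CompileOptions):
        # submodels maps a name ("unet", "clip", ...) to its parameter file.
        self.submodels = submodels
        self.options = options

    def build(self, output_dir: str) -> dict[str, str]:
        # Would export and compile each submodel, returning name -> .vmfb path.
        raise NotImplementedError


# Usage sketch: llama has a single submodel, SDXL has several.
llama = ModelBuilder({"llm": "llama3.1_8b.irpa"}, CompileOptions(target="rocm"))
sdxl = ModelBuilder(
    {"clip": "clip.irpa", "unet": "unet.irpa", "vae": "vae.irpa"},
    CompileOptions(target="rocm"),
)
```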