
Find a More General and Easier-to-Use Alternative for Compiling Models for the Shortfin LLM Server #402

Open
stbaione opened this issue Oct 31, 2024 · 5 comments
Labels: enhancement (New feature or request)

Comments

@stbaione (Contributor)

Discussion for this in #373 and #284.

The export script in sharktank was built specifically for Llama 3.1 models and has some rough edges. It also requires users to chain together CLI commands: python -m sharktank.examples.export_paged_llm_v1 [--options], then iree-compile [--options].
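
For reference, that chained flow looks roughly like the following (the paths and flags are illustrative, not an exact invocation):

# Export the model to MLIR plus a runtime config (flags are illustrative)
python -m sharktank.examples.export_paged_llm_v1 \
  --gguf-file=/path/to/llama3.1_8b_fp16.gguf \
  --output-mlir=model.mlir \
  --output-config=config.json

# Compile the MLIR to a .vmfb for the shortfin server (target is illustrative)
iree-compile model.mlir \
  --iree-hal-target-backends=rocm \
  -o model.vmfb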

This is cumbersome from a user perspective, and it forces CI runs to invoke CLI commands via subprocess instead of using a programmatic, in-memory alternative.

We should find a more general, easier-to-use solution for generating MLIR for LLM models and compiling those models to .vmfb for the shortfin server.

Below is a starting point recommendation provided by @ScottTodd:

"Users shouldn't need to chain together python -m sharktank.examples. and iree-compile ... commands. We can aim for something like https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

(that's as minimal as it gets - we'll want to pass options like the compilation target though)"
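
A shortfin equivalent could expose a similar single-call interface. Below is a minimal sketch; the module path, LLM class, and target parameter are hypothetical and only illustrate the desired shape (export, iree-compile, and serving hidden behind one object):

# Hypothetical API sketch -- none of these names exist in shortfin/sharktank yet.
from shortfin.llm import LLM  # hypothetical module path

llm = LLM(
    model="meta-llama/Llama-3.1-8B",  # model to export and compile
    target="gfx942",                  # compilation target forwarded to iree-compile
)
outputs = llm.generate(["What is MLIR?"])  # runs against the compiled .vmfb
for output in outputs:
    print(output)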

stbaione added the enhancement (New feature or request) label on Oct 31, 2024
@stellaraccident (Contributor)

Hang tight for a bit. More tooling is coming that will make this all one command. Building it out for SDXL first.

iree-org/iree#18630 (review)

@ScottTodd (Member)

Ah yes, I was just going to connect those dots too.

For SDXL there are multiple submodels (VAE + UNet + CLIP), so having the build system manage all of them is especially helpful. Ideally we can standardize on a similar set of APIs for llama, SDXL, and future supported models.

@stbaione (Contributor, Author)

Closing, as something is already in the works for this.

@ScottTodd (Member)

Well, we still need code written. Fine to keep this as a tracking issue, blocked on the work happening for SDXL.

ScottTodd reopened this on Oct 31, 2024
@renxida (Contributor) commented Nov 4, 2024

Looks like iree.build is merged!
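
For context, a build pipeline on top of iree.build might look roughly like the sketch below. This is an assumption based on the API introduced in iree-org/iree#18630 (an ONNX import action is used here just to show the pattern; a sharktank llama export action would slot in the same way), and exact names and arguments may differ:

# Rough sketch of an iree.build pipeline; names/arguments are assumptions
# based on iree-org/iree#18630 and may not match the final API.
from iree.build import compile, entrypoint, iree_build_main, onnx_import

@entrypoint
def llm_artifacts():
    # Import a source model to MLIR, then compile it to a deployable .vmfb.
    onnx_import(name="model.mlir", source="model.onnx")
    return compile(name="model", source="model.mlir")

if __name__ == "__main__":
    iree_build_main()  # handles CLI concerns such as output directory and target flags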
