
Looking for a way to deploy multiple small models on a single GPU #2541

Open
RichardFans opened this issue Nov 11, 2024 · 2 comments
@RichardFans

Feature request

Provide a feature, or best-practice documentation, for deploying multiple models on a single GPU.

Motivation

The ability to deploy so many kinds of models with little effort makes xinference stand out among similar projects, and it greatly helps small and mid-sized companies with some compute resources run inference services under their own control. However, services such as TTS or ASR usually need very little GPU memory, yet each one currently occupies an entire GPU, leaving most of the card's capacity unused. It would be great if the project could provide a feature, or a best-practice example document, for deploying multiple models on one GPU.

Your contribution

What I have found so far:

  • Some NVIDIA GPUs support MIG (Multi-Instance GPU), which can partition one card into several isolated instances, but this is limited to high-end datacenter cards such as the A100 and H100.
  • vGPU technology can virtualize a GPU, but it is unclear how to integrate it with xinference services such as TTS or ASR.
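For reference, partitioning a MIG-capable card is done with `nvidia-smi`. The following is a sketch only: it assumes an A100/H100-class GPU, root access, and that no processes are using the card; the profile ID shown is one example and the valid IDs vary by GPU model.

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset or reboot to take effect)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this card supports
nvidia-smi mig -lgip

# Create two GPU instances and their compute instances (-C);
# profile 9 (3g.20gb on A100) is just an illustrative choice
sudo nvidia-smi mig -i 0 -cgi 9,9 -C

# Each MIG device gets its own UUID; list them, then pin a process
# to one instance via CUDA_VISIBLE_DEVICES
nvidia-smi -L
```

Each resulting `MIG-<uuid>` device can then be assigned to a separate inference process through `CUDA_VISIBLE_DEVICES`, giving hardware-level isolation between the models.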
@XprobeBot XprobeBot added the gpu label Nov 11, 2024
@XprobeBot XprobeBot added this to the v0.16 milestone Nov 11, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Nov 18, 2024
@Valdanitooooo
Contributor

Just run several xinference instances.
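The suggestion above can be sketched as follows, assuming the standard `xinference-local` and `xinference launch` CLI commands. Both instances see the same physical GPU and share its memory without isolation, so the model names below are illustrative examples and you must size them to fit together in VRAM.

```shell
# Run two independent xinference instances on the same GPU,
# each serving on its own port
CUDA_VISIBLE_DEVICES=0 xinference-local --host 0.0.0.0 --port 9997 &
CUDA_VISIBLE_DEVICES=0 xinference-local --host 0.0.0.0 --port 9998 &

# Launch one small model per instance, e.g. ASR on one and TTS on
# the other (model names are examples, not recommendations)
xinference launch -e http://localhost:9997 \
    --model-name whisper-large-v3 --model-type audio
xinference launch -e http://localhost:9998 \
    --model-name ChatTTS --model-type audio
```

This trades isolation for simplicity: unlike MIG, nothing prevents one model from exhausting GPU memory, but it works on any CUDA GPU and needs no special hardware.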

@github-actions github-actions bot removed the stale label Nov 19, 2024