
Looking for a way to deploy multiple small models on a single GPU #2541

Open
RichardFans opened this issue Nov 11, 2024 · 2 comments
@RichardFans

Feature request

Provide a feature, or best-practice documentation, for deploying multiple models on a single GPU.

Motivation

The ability to deploy so many kinds of models with little effort makes xinference stand out among similar projects, and it greatly helps small and mid-sized companies with some compute resources run inference services under their own control. However, services such as TTS or ASR usually need very little GPU memory, yet each one currently occupies an entire GPU, leaving most of the card's capacity unused. It would be great if the project could provide a feature, or a best-practice example document, for deploying multiple models on one GPU.

Your contribution

What I have found so far:

  • Some NVIDIA GPUs support MIG (Multi-Instance GPU), which can partition one card into several isolated instances, but this is limited to high-end datacenter cards such as the A100 and H100.
  • vGPU technology can virtualize a GPU, but it is unclear how to integrate it with xinference services such as TTS or ASR.
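For reference, partitioning a MIG-capable card is done with `nvidia-smi`. The following is a sketch only: it assumes an A100/H100-class GPU, root access, and that no processes are using the card; the profile ID shown is one example and the valid IDs vary by GPU model.

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset or reboot to take effect)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this card supports
nvidia-smi mig -lgip

# Create two GPU instances and their compute instances (-C);
# profile 9 (3g.20gb on A100) is just an illustrative choice
sudo nvidia-smi mig -i 0 -cgi 9,9 -C

# Each MIG device gets its own UUID; list them, then pin a process
# to one instance via CUDA_VISIBLE_DEVICES
nvidia-smi -L
```

Each resulting `MIG-<uuid>` device can then be assigned to a separate inference process through `CUDA_VISIBLE_DEVICES`, giving hardware-level isolation between the models.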
@XprobeBot XprobeBot added the gpu label Nov 11, 2024
@XprobeBot XprobeBot added this to the v0.16 milestone Nov 11, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Nov 18, 2024
@Valdanitooooo
Contributor

Just run several xinference instances.
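The suggestion above can be sketched as follows, assuming the standard `xinference-local` and `xinference launch` CLI commands. Both instances see the same physical GPU and share its memory without isolation, so the model names below are illustrative examples and you must size them to fit together in VRAM.

```shell
# Run two independent xinference instances on the same GPU,
# each serving on its own port
CUDA_VISIBLE_DEVICES=0 xinference-local --host 0.0.0.0 --port 9997 &
CUDA_VISIBLE_DEVICES=0 xinference-local --host 0.0.0.0 --port 9998 &

# Launch one small model per instance, e.g. ASR on one and TTS on
# the other (model names are examples, not recommendations)
xinference launch -e http://localhost:9997 \
    --model-name whisper-large-v3 --model-type audio
xinference launch -e http://localhost:9998 \
    --model-name ChatTTS --model-type audio
```

This trades isolation for simplicity: unlike MIG, nothing prevents one model from exhausting GPU memory, but it works on any CUDA GPU and needs no special hardware.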

@github-actions github-actions bot removed the stale label Nov 19, 2024