Hi, sorry if this question has been answered before.
I tested guidance using local llama.cpp models and got pretty decent results even with 1.5B or 3B models.
Now I would like to apply guidance in my production service, but with remote "optimistic" models the results are worse, even with much better models.
As I only have a CPU, I'm willing to use an API with full guidance support.
All I could find is the AzureGuidance class, but I couldn't find documentation about it (or how to subscribe to it), and I don't know whether there are other alternatives for running it on a server. I also couldn't find whether it supports the Hugging Face Inference API.
Thanks for this amazing job!
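For context, the local setup described above looks roughly like the sketch below. This is a minimal illustration, not the asker's actual code: the model path is a placeholder, and `guidance.models.LlamaCpp` loads a local GGUF file.

```python
from guidance import models, select

# Hypothetical path to a local GGUF file; any small instruct model
# (e.g. a 1.5B or 3B quantization) works for this kind of test.
lm = models.LlamaCpp("models/qwen2.5-1.5b-instruct-q4_k_m.gguf", n_ctx=4096)

# Constrained generation: the model can only emit one of these options,
# which is why even small local models give decent results.
lm += "Is this review positive or negative? Review: 'Great product!'\nAnswer: "
lm += select(["positive", "negative"], name="sentiment")
print(lm["sentiment"])
```

The key point is that `select` (and friends like `gen(regex=...)`) constrain decoding token by token, which only works when guidance controls the sampling loop, as it does with a local llama.cpp backend.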
tejonaco changed the title to "What are the best options to run guidance using a web service?" (Jan 21, 2025)
Hi Tejonaco, thanks for reporting this! Right now the preferred method for a hosted API is via AzureGuidance. Unfortunately, the support there is still experimental and currently down. We're working on upgrading our infrastructure there now, and will be able to update you on this shortly. Out of curiosity, are you focused on a specific model, or do you just want a general hosted solution with the most capable models (e.g. the Llama or Phi series)? I can reply back here with an update once we're back online with Azure.
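For reference, client-side usage of AzureGuidance follows the same pattern as the local backends; only the model constructor changes. The endpoint URL below is a placeholder, and the exact URL/auth format is precisely the undocumented part discussed above, so treat this as a sketch rather than working configuration:

```python
from guidance import models, gen

# Hypothetical endpoint and key -- the real URL format comes from your
# Azure deployment and is not filled in here.
lm = models.AzureGuidance("https://<deployment>.models.ai.azure.com/guidance#auth=<key>")

# Once connected, the same constrained-generation primitives apply.
lm += "Name a primary color: " + gen(regex=r"(red|yellow|blue)", name="color")
```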
If you're willing to self-host on a cloud VM, we have the llgtrt package, which builds on TensorRT-LLM for high-performance deployments on NVIDIA GPUs. Admittedly this takes some effort to set up, so we're also working on more "out of the box" integration options (including open PRs on llama.cpp for native integration, vLLM, etc.). If you do want to give this a try, however, we're definitely happy to help.
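If you do go the llgtrt route, the server exposes an OpenAI-compatible REST API, so once it's running the client side is plain HTTP. The host, port, and model name below are placeholders for illustration only:

```shell
# Hypothetical host/port; llgtrt serves an OpenAI-compatible endpoint,
# so plain curl or any OpenAI client library can talk to it.
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "model",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```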