[Docs] Updated docs and examples to reflect the changes in 0.11.1 (part 2)
peterschmidt85 committed Aug 31, 2023
1 parent 9b5227f commit a0aa057
Showing 9 changed files with 122 additions and 82 deletions.
16 changes: 7 additions & 9 deletions README.md
@@ -9,7 +9,7 @@
</h1>

<h3 align="center">
-Train and deploy LLM models in multiple clouds
+Run LLM workloads across any cloud
</h3>

<p align="center">
@@ -23,18 +23,16 @@ Train and deploy LLM models in multiple clouds
[![PyPI - License](https://img.shields.io/pypi/l/dstack?style=flat-square&color=blue)](https://github.com/dstackai/dstack/blob/master/LICENSE.md)
</div>

-`dstack` is an open-source tool that enables the execution of LLM workloads
-across multiple cloud providers – ensuring the best GPU price and availability.
+`dstack` is an open-source toolkit for running LLM workloads across any cloud, offering a
+cost-efficient and user-friendly interface for training, inference, and development.

-Deploy services, run tasks, and provision dev environments
-in a cost-effective manner across multiple cloud GPU providers.

-## Latest news
+## Latest news ✨

- [2023/08] [Fine-tuning with Llama 2](https://dstack.ai/examples/finetuning-llama-2) (Example)
-- [2023/08] [An early preview of services](https://dstack.ai/blog/2023/08/07/services-preview) (Release)
-- [2023/07] [Port mapping, max duration, and more](https://dstack.ai/blog/2023/07/25/port-mapping-max-duration-and-more) (Release)
-- [2023/07] [Serving with vLLM](https://dstack.ai/examples/vllm) (Example)
+- [2023/08] [Serving SDXL with FastAPI](https://dstack.ai/examples/stable-diffusion-xl) (Example)
+- [2023/07] [Serving LLMs with TGI](https://dstack.ai/examples/text-generation-inference) (Example)
+- [2023/07] [Serving LLMs with vLLM](https://dstack.ai/examples/vllm) (Example)

## Installation

4 changes: 2 additions & 2 deletions docs/blog/posts/multiple-clouds.md
@@ -7,7 +7,7 @@ categories:
- Releases
---

-# Discover GPU across multiple clouds
+# Automatic GPU discovery across clouds

__The 0.11 update significantly cuts GPU costs and boosts their availability.__

@@ -16,7 +16,7 @@ configured cloud providers and regions.

<!-- more -->

-## Multiple clouds per project
+## Multiple backends per project

Now, `dstack` leverages price data from multiple configured cloud providers and regions to automatically suggest the
most cost-effective options.
53 changes: 45 additions & 8 deletions docs/examples/text-generation-inference.md
@@ -31,13 +31,11 @@ Here's the configuration that uses services:

```yaml
type: service
-# This configuration deploys a given LLM model as an API

image: ghcr.io/huggingface/text-generation-inference:latest

env:
-# (Required) Specify the name of the model
-- MODEL_ID=tiiuae/falcon-7b
+- MODEL_ID=NousResearch/Llama-2-7b-hf

port: 8000

@@ -84,11 +82,50 @@ $ curl -X POST --location https://yellow-cat-1.mydomain.com \

</div>
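You can also query the same service from Python. Here's a minimal sketch using `requests`, assuming TGI's default `/generate` route; the hostname is this example's placeholder (use the endpoint that `dstack run` prints), and the prompt is purely illustrative:

```python
import requests

# Call the TGI service deployed above; assumes the default /generate route.
resp = requests.post(
    "https://yellow-cat-1.mydomain.com/generate",
    json={
        "inputs": "What is deep learning?",    # illustrative prompt
        "parameters": {"max_new_tokens": 50},  # limit the completion length
    },
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```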

-!!! info "Gated models"
-    To use a model with gated access, ensure configuring either the `HUGGING_FACE_HUB_TOKEN` secret
-    (using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add)),
-    or environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or
-    using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file).
+### Gated models
+
+To use a model with gated access, make sure to configure either the `HUGGING_FACE_HUB_TOKEN` secret
+(using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add))
+or an environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or
+using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file).
+
+<div class="termy">
+
+```shell
+$ dstack run . -f text-generation-inference/serve.dstack.yml --env HUGGING_FACE_HUB_TOKEN=<token> --gpu 24GB
+```
+</div>

+### Memory usage and quantization
+
+In 16-bit precision, an LLM typically needs about two gigabytes of GPU memory per billion parameters. For instance,
+a model with `13B` parameters needs around `26GB` of GPU memory. To decrease memory usage and fit the model on a
+smaller GPU, consider using quantization, which TGI supports via the `bitsandbytes` and `gptq` methods.
+
+Here's an example of the Llama 2 13B model tailored for a `24GB` GPU (A10 or L4):
+
+<div editor-title="text-generation-inference/serve.dstack.yml">
+
+```yaml
+type: service
+
+image: ghcr.io/huggingface/text-generation-inference:latest
+
+env:
+  - MODEL_ID=TheBloke/Llama-2-13B-GPTQ
+
+port: 8000
+
+commands:
+  - text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code --quantize gptq
+```
+
+</div>
+
+A similar approach allows running the Llama 2 70B model on an `80GB` GPU (A100).
+
+To calculate the exact GPU memory required for a specific model with different quantization methods, you can use the
+[hf-accelerate/model-memory-usage](https://huggingface.co/spaces/hf-accelerate/model-memory-usage) Space.
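For a quick sanity check of the rule of thumb above, a few lines of Python reproduce these estimates. This is a rough sketch that counts model weights only, ignoring activations, the KV cache, and framework overhead:

```python
# Rough weights-only estimate: gigabytes = billions of parameters * bits per parameter / 8.
def estimate_weights_memory_gb(params_billions: float, bits_per_param: float) -> float:
    return params_billions * bits_per_param / 8

for method, bits in [("fp16", 16), ("int8 (bitsandbytes)", 8), ("4-bit (gptq)", 4)]:
    print(f"Llama 2 13B, {method}: ~{estimate_weights_memory_gb(13, bits):.1f} GB")
```

This prints roughly `26.0`, `13.0`, and `6.5` GB, which is why the 4-bit GPTQ variant fits comfortably on a `24GB` A10 or L4.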

??? info "Dev environments"

24 changes: 15 additions & 9 deletions docs/examples/vllm.md
@@ -31,12 +31,10 @@ Here's the configuration that uses services to run an LLM as an OpenAI-compatibl
```yaml
type: service

-# (Optional) If not specified, it will use your local version
python: "3.11"

env:
-# (Required) Specify the name of the model
-- MODEL=facebook/opt-125m
+- MODEL=NousResearch/Llama-2-7b-hf

port: 8000

@@ -75,7 +73,7 @@ Once the service is up, you can query the endpoint:
$ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "facebook/opt-125m",
"model": "NousResearch/Llama-2-7b-hf",
"prompt": "San Francisco is a",
"max_tokens": 7,
"temperature": 0
@@ -84,10 +82,18 @@ $ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/completions \

</div>
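Since the API is OpenAI-compatible, you can also call it from Python. Here's a minimal sketch using `requests` that mirrors the `curl` command above; the hostname is this example's placeholder (use the endpoint that `dstack run` prints):

```python
import requests

# Call the OpenAI-compatible completions endpoint served by vLLM.
resp = requests.post(
    "https://yellow-cat-1.mydomain.com/v1/completions",
    json={
        "model": "NousResearch/Llama-2-7b-hf",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```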

-!!! info "Gated models"
-    To use a model with gated access, ensure configuring either the `HUGGING_FACE_HUB_TOKEN` secret
-    (using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add)),
-    or environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or
-    using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file).
+### Gated models
+
+To use a gated-access model from the Hugging Face Hub, make sure to set up either the `HUGGING_FACE_HUB_TOKEN` secret
+(using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add))
+or an environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or
+using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file).
+
+<div class="termy">
+
+```shell
+$ dstack run . -f vllm/serve.dstack.yml --env HUGGING_FACE_HUB_TOKEN=<token> --gpu 24GB
+```
+</div>

[Source code](https://github.com/dstackai/dstack-examples){ .md-button .md-button--github }
2 changes: 1 addition & 1 deletion docs/index.md
@@ -1,6 +1,6 @@
---
template: home.html
-title: Train and deploy LLM models in multiple clouds
+title: Run LLM workloads across any cloud
hide:
- navigation
- toc
42 changes: 21 additions & 21 deletions docs/overrides/examples.html
@@ -9,7 +9,7 @@ <h2>Examples</h2>
</div>

<div class="tx-landing__highlights_grid">
<a href="finetuning-llama-2">
<a href="/examples/finetuning-llama-2">
<div class="feature-cell">
<div class="feature-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
@@ -27,63 +27,63 @@ <h3>
</div>
</a>

<a href="stable-diffusion-xl">
<a href="/examples/text-generation-inference">
<div class="feature-cell">
<div class="feature-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
<path d="M17.5 12a1.5 1.5 0 0 1-1.5-1.5A1.5 1.5 0 0 1 17.5 9a1.5 1.5 0 0 1 1.5 1.5 1.5 1.5 0 0 1-1.5 1.5m-3-4A1.5 1.5 0 0 1 13 6.5 1.5 1.5 0 0 1 14.5 5 1.5 1.5 0 0 1 16 6.5 1.5 1.5 0 0 1 14.5 8m-5 0A1.5 1.5 0 0 1 8 6.5 1.5 1.5 0 0 1 9.5 5 1.5 1.5 0 0 1 11 6.5 1.5 1.5 0 0 1 9.5 8m-3 4A1.5 1.5 0 0 1 5 10.5 1.5 1.5 0 0 1 6.5 9 1.5 1.5 0 0 1 8 10.5 1.5 1.5 0 0 1 6.5 12M12 3a9 9 0 0 0-9 9 9 9 0 0 0 9 9 1.5 1.5 0 0 0 1.5-1.5c0-.39-.15-.74-.39-1-.23-.27-.38-.62-.38-1a1.5 1.5 0 0 1 1.5-1.5H16a5 5 0 0 0 5-5c0-4.42-4.03-8-9-8Z"></path>
<path d="M16 9h3l-5 7m-4-7h4l-2 8M5 9h3l2 7m5-12h2l2 3h-3m-5-3h2l1 3h-4M7 4h2L8 7H5m1-5L2 8l10 14L22 8l-4-6H6Z"></path>
</svg>
</div>
<h3>
-Serving SDXL with FastAPI
+Serving LLMs with TGI
</h3>

<p>
-Serving <strong>Stable Diffusion XL</strong> with <strong>FastAPI</strong> to generate
-and refine images via a REST endpoint.
+Serve open-source LLMs as APIs with optimized performance using <strong>TGI</strong>, an
+open-source tool by
+Hugging Face.
</p>
</div>
</a>

<a href="vllm">
<a href="/examples/stable-diffusion-xl">
<div class="feature-cell">
<div class="feature-icon">
-<svg xmlns="http://www.w3.org/2000/svg" viewBox="-3 -3 27 27">
-<path d="m13.13 22.19-1.63-3.83c1.57-.58 3.04-1.36 4.4-2.27l-2.77 6.1M5.64 12.5l-3.83-1.63 6.1-2.77C7 9.46 6.22 10.93 5.64 12.5M21.61 2.39S16.66.269 11 5.93c-2.19 2.19-3.5 4.6-4.35 6.71-.28.75-.09 1.57.46 2.13l2.13 2.12c.55.56 1.37.74 2.12.46A19.1 19.1 0 0 0 18.07 13c5.66-5.66 3.54-10.61 3.54-10.61m-7.07 7.07c-.78-.78-.78-2.05 0-2.83s2.05-.78 2.83 0c.77.78.78 2.05 0 2.83-.78.78-2.05.78-2.83 0m-5.66 7.07-1.41-1.41 1.41 1.41M6.24 22l3.64-3.64c-.34-.09-.67-.24-.97-.45L4.83 22h1.41M2 22h1.41l4.77-4.76-1.42-1.41L2 20.59V22m0-2.83 4.09-4.08c-.21-.3-.36-.62-.45-.97L2 17.76v1.41Z"></path>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
+<path d="M17.5 12a1.5 1.5 0 0 1-1.5-1.5A1.5 1.5 0 0 1 17.5 9a1.5 1.5 0 0 1 1.5 1.5 1.5 1.5 0 0 1-1.5 1.5m-3-4A1.5 1.5 0 0 1 13 6.5 1.5 1.5 0 0 1 14.5 5 1.5 1.5 0 0 1 16 6.5 1.5 1.5 0 0 1 14.5 8m-5 0A1.5 1.5 0 0 1 8 6.5 1.5 1.5 0 0 1 9.5 5 1.5 1.5 0 0 1 11 6.5 1.5 1.5 0 0 1 9.5 8m-3 4A1.5 1.5 0 0 1 5 10.5 1.5 1.5 0 0 1 6.5 9 1.5 1.5 0 0 1 8 10.5 1.5 1.5 0 0 1 6.5 12M12 3a9 9 0 0 0-9 9 9 9 0 0 0 9 9 1.5 1.5 0 0 0 1.5-1.5c0-.39-.15-.74-.39-1-.23-.27-.38-.62-.38-1a1.5 1.5 0 0 1 1.5-1.5H16a5 5 0 0 0 5-5c0-4.42-4.03-8-9-8Z"></path>
</svg>
</div>
<h3>
-Serving LLMs with vLLM
+Serving SDXL with FastAPI
</h3>

<p>
-Serve open-source LLMs as OpenAI-compatible APIs with up to 24 times higher throughput using
-the
-<strong>vLLM</strong> library.
+Serving <strong>Stable Diffusion XL</strong> with <strong>FastAPI</strong> to generate
+and refine images via a REST endpoint.
</p>
</div>
</a>

<a href="text-generation-inference">
<a href="/examples/vllm">
<div class="feature-cell">
<div class="feature-icon">
-<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
-<path d="M16 9h3l-5 7m-4-7h4l-2 8M5 9h3l2 7m5-12h2l2 3h-3m-5-3h2l1 3h-4M7 4h2L8 7H5m1-5L2 8l10 14L22 8l-4-6H6Z"></path>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="-3 -3 27 27">
+<path d="m13.13 22.19-1.63-3.83c1.57-.58 3.04-1.36 4.4-2.27l-2.77 6.1M5.64 12.5l-3.83-1.63 6.1-2.77C7 9.46 6.22 10.93 5.64 12.5M21.61 2.39S16.66.269 11 5.93c-2.19 2.19-3.5 4.6-4.35 6.71-.28.75-.09 1.57.46 2.13l2.13 2.12c.55.56 1.37.74 2.12.46A19.1 19.1 0 0 0 18.07 13c5.66-5.66 3.54-10.61 3.54-10.61m-7.07 7.07c-.78-.78-.78-2.05 0-2.83s2.05-.78 2.83 0c.77.78.78 2.05 0 2.83-.78.78-2.05.78-2.83 0m-5.66 7.07-1.41-1.41 1.41 1.41M6.24 22l3.64-3.64c-.34-.09-.67-.24-.97-.45L4.83 22h1.41M2 22h1.41l4.77-4.76-1.42-1.41L2 20.59V22m0-2.83 4.09-4.08c-.21-.3-.36-.62-.45-.97L2 17.76v1.41Z"></path>
</svg>
</div>
<h3>
-Serving LLMs with TGI
+Serving LLMs with vLLM
</h3>

<p>
-Serve open-source LLMs as APIs with optimized performance using <strong>TGI</strong>, an
-open-source tool by
-Hugging Face.
+Serve open-source LLMs as OpenAI-compatible APIs with up to 24 times higher throughput using
+the
+<strong>vLLM</strong> library.
</p>
</div>
</a>

<a href="llmchat">
<a href="/examples/llmchat">
<div class="feature-cell">
<div class="feature-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">