From a0aa057f2a7941006fbc1bc17e51b07765e44920 Mon Sep 17 00:00:00 2001 From: peterschmidt85 Date: Thu, 31 Aug 2023 16:31:31 +0200 Subject: [PATCH] - [Docs] Updated docs and examples to reflect the changes in `0.11.1` (part 2) --- README.md | 16 +++---- docs/blog/posts/multiple-clouds.md | 4 +- docs/examples/text-generation-inference.md | 53 ++++++++++++++++---- docs/examples/vllm.md | 24 ++++++---- docs/index.md | 2 +- docs/overrides/examples.html | 42 ++++++++-------- docs/overrides/home.html | 56 +++++++++++----------- mkdocs.yml | 4 +- setup.py | 3 +- 9 files changed, 122 insertions(+), 82 deletions(-) diff --git a/README.md b/README.md index 413177d89..0ee7b3f78 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@

-Train and deploy LLM models in multiple clouds
+Run LLM workloads across any clouds

@@ -23,18 +23,16 @@ Train and deploy LLM models in multiple clouds [![PyPI - License](https://img.shields.io/pypi/l/dstack?style=flat-square&color=blue)](https://github.com/dstackai/dstack/blob/master/LICENSE.md) -`dstack` is an open-source tool that enables the execution of LLM workloads -across multiple cloud providers – ensuring the best GPU price and availability. +`dstack` is an open-source toolkit for running LLM workloads across any clouds, offering a +cost-efficient and user-friendly interface for training, inference, and development. -Deploy services, run tasks, and provision dev environments -in a cost-effective manner across multiple cloud GPU providers. - -## Latest news +## Latest news ✨ - [2023/08] [Fine-tuning with Llama 2](https://dstack.ai/examples/finetuning-llama-2) (Example) - [2023/08] [An early preview of services](https://dstack.ai/blog/2023/08/07/services-preview) (Release) -- [2023/07] [Port mapping, max duration, and more](https://dstack.ai/blog/2023/07/25/port-mapping-max-duration-and-more) (Release) -- [2023/07] [Serving with vLLM](https://dstack.ai/examples/vllm) (Example) +- [2023/08] [Serving SDXL with FastAPI](https://dstack.ai/examples/stable-diffusion-xl) (Example) +- [2023/07] [Serving LLMS with TGI](https://dstack.ai/examples/text-generation-inference) (Example) +- [2023/07] [Serving LLMS with vLLM](https://dstack.ai/examples/vllm) (Example) ## Installation diff --git a/docs/blog/posts/multiple-clouds.md b/docs/blog/posts/multiple-clouds.md index ffda14189..adb436b75 100644 --- a/docs/blog/posts/multiple-clouds.md +++ b/docs/blog/posts/multiple-clouds.md @@ -7,7 +7,7 @@ categories: - Releases --- -# Discover GPU across multiple clouds +# Automatic GPU discovery across clouds __The 0.11 update significantly cuts GPU costs and boosts their availability.__ @@ -16,7 +16,7 @@ configured cloud providers and regions. -## Multiple clouds per project +## Multiple backends per project Now, `dstack` leverages price data from multiple configured cloud providers and regions to automatically suggest the most cost-effective options. diff --git a/docs/examples/text-generation-inference.md b/docs/examples/text-generation-inference.md index 8b8b207e6..589461762 100644 --- a/docs/examples/text-generation-inference.md +++ b/docs/examples/text-generation-inference.md @@ -31,13 +31,11 @@ Here's the configuration that uses services: ```yaml type: service -# This configuration deploys a given LLM model as an API image: ghcr.io/huggingface/text-generation-inference:latest env: - # (Required) Specify the name of the model - - MODEL_ID=tiiuae/falcon-7b + - MODEL_ID=NousResearch/Llama-2-7b-hf port: 8000 @@ -84,11 +82,50 @@ $ curl -X POST --location https://yellow-cat-1.mydomain.com \ -!!! info "Gated models" - To use a model with gated access, ensure configuring either the `HUGGING_FACE_HUB_TOKEN` secret - (using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add)), - or environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or - using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file). +### Gated models + +To use a model with gated access, ensure configuring either the `HUGGING_FACE_HUB_TOKEN` secret +(using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add)), +or environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or +using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file). + +

+ +```shell +$ dstack run . -f text-generation-inference/serve.dstack.yml --env HUGGING_FACE_HUB_TOKEN=<token> --gpu 24GB +``` +
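+
+Alternatively, you can store the token once as a secret so it doesn't have to be passed with every run. Here is a
+minimal sketch, assuming the `dstack secrets add NAME VALUE` form from the CLI reference linked above (the secret is
+then passed to the run as an environment variable):
+
+```shell
+# <token> is your Hugging Face access token; the exact CLI syntax may differ between versions
+$ dstack secrets add HUGGING_FACE_HUB_TOKEN <token>
+
+$ dstack run . -f text-generation-inference/serve.dstack.yml --gpu 24GB
+```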
+ +### Memory usage and quantization + +An LLM typically requires twice the GPU memory compared to its parameter count. For instance, a model with `13B` parameters +needs around `26GB` of GPU memory. To decrease memory usage and fit the model on a smaller GPU, consider using +quantization, which TGI offers as `bitsandbytes` and `gptq` methods. + +Here's an example of the Llama 2 13B model tailored for a `24GB` GPU (A10 or L4): + +
+ +```yaml +type: service + +image: ghcr.io/huggingface/text-generation-inference:latest + +env: + - MODEL_ID=TheBloke/Llama-2-13B-GPTQ + +port: 8000 + +commands: + - text-generation-launcher --hostname 0.0.0.0 --port 8000 --trust-remote-code --quantize gptq +``` + +
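+
+As a rough back-of-the-envelope check (assuming about 2 bytes per parameter in `fp16`, about 0.5 bytes per parameter
+for 4-bit GPTQ weights, and ignoring activation and KV-cache overhead):
+
+```shell
+$ python3 -c "print(13e9 * 2 / 1e9, 'GB')"    # 13B parameters in fp16
+26.0 GB
+
+$ python3 -c "print(13e9 * 0.5 / 1e9, 'GB')"  # 13B parameters with 4-bit GPTQ
+6.5 GB
+```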
+ +A similar approach allows running the Llama 2 70B model on an `80GB` GPU (A100). + +To calculate the exact GPU memory required for a specific model with different quantization methods, you can use the +[hf-accelerate/memory-model-usage](https://huggingface.co/spaces/hf-accelerate/model-memory-usage) Space. ??? info "Dev environments" diff --git a/docs/examples/vllm.md b/docs/examples/vllm.md index 7a2177abf..fe4f4d5d8 100644 --- a/docs/examples/vllm.md +++ b/docs/examples/vllm.md @@ -31,12 +31,10 @@ Here's the configuration that uses services to run an LLM as an OpenAI-compatibl ```yaml type: service -# (Optional) If not specified, it will use your local version python: "3.11" env: - # (Required) Specify the name of the model - - MODEL=facebook/opt-125m + - MODEL=NousResearch/Llama-2-7b-hf port: 8000 @@ -75,7 +73,7 @@ Once the service is up, you can query the endpoint: $ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/completions \ -H "Content-Type: application/json" \ -d '{ - "model": "facebook/opt-125m", + "model": "NousResearch/Llama-2-7b-hf", "prompt": "San Francisco is a", "max_tokens": 7, "temperature": 0 @@ -84,10 +82,18 @@ $ curl -X POST --location https://yellow-cat-1.mydomain.com/v1/completions \ -!!! info "Gated models" - To use a model with gated access, ensure configuring either the `HUGGING_FACE_HUB_TOKEN` secret - (using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add)), - or environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or - using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file). +### Gated models + +To use a gated-access model from Hugging Face Hub, make sure to set up either the `HUGGING_FACE_HUB_TOKEN` secret +(using [`dstack secrets`](../docs/reference/cli/secrets.md#dstack-secrets-add)), +or environment variable (with [`--env`](../docs/reference/cli/run.md#ENV) in `dstack run` or +using [`env`](../docs/reference/dstack.yml/service.md#env) in the configuration file). + +
+ +```shell +$ dstack run . -f vllm/serve.dstack.yml --env HUGGING_FACE_HUB_TOKEN=<token> --gpu 24GB +``` +
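+
+Alternatively, the token can be stored once as a secret (assuming the `dstack secrets add NAME VALUE` form from the
+CLI reference linked above; the secret is then passed to the run as an environment variable), so it doesn't need to
+be passed with `--env` on each run:
+
+```shell
+# <token> is your Hugging Face access token; the exact CLI syntax may differ between versions
+$ dstack secrets add HUGGING_FACE_HUB_TOKEN <token>
+
+$ dstack run . -f vllm/serve.dstack.yml --gpu 24GB
+```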
[Source code](https://github.com/dstackai/dstack-examples){ .md-button .md-button--github } \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index 133072ed8..725191658 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,6 +1,6 @@ --- template: home.html -title: Train and deploy LLM models in multiple clouds +title: Run LLM workloads across any clouds hide: - navigation - toc diff --git a/docs/overrides/examples.html b/docs/overrides/examples.html index 389e8c4ef..a739101f0 100644 --- a/docs/overrides/examples.html +++ b/docs/overrides/examples.html @@ -9,7 +9,7 @@

Examples

- +