Updates before public release (#3)
* Set GCP integration as "Coming soon"

Still missing some changes / approvals should be ready soon!

* Un-comment `.github/workflows/doc-*.yml`

* Add `digital-ocean.mdx` to `_toctree.yml`

* Add `docs/source/guides/migrate.mdx`

* Update `docs/source/how-to/cloud/digital-ocean.mdx`
alvarobartt authored Oct 23, 2024
1 parent 771f36c commit 0d30424
Showing 8 changed files with 171 additions and 183 deletions.
44 changes: 22 additions & 22 deletions .github/workflows/doc-build.yml
@@ -1,22 +1,22 @@
# name: Build Documentation
#
# on:
# push:
# branches:
# - main
# - doc-builder*
# paths:
# - docs/**
# - .github/workflows/doc-build.yml
#
# jobs:
# build:
# uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
# with:
# commit_sha: ${{ github.sha }}
# package: hugs-docs
# package_name: hugs
# additional_args: --not_python_module
# secrets:
# token: ${{ secrets.HUGGINGFACE_PUSH }}
# hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
name: Build Documentation

on:
push:
branches:
- main
- doc-builder*
paths:
- docs/**
- .github/workflows/doc-build.yml

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: hugs-docs
package_name: hugs
additional_args: --not_python_module
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
42 changes: 21 additions & 21 deletions .github/workflows/doc-pr-build.yml
@@ -1,21 +1,21 @@
# name: Build PR Documentation
#
# on:
# pull_request:
# paths:
# - docs/**
# - .github/workflows/doc-pr-build.yml
#
# concurrency:
# group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
# cancel-in-progress: true
#
# jobs:
# build:
# uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
# with:
# commit_sha: ${{ github.event.pull_request.head.sha }}
# pr_number: ${{ github.event.number }}
# package: hugs-docs
# package_name: hugs
# additional_args: --not_python_module
name: Build PR Documentation

on:
pull_request:
paths:
- docs/**
- .github/workflows/doc-pr-build.yml

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: hugs-docs
package_name: hugs
additional_args: --not_python_module
32 changes: 16 additions & 16 deletions .github/workflows/doc-pr-upload.yml
@@ -1,16 +1,16 @@
# name: Upload PR Documentation
#
# on:
# workflow_run:
# workflows: ["Build PR Documentation"]
# types:
# - completed
#
# jobs:
# build:
# uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
# with:
# package_name: hugs
# secrets:
# hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
# comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
name: Upload PR Documentation

on:
workflow_run:
workflows: ["Build PR Documentation"]
types:
- completed

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
with:
package_name: hugs
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
4 changes: 2 additions & 2 deletions docs/source/_toctree.yml
@@ -20,10 +20,10 @@
- local: how-to/cloud/aws
title: HUGS on AWS
- local: how-to/cloud/gcp
title: HUGS on Google Cloud
title: (Soon) HUGS on Google Cloud
- local: how-to/cloud/azure
title: (Soon) HUGS on Azure
- local: how-to/digitalocean
- local: how-to/cloud/digital-ocean
title: HUGS on DigitalOcean
title: How to run HUGS
- sections:
3 changes: 3 additions & 0 deletions docs/source/guides/migrate.mdx
@@ -0,0 +1,3 @@
# Migrate from OpenAI to HUGS

Coming soon!
106 changes: 105 additions & 1 deletion docs/source/how-to/cloud/digital-ocean.mdx
@@ -1,3 +1,107 @@
# HUGS on DigitalOcean

TODO
The Hugging Face Generative AI Services, also known as HUGS, can be deployed on DigitalOcean (DO) GPU Droplets as 1-Click Models.

This collaboration brings Hugging Face's extensive library of pre-trained models and their Text Generation Inference (TGI) solution to DigitalOcean customers, enabling seamless integration of state-of-the-art Large Language Models (LLMs) within DigitalOcean GPU Droplets.

HUGS provides access to a hand-picked and manually benchmarked collection of the most performant and latest open LLMs hosted on the Hugging Face Hub, packaged as TGI-optimized container applications, allowing users to deploy LLMs with a 1-Click deployment on DigitalOcean GPU Droplets.

With HUGS, developers can easily find, subscribe to, and deploy Hugging Face models on DigitalOcean's infrastructure, leveraging the power of NVIDIA GPUs on optimized, zero-configuration TGI containers.

## 1-Click Deployment of HUGS on DO GPU Droplets

1. Create a DigitalOcean account with a valid payment method, if you don't have one already, and make sure that you have enough quota to spin up GPU Droplets.

2. Go to [DigitalOcean GPU Droplets](https://www.digitalocean.com/products/gpu-droplets) and create a new one.

![Create GPU Droplet on Digital Ocean](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hugs/digital-ocean/create-gpu-droplet.png)

3. Choose a datacenter region (at the time of writing, New York, i.e. NYC2, and Toronto, i.e. TOR1, are available).

4. Select 1-Click Models when choosing an image, and pick any of the available Hugging Face images, which correspond to popular LLMs hosted on the Hugging Face Hub.

![Choose 1-Click Models on Digital Ocean](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hugs/digital-ocean/one-click-models.png)

5. Configure the remaining options, and click on "Create GPU Droplet" when done.

### HUGS Inference on DO GPU Droplets

Once the HUGS LLM has been deployed in a DO GPU Droplet, you can connect to it either via the public IP exposed by the instance or via the Web Console.

![HUGS on Digital Ocean GPU Droplet](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hugs/digital-ocean/hugs-gpu-droplet.png)

When connected to the HUGS Droplet, the initial SSH message will display a Bearer Token, which is required to send requests to the public IP of the deployed HUGS Droplet.

You can then send requests to the Messages API via `localhost` when connected from within the HUGS Droplet, or via its public IP.

<Tip>

The inference examples in this guide assume the host is `localhost`, which is the case when deploying HUGS via a GPU Droplet and connecting to the running instance via SSH. If you prefer to use the public IP instead, update the examples below accordingly.

</Tip>

Refer to [Run Inference on HUGS](../../guides/inference.mdx) to see how to run inference on HUGS. Note that in this case you will need to use the Bearer Token provided, so the examples below mirror those in the guide but include the Bearer Token when sending requests to the Messages API of the deployed HUGS Droplet (assuming the token is stored in the `BEARER_TOKEN` environment variable, e.g. via `export BEARER_TOKEN=<token>`).
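All of the examples that follow authenticate in the same way. As a minimal sketch (the fallback token value is a hypothetical placeholder, not a real credential), the header and payload they send look like this:

```python
import os

# Hypothetical fallback; on a real Droplet the token comes from the SSH
# welcome message and is exported via `export BEARER_TOKEN=...`
token = os.getenv("BEARER_TOKEN", "example-token")

# Headers shared by every request to the Messages API
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}

# Same payload as the cURL example in this section
payload = {
    "messages": [{"role": "user", "content": "What is Deep Learning?"}],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 128,
}
```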

#### cURL

`cURL` is straightforward to [install](https://curl.se/docs/install.html) and use.

```bash
curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -d '{"messages":[{"role":"user","content":"What is Deep Learning?"}],"temperature":0.7,"top_p":0.95,"max_tokens":128}' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $BEARER_TOKEN"
```

#### Python

You can use the `huggingface_hub.InferenceClient` from the `huggingface_hub` Python SDK (recommended), the `openai` Python SDK, or any SDK with an OpenAI-compatible interface that can consume the Messages API.

##### `huggingface_hub`

Install it with `pip install --upgrade --quiet huggingface_hub`, then run the following snippet, which mirrors the `cURL` command above, i.e. it sends a request to the Messages API:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(base_url="http://localhost:8080", api_key=os.getenv("BEARER_TOKEN"))

chat_completion = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is Deep Learning?"},
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=128,
)

# Print the generated reply
print(chat_completion.choices[0].message.content)
```

Read more about the [`huggingface_hub.InferenceClient.chat_completion` method](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.AsyncInferenceClient.chat_completion).

##### `openai`

Alternatively, you can also use the Messages API via `openai`; install it with `pip install --upgrade openai`, and then run:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1/", api_key=os.getenv("BEARER_TOKEN"))

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Deep Learning?"},
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=128,
)

# Print the generated reply
print(chat_completion.choices[0].message.content)
```

### Delete created DO GPU Droplet

Finally, once you are done using the deployed LLM, delete the GPU Droplet to avoid incurring unnecessary costs: open the "Actions" menu of the deployed Droplet and select the delete option.
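If you prefer the command line over the control panel, the same cleanup can be sketched with `doctl`, DigitalOcean's official CLI (the Droplet name below is a hypothetical example; use the name or ID from your own account):

```shell
# List your Droplets to find the one running HUGS
doctl compute droplet list

# Delete it by name or ID; --force skips the confirmation prompt
doctl compute droplet delete hugs-droplet --force
```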
121 changes: 1 addition & 120 deletions docs/source/how-to/cloud/gcp.mdx
@@ -1,122 +1,3 @@
# HUGS on Google Cloud

The Hugging Face Generative AI Services, also known as HUGS, can be deployed in Google Cloud via the Google Cloud Marketplace offering.

This collaboration brings Hugging Face's extensive library of pre-trained models and their Text Generation Inference (TGI) solution to Google Cloud customers, enabling seamless integration of state-of-the-art Large Language Models (LLMs) within the Google Cloud infrastructure.

HUGS provides access to a hand-picked and manually benchmarked collection of the most performant and latest open LLMs hosted on the Hugging Face Hub, packaged as TGI-optimized container applications that users can deploy on Google Kubernetes Engine (GKE).

With HUGS, developers can easily find, subscribe to, and deploy Hugging Face models on Google Cloud infrastructure, leveraging the power of NVIDIA GPUs on optimized, zero-configuration TGI containers.

## Subscribe to HUGS on Google Cloud Marketplace

1. Go to [HUGS Google Cloud Marketplace listing](https://console.cloud.google.com/marketplace/product/huggingface-public/hugs__draft)

![HUGS on Google Cloud Marketplace](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hugs/gcp/hugs-marketplace-listing.png)

2. Subscribe to the product in Google Cloud by following the instructions on the page. At the time of writing (October 2024), the steps are to:

1. Click `Purchase`, then go to the next page.
2. Configure the order by selecting the right plan, billing account, and confirming the terms. Then click `Subscribe`.

![HUGS Configuration on Google Cloud Marketplace](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hugs/gcp/hugs-configuration.png)

3. You should see a "Your order request has been sent to Hugging Face" message with a "Go to Product Page" button; click on it.

![HUGS Confirmation on Google Cloud Marketplace](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hugs/gcp/hugs-confirmation.png)

<Tip>

To check whether you are subscribed, look at the product page: if the "Configure" button is enabled instead of "Purchase", either you or someone else from your organization has already requested access for your account.

</Tip>


## Deploy HUGS on Google Cloud GKE

This example showcases how to deploy a HUGS container and model on Google Cloud GKE.

<Tip>

This example assumes that you have a Google Cloud account, that you have [installed and set up the Google Cloud CLI](https://cloud.google.com/sdk/docs/install), and that you are logged in to your account with the necessary permissions to subscribe to offerings in the Google Cloud Marketplace and to create and manage IAM permissions and resources such as Google Kubernetes Engine (GKE).

</Tip>

When deploying HUGS on Google Cloud through the UI, you can either select an existing GKE cluster or create a new one; if you want to create a new one, follow the instructions [here](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-cluster). Additionally, you need to define:

* Namespace: The namespace to deploy the HUGS container and model.
* App Instance Name: The name of the HUGS container.
* Hugs Model Id: Select the model you want to deploy from the Hugging Face Hub. You can find all supported models [here](../models).
* Reporting Service Account: The service account to use for reporting.

![HUGS Deployment Configuration](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hugs/gcp/hugs-deploy.png)

Next, click `Deploy` and wait for the deployment to finish; this takes around 10-15 minutes.

<Tip>

If you want to better understand the different deployment options, e.g. 1x NVIDIA L4 GPU for Meta Llama 3.1 8B Instruct, you can check out the [supported model matrix](../../models.mdx).

</Tip>


## Create a GPU GKE Cluster for HUGS

To deploy HUGS on Google Cloud, you'll need a GKE cluster with GPU support. Here's a step-by-step guide to create one:

1. Ensure you have the [Google Cloud CLI installed and configured](https://cloud.google.com/sdk/docs/install-sdk).

2. Set up environment variables for your cluster configuration:

```bash
export PROJECT_ID="your-project-id" # Your Google Cloud Project ID which is subscribed to HUGS
export CLUSTER_NAME="hugs-cluster" # The name of the GKE cluster
export LOCATION="us-central1" # The location of the GKE cluster
export MACHINE_TYPE="g2-standard-12" # The machine type of the GKE cluster
export GPU_TYPE="nvidia-l4" # The type of GPU to use
export GPU_COUNT=1 # The number of GPUs to use
```

3. Create the GKE cluster:

```bash
gcloud container clusters create $CLUSTER_NAME \
--project=$PROJECT_ID \
--zone=$LOCATION \
--release-channel=stable \
--cluster-version=1.29 \
--machine-type=$MACHINE_TYPE \
--num-nodes=1 \
--no-enable-autoprovisioning
```

4. Add a GPU node pool to the cluster:

```bash
gcloud container node-pools create gpu-pool \
--cluster=$CLUSTER_NAME \
--zone=$LOCATION \
--machine-type=$MACHINE_TYPE \
--accelerator type=$GPU_TYPE,count=$GPU_COUNT \
--num-nodes=1 \
--enable-autoscaling \
--min-nodes=1 \
--max-nodes=1 \
--spot \
--disk-type=pd-ssd \
--disk-size=100GB
```

5. Configure kubectl to use the new cluster:

```bash
gcloud container clusters get-credentials $CLUSTER_NAME --zone=$LOCATION
```

Your GKE cluster with GPU support is now ready for HUGS deployment. You can proceed to deploy HUGS using the Google Cloud Marketplace as described in the previous section.
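As a quick sanity check before deploying (a sketch assuming the cluster and `kubectl` configuration above), you can verify that the GPU node pool registered its accelerators:

```shell
# List the cluster nodes; the gpu-pool node should show STATUS Ready
kubectl get nodes

# Confirm the NVIDIA GPUs are exposed as allocatable node resources
kubectl describe nodes | grep -i "nvidia.com/gpu"
```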

<Tip>

For more detailed information on creating and managing GKE clusters, refer to the [official Google Kubernetes Engine documentation](https://cloud.google.com/kubernetes-engine/docs) or [run GPUs in GKE Standard node pools](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus).

</Tip>
Coming soon!
2 changes: 1 addition & 1 deletion docs/source/pricing.mdx
@@ -7,7 +7,7 @@ HUGS (Hugging Face Generative AI Services) offers on-demand pricing based on t
For deployments on major cloud platforms, HUGS is available through their respective marketplaces:

- **AWS Marketplace**: $1 per hour per container
- **Google Cloud Platform (GCP) Marketplace**: $1 per hour per container
- (Soon) **Google Cloud Platform (GCP) Marketplace**: $1 per hour per container

This pricing model is based on the uptime of each container, allowing you to scale your usage according to your needs.

