Docs on Model Caching #151

Open · wants to merge 1 commit into main

20 changes: 11 additions & 9 deletions docs/get-started/api-keys.md
@@ -9,7 +9,7 @@ You can generate an API key with **Read/Write** permission, **Restricted** permi

:::note

Legacy API keys generated before November 11, 2024 have either Read/Write or Read Only access to GraphQL based on what was set for that key. All legacy keys have full access to AI API. To improve security, generate a new key with Restricted permission and select the minimum permission needed for your use case.

:::

@@ -20,16 +20,18 @@ To create an API key:
1. From the console, select **Settings**.
2. Under **API Keys**, choose **+ Create API Key**.
3. Select the permission. If you choose **Restricted** permission, you can customize access for each API:
   - **None**: No access
   - (AI API only) **Restricted**: Custom access to specific endpoints. No access is the default.
   - **Read/Write**: Full access
   - **Read Only**: Read access without write access

   :::warning

   Select the minimum permission needed for your use case. Only allow full access to GraphQL when absolutely necessary for automations like creating or managing RunPod resources outside of Serverless endpoints.

   :::

4. Choose **Create**.

:::note

3 changes: 2 additions & 1 deletion docs/hosting/maintenance-and-reliability.md
@@ -5,9 +5,10 @@ description: "Schedule maintenance with at least one-week notice to minimize dis

## Maintenance

Hosts must currently schedule maintenance at least one week in advance and may perform immediate maintenance _only_ if their server is unrented.
Users will get email reminders of upcoming maintenance that will occur on their active pods.
Please contact RunPod on Discord or Slack if you are:

- scheduling maintenance on more than a few machines, and/or
- performing operations that could affect user data

198 changes: 99 additions & 99 deletions docs/hosting/partner-requirements.md

Large diffs are not rendered by default.

20 changes: 10 additions & 10 deletions docs/integrations/overview.md
@@ -1,11 +1,11 @@
---
title: Integrations
---

import DocCardList from '@theme/DocCardList';

# Integrations

RunPod integrates with various tools that enable you to automate interactions, such as managing containers with infrastructure-as-code and interacting with serverless endpoints without using the WebUI or API. These integrations provide flexibility and automation for streamlining complex workflows and scaling your operations.

<DocCardList />
4 changes: 2 additions & 2 deletions docs/pods/configuration/expose-ports.md
@@ -42,7 +42,7 @@ It's crucial to be aware of the following behavior:
- If your service does not respond within 100 seconds of a request, the connection will be closed.
- In such cases, the user will receive a `524` error code.

This timeout limit is particularly important for long-running operations or services that might take more than 100 seconds to respond.
Make sure to design your applications with this limitation in mind, potentially implementing progress updates or chunked responses for longer operations.
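
For long operations, one pattern (sketched below with FastAPI; the framework, route, and timings are assumptions for illustration) is to stream progress chunks so bytes keep flowing before the 100-second limit:

```python
# A minimal sketch, assuming a FastAPI app behind the proxied port.
# Streaming partial output keeps the connection active so the
# 100-second proxy timeout is not hit while a long job runs.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def long_job():
    for step in range(10):
        await asyncio.sleep(30)             # stand-in for a slow processing stage
        yield f"progress: {step + 1}/10\n"  # each chunk shows the service is alive
    yield "done\n"

@app.get("/run")  # hypothetical route for this example
async def run():
    return StreamingResponse(long_job(), media_type="text/plain")
```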

### Through TCP Public IP
@@ -79,4 +79,4 @@ In this case, I have requested two symmetrical ports and they ended up being 100
```text
RUNPOD_TCP_PORT_70001=10031
RUNPOD_TCP_PORT_70000=10030
```
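
As a small illustrative sketch (not part of the original doc), your application can read these variables at runtime to discover its external mapping; the internal port number below is carried over from the example above:

```python
# Illustrative sketch: look up the external port RunPod mapped to an
# internal TCP port, using the environment variables shown above.
import os

internal_port = 70000  # from the example above
external_port = int(os.environ[f"RUNPOD_TCP_PORT_{internal_port}"])
public_ip = os.environ.get("RUNPOD_PUBLIC_IP", "")  # assumption: Pods expose their public IP here
print(f"Internal port {internal_port} is exposed externally at {public_ip}:{external_port}")
```
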
2 changes: 1 addition & 1 deletion docs/references/troubleshooting/leaked-api-keys.md
@@ -18,4 +18,4 @@ To disable an API key:
To delete an API key:

1. From the console, select **Settings**.
2. Under **API Keys**, select the trash can icon and select **Revoke Key**.
9 changes: 4 additions & 5 deletions docs/serverless/workers/handlers/overview.md
@@ -71,15 +71,14 @@ def handler(job):
runpod.serverless.start({"handler": handler}) # Required.
```

You must return something as output when your worker is done processing the job.
This can be the output itself, or links to cloud storage where the artifacts are saved.
Keep in mind that the input and output payloads are limited to 2 MB each.
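
For instance, a hedged sketch of the cloud-storage pattern (assuming `boto3`, an S3 bucket you control, and a hypothetical `run_inference` helper) could look like this:

```python
# Sketch: return a link instead of a large payload. Assumes boto3 is
# installed and AWS credentials are configured on the worker.
import boto3
import runpod

s3 = boto3.client("s3")

def handler(job):
    result_path = run_inference(job["input"])  # hypothetical model call that writes a file
    s3.upload_file(result_path, "my-bucket", "result.png")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-bucket", "Key": "result.png"},
        ExpiresIn=3600,
    )
    return {"result_url": url}  # small JSON payload, well under the 2 MB limit

runpod.serverless.start({"handler": handler})
```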

:::note

Keep setup processes and functions outside of your handler function. For example, if you are running models, make sure they are loaded into VRAM prior to calling `serverless.start` with your handler function.

<details>
<summary>Example</summary>
<Tabs>
@@ -125,16 +124,16 @@ def handler(event):
runpod.serverless.start({"handler": handler})
```

</TabItem>
<TabItem value="cli" label="CLI">

The following is an example of the input command.

```command
python your_handler.py --test_input '{"input": {"prompt": "The quick brown fox jumps"}}'
```

</TabItem>
</Tabs>

</details>
1 change: 0 additions & 1 deletion docs/serverless/workers/vllm/get-started.md
@@ -142,7 +142,6 @@ chat_completion = client.chat.completions.create(
print(chat_completion)
```

</TabItem>
<TabItem value="node.js" label="Node.js">

File renamed without changes.
File renamed without changes.
169 changes: 169 additions & 0 deletions docs/storage/model-caching.md
@@ -0,0 +1,169 @@
---
title: Model caching
description: "Model caching allows you to quickly switch out machine learning models in your code."
sidebar_position: 4
---

Model caching allows you to dynamically load and switch between machine learning models in your applications without rebuilding your container images or changing your code.
It automatically handles model and dataset downloads and makes them available to your application.

:::note

Model caching currently supports models and datasets from [Hugging Face](https://huggingface.co/).

:::

## Benefits

- **Faster development**: Switch models instantly without rebuilding containers.
- **Better performance**: Optimized cold start times and caching.
- **Easy integration**: Works with popular ML frameworks like PyTorch and Transformers.

You can cache your models for both Pods and Serverless.

## Get started with Serverless

With model caching, you can preload models so you don't need to bake them into your Docker image or wait for the Worker to download them from Hugging Face.

1. Log in to the [RunPod Serverless console](https://www.runpod.io/console/serverless).
2. Select **+ New Endpoint**.
3. Provide the following:
1. Endpoint name.
2. Select your GPU configuration.
3. Configure the number of Workers.
4. (optional) Select **FlashBoot**.
5. (optional) Select a template.
6. Enter the name of your Docker image.
- For example `<username>/<repo>:<tag>`.
7. Specify enough memory for your Docker image.
4. Add your Hugging Face model.
1. Add the name of the model you want to use (up to five per endpoint).
2. (optional) Add your Hugging Face API Key for gated or private models.
5. Select **Deploy**.

The Model Cache will automatically download the model and make it available to your code.

## Get started with Pods

With model caching, you can preload models so you don't need to bake them into your Docker image or download them while your Pod is starting up.
RunPod handles all of this for you.

1. Navigate to [Pods](https://www.runpod.io/console/pods) and select **+ Deploy**.
2. Choose between **GPU** and **CPU**.
3. Customize your instance by setting up the following:
1. (optional) Specify a Network volume.
2. Select an instance type. For example, **A40**.
3. (optional) Provide a template. For example, **RunPod Pytorch**.
4. (GPU only) Specify your compute count.
5. Add your Hugging Face model.
4. Review your configuration and select **Deploy On-Demand**.

The Model Cache will automatically download the model and make it available to your code on your Pod.

## How to interact with your models

The model path is as follows:

```
/runpod/cache/model/$MODEL_NAME
```

You can set this path in your code.
For example:

```python
from transformers import AutoModel

# Load the cached model; replace $MODEL_NAME with the name of your model
model = AutoModel.from_pretrained("/runpod/cache/model/$MODEL_NAME/main")
```

When this code executes in a Pod or Serverless worker, the model is already available at that path.
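
As a fuller illustration, here is a hedged sketch of a Serverless handler serving a cached model; the cache directory name (`whisper-large`) and the input field are assumptions for this example:

```python
# Sketch of a Serverless worker using a cached Hugging Face model.
# Assumes the endpoint was configured with "openai/whisper-large" and
# that the cache directory matches the model name (an assumption).
import runpod
from transformers import pipeline

# Load once at module scope, before serverless.start, so every job reuses it.
MODEL_PATH = "/runpod/cache/model/whisper-large/main"
asr = pipeline("automatic-speech-recognition", model=MODEL_PATH)

def handler(job):
    audio_url = job["input"]["audio_url"]  # hypothetical input field
    return asr(audio_url)

runpod.serverless.start({"handler": handler})
```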

## Environment variables

Hugging Face models and datasets are configured using environment variables.

For public models and datasets, you only need to specify what to download.
For private resources, you must provide authentication credentials.

### Model and dataset selection

- `RUNPOD_HUGGINGFACE_MODEL`\
Specifies which models to download. Accepts a comma-separated list of models in the format `user/model[:branch]`.

- `RUNPOD_HUGGINGFACE_DATASET`\
Specifies which datasets to download. Accepts a comma-separated list of datasets in the format `user/dataset[:branch]`.
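
To make the list format concrete, here is an illustrative parser (not RunPod's actual implementation) for the `user/name[:branch]` syntax:

```python
# Illustrative only: how a comma-separated "user/name[:branch]" value breaks down.
def parse_entries(value: str) -> list[tuple[str, str]]:
    entries = []
    for item in value.split(","):
        repo, _, branch = item.strip().partition(":")
        entries.append((repo, branch or "main"))  # a missing branch defaults to "main"
    return entries

print(parse_entries("openai/whisper-large,google/flan-t5-base:experimental"))
# [('openai/whisper-large', 'main'), ('google/flan-t5-base', 'experimental')]
```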

### Authentication (optional)

Both variables must be provided together for private resource access:

- `RUNPOD_HUGGINGFACE_TOKEN`\
Your Hugging Face authentication token.

- `RUNPOD_HUGGINGFACE_USER`\
Your Hugging Face username.

### Basic usage

Download a single model or dataset from the default (`main`) branch:

```bash
# Download a model
RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large"

# Download a dataset
RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0"
```

### Specifying branches

Access specific branches by appending `:branch-name`:

```bash
# Download from a specific branch
RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large:experimental"
```

### Multiple resources

Download multiple models or datasets by separating them with commas:

```bash
# Download multiple models
RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large,google/flan-t5-base"

# Download multiple datasets with different branches
RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0,huggingface/dataset-metrics:dev"
```

### Private resources

Access private resources by providing authentication:

```bash
RUNPOD_HUGGINGFACE_USER="your-username"
RUNPOD_HUGGINGFACE_TOKEN="hf_..."
RUNPOD_HUGGINGFACE_MODEL="your-org/private-model"
```

## Example configurations

```bash
# Single public model
RUNPOD_HUGGINGFACE_MODEL="facebook/opt-350m"

# Multiple models with different branches
RUNPOD_HUGGINGFACE_MODEL="facebook/opt-350m:main, google/flan-t5-base:experimental"

# Private model with authentication
RUNPOD_HUGGINGFACE_USER="your-username"
RUNPOD_HUGGINGFACE_TOKEN="hf_..."
RUNPOD_HUGGINGFACE_MODEL="your-org/private-model"

# Multiple datasets
RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0, huggingface/dataset-metrics"
```
@@ -4,4 +4,4 @@ sidebar_position: 9
description: "Sync your volume to a cloud provider by clicking 'Cloud Sync' on your My Pods page, then follow provider-specific instructions from the dropdown menu."
---

You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page. For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](/pods/configuration/export-data).
@@ -8,7 +8,7 @@ Learn to transfer files to and from RunPod.

## Prerequisites

- If you intend to use `runpodctl`, make sure it's installed on your machine; see [install runpodctl](/runpodctl/install-runpodctl).

- If you intend to use `scp`, make sure your Pod is configured to use real SSH.
For more information, see [use SSH](/pods/configuration/use-ssh).
@@ -17,7 +17,7 @@ Learn to transfer files to and from RunPod.

- Note the public IP address and external port from the SSH over exposed TCP command (you'll need these for the SCP/rsync commands).

## Transferring with [runpodctl](/runpodctl/overview#data-transfer)

The RunPod CLI (runpodctl) provides simple commands for transferring data between your machine and RunPod. **It’s preinstalled on all RunPod Pods** and uses one-time codes for secure authentication, so no API keys are required.

@@ -154,4 +154,4 @@ total size is 119 speedup is 0.90

## Sync a volume to a cloud provider

You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page. For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](/pods/configuration/export-data).
File renamed without changes.
10 changes: 10 additions & 0 deletions sidebars.js
@@ -35,6 +35,16 @@ module.exports = {
},
],
},
{
type: "category",
label: "Storage",
items: [
{
type: "autogenerated",
dirName: "storage",
},
],
},
{
type: "category",
label: "runpodctl",