Docs on Model Caching #151

Open · wants to merge 1 commit into main

20 changes: 11 additions & 9 deletions docs/get-started/api-keys.md
@@ -9,7 +9,7 @@ You can generate an API key with **Read/Write** permission, **Restricted** permi

:::note

Legacy API keys generated before November 11, 2024 have either Read/Write or Read Only access to GraphQL based on what was set for that key. All legacy keys have full access to AI API. To improve security, generate a new key with Restricted permission and select the minimum permission needed for your use case.

:::

@@ -20,16 +20,18 @@ To create an API key:
1. From the console, select **Settings**.
2. Under **API Keys**, choose **+ Create API Key**.
3. Select the permission. If you choose **Restricted** permission, you can customize access for each API:
   - **None**: No access
   - (AI API only) **Restricted**: Custom access to specific endpoints. No access is the default.
   - **Read/Write**: Full access
   - **Read Only**: Read access without write access

   :::warning

   Select the minimum permission needed for your use case. Only allow full access to GraphQL when absolutely necessary for automations like creating or managing RunPod resources outside of Serverless endpoints.

   :::

4. Choose **Create**.

:::note

3 changes: 2 additions & 1 deletion docs/hosting/maintenance-and-reliability.md
@@ -5,9 +5,10 @@ description: "Schedule maintenance with at least one-week notice to minimize dis

## Maintenance

Hosts must currently schedule maintenance at least one week in advance and may perform immediate maintenance _only_ if their server is unrented.
Users will get email reminders of upcoming maintenance that will occur on their active pods.
Please contact RunPod on Discord or Slack if you are:

- scheduling maintenance on more than a few machines, and/or
- performing operations that could affect user data

198 changes: 99 additions & 99 deletions docs/hosting/partner-requirements.md

Large diffs are not rendered by default.

20 changes: 10 additions & 10 deletions docs/integrations/overview.md
@@ -1,11 +1,11 @@
---
title: Integrations
---

import DocCardList from '@theme/DocCardList';

# Integrations

RunPod integrates with various tools that enable you to automate interactions, such as managing containers with infrastructure-as-code and interacting with serverless endpoints without using the WebUI or API. These integrations provide flexibility and automation for streamlining complex workflows and scaling your operations.

<DocCardList />
4 changes: 2 additions & 2 deletions docs/pods/configuration/expose-ports.md
@@ -42,7 +42,7 @@ It's crucial to be aware of the following behavior:
- If your service does not respond within 100 seconds of a request, the connection will be closed.
- In such cases, the user will receive a `524` error code.

This timeout limit is particularly important for long-running operations or services that might take more than 100 seconds to respond.
Make sure to design your applications with this limitation in mind, potentially implementing progress updates or chunked responses for longer operations.
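
For long operations, one pattern (sketched below with FastAPI; the framework, route, and timings are assumptions for illustration) is to stream progress chunks so bytes keep flowing before the 100-second limit:

```python
# A minimal sketch, assuming a FastAPI app behind the proxied port.
# Streaming partial output keeps the connection active so the
# 100-second proxy timeout is not hit while a long job runs.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def long_job():
    for step in range(10):
        await asyncio.sleep(30)             # stand-in for a slow processing stage
        yield f"progress: {step + 1}/10\n"  # each chunk shows the service is alive
    yield "done\n"

@app.get("/run")  # hypothetical route for this example
async def run():
    return StreamingResponse(long_job(), media_type="text/plain")
```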

### Through TCP Public IP
@@ -79,4 +79,4 @@ In this case, I have requested two symmetrical ports and they ended up being 100
```text
RUNPOD_TCP_PORT_70001=10031
RUNPOD_TCP_PORT_70000=10030
```
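
As a small illustrative sketch (not part of the original doc), your application can read these variables at runtime to discover its external mapping; the internal port number below is carried over from the example above:

```python
# Illustrative sketch: look up the external port RunPod mapped to an
# internal TCP port, using the environment variables shown above.
import os

internal_port = 70000  # from the example above
external_port = int(os.environ[f"RUNPOD_TCP_PORT_{internal_port}"])
public_ip = os.environ.get("RUNPOD_PUBLIC_IP", "")  # assumption: Pods expose their public IP here
print(f"Internal port {internal_port} is exposed externally at {public_ip}:{external_port}")
```
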
2 changes: 1 addition & 1 deletion docs/references/troubleshooting/leaked-api-keys.md
@@ -18,4 +18,4 @@ To disable an API key:
To delete an API key:

1. From the console, select **Settings**.
2. Under **API Keys**, select the trash can icon and select **Revoke Key**.
9 changes: 4 additions & 5 deletions docs/serverless/workers/handlers/overview.md
@@ -71,15 +71,14 @@ def handler(job):
runpod.serverless.start({"handler": handler}) # Required.
```

You must return something as output when your worker is done processing the job.
This can be the output itself, or links to cloud storage where the artifacts are saved.
Keep in mind that the input and output payloads are limited to 2 MB each.
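
For instance, a hedged sketch of the cloud-storage pattern (assuming `boto3`, an S3 bucket you control, and a hypothetical `run_inference` helper) could look like this:

```python
# Sketch: return a link instead of a large payload. Assumes boto3 is
# installed and AWS credentials are configured on the worker.
import boto3
import runpod

s3 = boto3.client("s3")

def handler(job):
    result_path = run_inference(job["input"])  # hypothetical model call that writes a file
    s3.upload_file(result_path, "my-bucket", "result.png")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-bucket", "Key": "result.png"},
        ExpiresIn=3600,
    )
    return {"result_url": url}  # small JSON payload, well under the 2 MB limit

runpod.serverless.start({"handler": handler})
```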

:::note

Keep setup processes and functions outside of your handler function. For example, if you are running models, make sure they are loaded into VRAM prior to calling `serverless.start` with your handler function.

<details>
<summary>Example</summary>
<Tabs>
@@ -125,16 +124,16 @@ def handler(event):
runpod.serverless.start({"handler": handler})
```

</TabItem>
<TabItem value="cli" label="CLI">

The following is an example of the input command.

```command
python your_handler.py --test_input '{"input": {"prompt": "The quick brown fox jumps"}}'
```

</TabItem>
</Tabs>

</details>
1 change: 0 additions & 1 deletion docs/serverless/workers/vllm/get-started.md
@@ -142,7 +142,6 @@ chat_completion = client.chat.completions.create(
print(chat_completion)
```

</TabItem>
<TabItem value="node.js" label="Node.js">

File renamed without changes.
File renamed without changes.
169 changes: 169 additions & 0 deletions docs/storage/model-caching.md
@@ -0,0 +1,169 @@
---
title: Model caching
description: "Model caching allows you to quickly switch out machine learning models in your code."
sidebar_position: 4
---

Model caching allows you to dynamically load and switch between machine learning models in your applications without rebuilding your container images or changing your code.
It automatically handles model and dataset downloads and makes them available to your application.

:::note

Model caching currently supports models and datasets from [Hugging Face](https://huggingface.co/).

:::

## Benefits

- **Faster development**: Switch models instantly without rebuilding containers.
- **Better performance**: Optimized cold start times and caching.
- **Easy integration**: Works with popular ML frameworks like PyTorch and Transformers.

You can cache your models for both Pods and Serverless.

## Get started with Serverless

With model caching, you can preload models so you don't need to bake them into your Docker image or wait for the Worker to download them from Hugging Face.

1. Log in to the [RunPod Serverless console](https://www.runpod.io/console/serverless).
2. Select **+ New Endpoint**.
3. Provide the following:
1. Endpoint name.
2. Select your GPU configuration.
3. Configure the number of Workers.
4. (optional) Select **FlashBoot**.
5. (optional) Select a template.
6. Enter the name of your Docker image.
- For example `<username>/<repo>:<tag>`.
7. Specify enough memory for your Docker image.
4. Add your Hugging Face model.
1. Add the name of the model you want to use (up to five per endpoint).
2. (optional) Add your Hugging Face API Key for gated or private models.
5. Select **Deploy**.

The Model Cache will automatically download the model and make it available to your code.

## Get started with Pods

With model caching, you can preload models so you don't need to bake them into your Docker image or download them while your Pod is starting up.
RunPod handles all of this for you.

1. Navigate to [Pods](https://www.runpod.io/console/pods) and select **+ Deploy**.
2. Choose between **GPU** and **CPU**.
3. Customize your instance by setting up the following:
1. (optional) Specify a Network volume.
2. Select an instance type. For example, **A40**.
3. (optional) Provide a template. For example, **RunPod Pytorch**.
4. (GPU only) Specify your compute count.
5. Add your Hugging Face model.
4. Review your configuration and select **Deploy On-Demand**.

The Model Cache will automatically download the model and make it available to your code on your Pod.

## How to interact with your models

The model path is as follows:

```
/runpod/cache/model/$MODEL_NAME
```

You can set this path in your code.
For example:

```python
from transformers import AutoModel

# Load the cached model; replace $MODEL_NAME with the name of your model
model = AutoModel.from_pretrained("/runpod/cache/model/$MODEL_NAME/main")
```

When this code executes in a Pod or Serverless worker, the model is already available at that path.
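
As a fuller illustration, here is a hedged sketch of a Serverless handler serving a cached model; the cache directory name (`whisper-large`) and the input field are assumptions for this example:

```python
# Sketch of a Serverless worker using a cached Hugging Face model.
# Assumes the endpoint was configured with "openai/whisper-large" and
# that the cache directory matches the model name (an assumption).
import runpod
from transformers import pipeline

# Load once at module scope, before serverless.start, so every job reuses it.
MODEL_PATH = "/runpod/cache/model/whisper-large/main"
asr = pipeline("automatic-speech-recognition", model=MODEL_PATH)

def handler(job):
    audio_url = job["input"]["audio_url"]  # hypothetical input field
    return asr(audio_url)

runpod.serverless.start({"handler": handler})
```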

## Environment variables

Hugging Face models and datasets are configured using environment variables.

For public models and datasets, you only need to specify what to download.
For private resources, you must provide authentication credentials.

### Model and dataset selection

- `RUNPOD_HUGGINGFACE_MODEL`\
Specifies which models to download. Accepts a comma-separated list of models in the format `user/model[:branch]`.

- `RUNPOD_HUGGINGFACE_DATASET`\
Specifies which datasets to download. Accepts a comma-separated list of datasets in the format `user/dataset[:branch]`.
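
To make the list format concrete, here is an illustrative parser (not RunPod's actual implementation) for the `user/name[:branch]` syntax:

```python
# Illustrative only: how a comma-separated "user/name[:branch]" value breaks down.
def parse_entries(value: str) -> list[tuple[str, str]]:
    entries = []
    for item in value.split(","):
        repo, _, branch = item.strip().partition(":")
        entries.append((repo, branch or "main"))  # a missing branch defaults to "main"
    return entries

print(parse_entries("openai/whisper-large,google/flan-t5-base:experimental"))
# [('openai/whisper-large', 'main'), ('google/flan-t5-base', 'experimental')]
```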

### Authentication (optional)

Both variables must be provided together for private resource access:

- `RUNPOD_HUGGINGFACE_TOKEN`\
Your Hugging Face authentication token.

- `RUNPOD_HUGGINGFACE_USER`\
Your Hugging Face username.

### Basic usage

Download a single model or dataset from the default (`main`) branch:

```bash
# Download a model
RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large"

# Download a dataset
RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0"
```

### Specifying branches

Access specific branches by appending `:branch-name`:

```bash
# Download from a specific branch
RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large:experimental"
```

### Multiple resources

Download multiple models or datasets by separating them with commas:

```bash
# Download multiple models
RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large,google/flan-t5-base"

# Download multiple datasets with different branches
RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0,huggingface/dataset-metrics:dev"
```

### Private resources

Access private resources by providing authentication:

```bash
RUNPOD_HUGGINGFACE_USER="your-username"
RUNPOD_HUGGINGFACE_TOKEN="hf_..."
RUNPOD_HUGGINGFACE_MODEL="your-org/private-model"
```

## Example configurations

```bash
# Single public model
RUNPOD_HUGGINGFACE_MODEL="facebook/opt-350m"

# Multiple models with different branches
RUNPOD_HUGGINGFACE_MODEL="facebook/opt-350m:main, google/flan-t5-base:experimental"

# Private model with authentication
RUNPOD_HUGGINGFACE_USER="your-username"
RUNPOD_HUGGINGFACE_TOKEN="hf_..."
RUNPOD_HUGGINGFACE_MODEL="your-org/private-model"

# Multiple datasets
RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0, huggingface/dataset-metrics"
```
@@ -4,4 +4,4 @@ sidebar_position: 9
description: "Sync your volume to a cloud provider by clicking 'Cloud Sync' on your My Pods page, then follow provider-specific instructions from the dropdown menu."
---

You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page. For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](/pods/configuration/export-data).
@@ -8,7 +8,7 @@ Learn to transfer files to and from RunPod.

## Prerequisites

- If you intend to use `runpodctl`, make sure it's installed on your machine; see [install runpodctl](/runpodctl/install-runpodctl).

- If you intend to use `scp`, make sure your Pod is configured to use real SSH.
For more information, see [use SSH](/pods/configuration/use-ssh).
@@ -17,7 +17,7 @@ Learn to transfer files to and from RunPod.

- Note the public IP address and external port from the SSH over exposed TCP command (you'll need these for the SCP/rsync commands).

## Transferring with [runpodctl](/runpodctl/overview#data-transfer)

The RunPod CLI (runpodctl) provides simple commands for transferring data between your machine and RunPod. **It’s preinstalled on all RunPod Pods** and uses one-time codes for secure authentication, so no API keys are required.

@@ -154,4 +154,4 @@ total size is 119 speedup is 0.90

## Sync a volume to a cloud provider

You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page. For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](/pods/configuration/export-data).
File renamed without changes.
10 changes: 10 additions & 0 deletions sidebars.js
@@ -35,6 +35,16 @@ module.exports = {
},
],
},
{
type: "category",
label: "Storage",
items: [
{
type: "autogenerated",
dirName: "storage",
},
],
},
{
type: "category",
label: "runpodctl",