From 2096b229f7596c95de4371a16f487a0ad4a6028d Mon Sep 17 00:00:00 2001 From: Patrick Date: Wed, 13 Nov 2024 13:38:20 -0800 Subject: [PATCH] Docs on Model Caching --- docs/get-started/api-keys.md | 20 +- docs/hosting/maintenance-and-reliability.md | 3 +- docs/hosting/partner-requirements.md | 198 +++++++++--------- docs/integrations/overview.md | 20 +- docs/pods/configuration/expose-ports.md | 4 +- .../troubleshooting/leaked-api-keys.md | 2 +- docs/serverless/workers/handlers/overview.md | 9 +- docs/serverless/workers/vllm/get-started.md | 1 - docs/{pods => }/storage/_category_.json | 0 docs/{pods => }/storage/_volume.md | 0 .../storage/create-network-volumes.md | 0 docs/storage/model-caching.md | 169 +++++++++++++++ docs/{pods => }/storage/sync-volumes.md | 2 +- docs/{pods => }/storage/transfer-files.md | 6 +- docs/{pods => }/storage/types.md | 0 sidebars.js | 10 + 16 files changed, 312 insertions(+), 132 deletions(-) rename docs/{pods => }/storage/_category_.json (100%) rename docs/{pods => }/storage/_volume.md (100%) rename docs/{pods => }/storage/create-network-volumes.md (100%) create mode 100644 docs/storage/model-caching.md rename docs/{pods => }/storage/sync-volumes.md (88%) rename docs/{pods => }/storage/transfer-files.md (96%) rename docs/{pods => }/storage/types.md (100%) diff --git a/docs/get-started/api-keys.md b/docs/get-started/api-keys.md index 838c4af..54d26ca 100644 --- a/docs/get-started/api-keys.md +++ b/docs/get-started/api-keys.md @@ -9,7 +9,7 @@ You can generate an API key with **Read/Write** permission, **Restricted** permi :::note -Legacy API keys generated before November 11, 2024 have either Read/Write or Read Only access to GraphQL based on what was set for that key. All legacy keys have full access to AI API. To improve security, generate a new key with Restricted permission and select the minimum permission needed for your use case. +Legacy API keys generated before November 11, 2024 have either Read/Write or Read Only access to GraphQL based on what was set for that key. All legacy keys have full access to AI API. To improve security, generate a new key with Restricted permission and select the minimum permission needed for your use case. ::: @@ -20,16 +20,18 @@ To create an API key: 1. From the console, select **Settings**. 2. Under **API Keys**, choose **+ Create API Key**. 3. Select the permission. If you choose **Restricted** permission, you can customize access for each API: - - **None**: No access - - (AI API only) **Restricted**: Custom access to specific endpoints. No access is default. - - **Read/Write**: Full access - - **Read Only**: Read access without write access - :::warning + - **None**: No access + - (AI API only) **Restricted**: Custom access to specific endpoints. No access is default. + - **Read/Write**: Full access + - **Read Only**: Read access without write access - Select the minimum permission needed for your use case. Only allow full access to GraphQL when absolutely necessary for automations like creating or managing RunPod resources outside of Serverless endpoints. +:::warning - ::: -5. Choose **Create**. +Select the minimum permission needed for your use case. Only allow full access to GraphQL when absolutely necessary for automations like creating or managing RunPod resources outside of Serverless endpoints. + +::: + +4. Choose **Create**. 
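As a quick way to verify a newly created key, you can call a Serverless endpoint with it. The sketch below is illustrative only: it assumes the key is stored in a `RUNPOD_API_KEY` environment variable, and the endpoint ID and input payload are placeholders you would replace with your own.

```python
import os

import requests

# Placeholders: substitute your own Serverless endpoint ID and input payload.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]  # the key created in the steps above

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello, world"}},
    timeout=60,
)
print(response.json())
```

If the key lacks the permission needed for the API you call, expect an authorization error rather than a result.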
:::note diff --git a/docs/hosting/maintenance-and-reliability.md b/docs/hosting/maintenance-and-reliability.md index a046935..6eed05c 100644 --- a/docs/hosting/maintenance-and-reliability.md +++ b/docs/hosting/maintenance-and-reliability.md @@ -5,9 +5,10 @@ description: "Schedule maintenance with at least one-week notice to minimize dis ## Maintenance -Hosts must currently schedule maintenance at least one week in advance and are able to program immediate maintenance *only* in the case that their server is unrented. +Hosts must currently schedule maintenance at least one week in advance and are able to program immediate maintenance _only_ in the case that their server is unrented. Users will get email reminders of upcoming maintenance that will occur on their active pods. Please contact RunPod on Discord or Slack if you are: + - scheduling maintenance on more than a few machines, and/or - performing operations that could affect user data diff --git a/docs/hosting/partner-requirements.md b/docs/hosting/partner-requirements.md index 119084f..c9ec7d2 100644 --- a/docs/hosting/partner-requirements.md +++ b/docs/hosting/partner-requirements.md @@ -2,15 +2,15 @@ # Introduction -This document outlines the specifications required to be a RunPod secure cloud partner. These requirements establish the baseline, however for new partners, RunPod will perform a due diligence process prior to selection encompassing business health, prior performance, and corporate alignment. +This document outlines the specifications required to be a RunPod secure cloud partner. These requirements establish the baseline, however for new partners, RunPod will perform a due diligence process prior to selection encompassing business health, prior performance, and corporate alignment. -Meeting these technical and operational requirements does not guarantee selection. +Meeting these technical and operational requirements does not guarantee selection. -*New partners* +_New partners_ - All specifications will apply to new partners on November 1, 2024. -*Existing partners* +_Existing partners_ - Hardware specifications (Sections 1, 2, 3, 4) will apply to new servers deployed by existing partners on December 15, 2024. - Compliance specification (Section 5) will apply to existing partners on April 1, 2025. @@ -19,7 +19,7 @@ A new revision will be released in October 2025 on an annual basis. Minor mid-ye ## Minimum deployment size -100kW of GPU server capacity is the minimum deployment size. +100kW of GPU server capacity is the minimum deployment size. ## 1. Hardware Requirements @@ -27,25 +27,25 @@ A new revision will be released in October 2025 on an annual basis. Minor mid-ye #### GPU Requirements -NVIDIA GPUs no older than Ampere generation. +NVIDIA GPUs no older than Ampere generation. 
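As an illustrative check of the Ampere-or-newer requirement, partners can inspect each GPU's CUDA compute capability (Ampere and later report 8.0 or higher). The sketch below assumes PyTorch is installed on the machine; any equivalent tooling works.

```python
import torch

# Ampere and newer GPUs report compute capability >= 8.0
# (for example, A100 = 8.0, A40/RTX A6000 = 8.6, H100 = 9.0).
for index in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(index)
    major, minor = torch.cuda.get_device_capability(index)
    status = "meets requirement" if (major, minor) >= (8, 0) else "older than Ampere"
    print(f"GPU {index}: {name} (compute capability {major}.{minor}) - {status}")
```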
### CPU -| Requirement | Specification | -| --- | --- | -| Cores | Minimum 4 physical CPU cores per GPU + 2 for system operations | -| Clock Speed | Minimum 3.5 GHz base clock, with boost clock of at least 4.0 GHz | +| Requirement | Specification | +| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------- | +| Cores | Minimum 4 physical CPU cores per GPU + 2 for system operations | +| Clock Speed | Minimum 3.5 GHz base clock, with boost clock of at least 4.0 GHz | | Recommended CPUs | AMD EPYC 9654 (96 cores, up to 3.7 GHz), Intel Xeon Platinum 8490H (60 cores, up to 4.8 GHz), AMD EPYC 9474F (48 cores, up to 4.1 GHz) | ### Bus Bandwidth -| GPU VRAM | Minimum Bandwidth | -| --- | --- | -| 8/10/12/16 GB | PCIe 3.0 x16 | -| 20/24/32/40/48 GB | PCIe 4.0 x16 | -| 80 GB | PCIe 5.0 x16 | +| GPU VRAM | Minimum Bandwidth | +| ----------------- | ----------------- | +| 8/10/12/16 GB | PCIe 3.0 x16 | +| 20/24/32/40/48 GB | PCIe 4.0 x16 | +| 80 GB | PCIe 5.0 x16 | -Exceptions list: +Exceptions list: 1. PCIe 4.0 x16 - A100 80GB PCI-E @@ -53,12 +53,12 @@ Exceptions list: Main system memory must have ECC. -| GPU Configuration | Recommended RAM | -| --- | --- | -| 8x 80 GB VRAM | >= 2048 GB DDR5 | -| 8x 40/48 GB VRAM | >= 1024 GB DDR5 | -| 8x 24 GB VRAM | >= 512 GB DDR4/5 | -| 8x 16 GB VRAM | >= 256 GB DDR4/5 | +| GPU Configuration | Recommended RAM | +| ----------------- | ---------------- | +| 8x 80 GB VRAM | >= 2048 GB DDR5 | +| 8x 40/48 GB VRAM | >= 1024 GB DDR5 | +| 8x 24 GB VRAM | >= 512 GB DDR4/5 | +| 8x 16 GB VRAM | >= 256 GB DDR4/5 | ### Storage @@ -66,66 +66,66 @@ There are two types of required storage, boot and working arrays. These are two ### Boot array -| **Requirement** | **Specification** | -| --- | --- | -| Redundancy | >= 2n redundancy (RAID 1) | -| Size | >= 500GB (Post RAID) | -| Disk Perf - Sequential read | 2,000 MB/s | -| Disk Perf - Sequential write | 2,000 MB/s | -| Disk Perf - Random Read (4K QD32) | 100,000 IOPS | -| Disk Perf - Random Write (4K QD32) | 10,000 IOPS | +| **Requirement** | **Specification** | +| ---------------------------------- | ------------------------- | +| Redundancy | >= 2n redundancy (RAID 1) | +| Size | >= 500GB (Post RAID) | +| Disk Perf - Sequential read | 2,000 MB/s | +| Disk Perf - Sequential write | 2,000 MB/s | +| Disk Perf - Random Read (4K QD32) | 100,000 IOPS | +| Disk Perf - Random Write (4K QD32) | 10,000 IOPS | ### Working array -| Component | Requirement | -| --- | --- | -| Redundancy | >= 2n redundancy (RAID 1 or RAID 10) | -| Size | 2 TB+ NVME per GPU for 24/48 GB GPUs; 4 TB+ NVME per GPU for 80 GB GPUs (Post RAID) | -| Disk Perf - Sequential read | 6,000 MB/s | -| Disk Perf - Sequential write | 5,000 MB/s | -| Disk Perf - Random Read (4K QD32) | 400,000 IOPS | -| Disk Perf - Random Write (4K QD32) | 40,000 IOPS | +| Component | Requirement | +| ---------------------------------- | ----------------------------------------------------------------------------------- | +| Redundancy | >= 2n redundancy (RAID 1 or RAID 10) | +| Size | 2 TB+ NVME per GPU for 24/48 GB GPUs; 4 TB+ NVME per GPU for 80 GB GPUs (Post RAID) | +| Disk Perf - Sequential read | 6,000 MB/s | +| Disk Perf - Sequential write | 5,000 MB/s | +| Disk Perf - Random Read (4K QD32) | 400,000 IOPS | +| Disk Perf - Random Write (4K QD32) | 40,000 IOPS | ### 1.2 Storage Cluster Requirements -Each datacenter must have a storage cluster which provides shared storage between all GPU 
servers. The hardware is provided by the partner, storage cluster licensing is provided by RunPod. All storage servers must be accessible by all GPU compute machines. +Each datacenter must have a storage cluster which provides shared storage between all GPU servers. The hardware is provided by the partner, storage cluster licensing is provided by RunPod. All storage servers must be accessible by all GPU compute machines. ### Baseline Cluster Specifications -| Component | Requirement | -| --- | --- | -| Minimum Servers | 4 | -| Minimum Storage size | 200 TB raw (100 TB usable) | -| Connectivity | 200 Gbps between servers/data-plane | -| Network | Private subnet | +| Component | Requirement | +| -------------------- | ----------------------------------- | +| Minimum Servers | 4 | +| Minimum Storage size | 200 TB raw (100 TB usable) | +| Connectivity | 200 Gbps between servers/data-plane | +| Network | Private subnet | ### Server Specifications -| Component | Requirement | -| --- | --- | -| CPU | AMD Genoa: EPYC 9354P (32-Core, 3.25-3.8 GHz), EPYC 9534 (64-Core, 2.45-3.7 GHz), or EPYC 9554 (64-Core, 3.1-3.75 GHz) | -| RAM | 256 GB or higher, DDR5/ECC | +| Component | Requirement | +| --------- | ---------------------------------------------------------------------------------------------------------------------- | +| CPU | AMD Genoa: EPYC 9354P (32-Core, 3.25-3.8 GHz), EPYC 9534 (64-Core, 2.45-3.7 GHz), or EPYC 9554 (64-Core, 3.1-3.75 GHz) | +| RAM | 256 GB or higher, DDR5/ECC | ### Storage Cluster Server Boot Array -| Requirement | Specification | -| --- | --- | -| Redundancy | >= 2n redundancy (RAID 1) | -| Size | >= 500GB (Post RAID) | -| Disk Perf - Sequential read | 2,000 MB/s | -| Disk Perf - Sequential write | 2,000 MB/s | -| Disk Perf - Random Read (4K QD32) | 100,000 IOPS | -| Disk Perf - Random Write (4K QD32) | 10,000 IOPS | +| Requirement | Specification | +| ---------------------------------- | ------------------------- | +| Redundancy | >= 2n redundancy (RAID 1) | +| Size | >= 500GB (Post RAID) | +| Disk Perf - Sequential read | 2,000 MB/s | +| Disk Perf - Sequential write | 2,000 MB/s | +| Disk Perf - Random Read (4K QD32) | 100,000 IOPS | +| Disk Perf - Random Write (4K QD32) | 10,000 IOPS | ### Storage Cluster Server Working Array -| Component | Requirement | -| --- | --- | -| Redundancy | None (JBOD) - RunPod will assemble into array. 7 to 14TB disk sizes recommended. | -| Disk Perf - Sequential read | 6,000 MB/s | -| Disk Perf - Sequential write | 5,000 MB/s | -| Disk Perf - Random Read (4K QD32) | 400,000 IOPS | -| Disk Perf - Random Write (4K QD32) | 40,000 IOPS | +| Component | Requirement | +| ---------------------------------- | -------------------------------------------------------------------------------- | +| Redundancy | None (JBOD) - RunPod will assemble into array. 7 to 14TB disk sizes recommended. | +| Disk Perf - Sequential read | 6,000 MB/s | +| Disk Perf - Sequential write | 5,000 MB/s | +| Disk Perf - Random Read (4K QD32) | 400,000 IOPS | +| Disk Perf - Random Write (4K QD32) | 40,000 IOPS | Servers should have spare disk slots for future expansion without deployment of new servers. @@ -135,11 +135,11 @@ Even distribution among machines (e.g., 7 TB x 8 disks x 4 servers = 224 TB tota Once a storage cluster exceeds 90% single core CPU on the leader node during peak hours, a dedicated metadata server is required. Metadata tracking is a single process operation, and single threaded performance is the most important metric. 
-| Component | Requirement | -| --- | --- | -| CPU | AMD Ryzen Threadripper 7960X (24-Cores, 4.2-5.3 GHz) | -| RAM | 128 GB or higher, DDR5/ECC | -| Boot disk | >= 500 GB, RAID 1 | +| Component | Requirement | +| --------- | ---------------------------------------------------- | +| CPU | AMD Ryzen Threadripper 7960X (24-Cores, 4.2-5.3 GHz) | +| RAM | 128 GB or higher, DDR5/ECC | +| Boot disk | >= 500 GB, RAID 1 | ## 2. Software Requirements @@ -156,11 +156,11 @@ Update server BIOS/firmware to latest stable version ### Drivers and Software -| Component | Requirement | -| --- | --- | -| NVIDIA Drivers | Version 550.54.15 or later production version | -| CUDA | Version 12.4 or later production version | -| NVIDIA Persistence | Activated for GPUs of 48 GB or more | +| Component | Requirement | +| ------------------ | --------------------------------------------- | +| NVIDIA Drivers | Version 550.54.15 or later production version | +| CUDA | Version 12.4 or later production version | +| NVIDIA Persistence | Activated for GPUs of 48 GB or more | ### HGX SXM System Addendum @@ -172,29 +172,29 @@ Update server BIOS/firmware to latest stable version ## 3. Data Center Power Requirements -| Requirement | Specification | -| --- | --- | -| Utility Feeds | - Minimum of two independent utility feeds from separate substations
- Each feed capable of supporting 100% of the data center's power load
- Automatic transfer switches (ATS) for seamless switchover between feeds with UL 1008 certification (or regional equivalent) | -| UPS | - N+1 redundancy for UPS systems
- Minimum of 15 minutes runtime at full load | -| Generators | - N+1 redundancy for generator systems
- Generators must be able to support 100% of the data center's power load
- Minimum of 48 hours of on-site fuel storage at full load
- Automatic transfer to generator power within 10 seconds of utility failure | -| Power Distribution | - Redundant power distribution paths (2N) from utility to rack level
- Redundant Power Distribution Units (PDUs) in each rack
- Remote power monitoring and management capabilities at rack level | +| Requirement | Specification | +| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Utility Feeds | - Minimum of two independent utility feeds from separate substations
- Each feed capable of supporting 100% of the data center's power load
- Automatic transfer switches (ATS) for seamless switchover between feeds with UL 1008 certification (or regional equivalent) | +| UPS | - N+1 redundancy for UPS systems
- Minimum of 15 minutes runtime at full load | +| Generators | - N+1 redundancy for generator systems
- Generators must be able to support 100% of the data center's power load
- Minimum of 48 hours of on-site fuel storage at full load
- Automatic transfer to generator power within 10 seconds of utility failure | +| Power Distribution | - Redundant power distribution paths (2N) from utility to rack level
- Redundant Power Distribution Units (PDUs) in each rack
- Remote power monitoring and management capabilities at rack level | | Testing and Maintenance | - Monthly generator tests under load for a minimum of 30 minutes
- Quarterly full-load tests of the entire backup power system, including UPS and generators
- Annual full-facility power outage test (coordinated with RunPod)
- Regular thermographic scanning of electrical systems
- Detailed maintenance logs for all power equipment
- 24/7 on-site facilities team for immediate response to power issues | -| Monitoring and Alerting | - Real-time monitoring of all power systems
- Automated alerting for any power anomalies or threshold breaches | -| Capacity Planning | - Maintain a minimum of 20% spare power capacity for future growth
- Annual power capacity audits and forecasting | -| Fire Suppression | - Maintain datacenter fire suppression systems in compliance with NFPA 75 and 76 (or regional equivalent) | +| Monitoring and Alerting | - Real-time monitoring of all power systems
- Automated alerting for any power anomalies or threshold breaches | +| Capacity Planning | - Maintain a minimum of 20% spare power capacity for future growth
- Annual power capacity audits and forecasting | +| Fire Suppression | - Maintain datacenter fire suppression systems in compliance with NFPA 75 and 76 (or regional equivalent) | ## 4. Network Requirements -| Requirement | Specification | -| --- | --- | -| Internet Connectivity | - Minimum of two diverse and redundant internet circuits from separate providers
- Each connection should be capable of supporting 100% of the data center's bandwidth requirements
- BGP routing implemented for automatic failover between circuit providers
- 100 Gbps minimum total bandwidth capacity | -| Core Infrastructure | - Redundant core switches in a high-availability configuration (e.g., stacking, VSS, or equivalent) | -| Distribution Layer | - Redundant distribution switches with multi-chassis link aggregation (MLAG) or equivalent technology
- Minimum 100 Gbps uplinks to core switches | -| Access Layer | - Redundant top-of-rack switches in each cabinet
- Minimum 100 Gbps server connections for high-performance compute nodes | -| DDoS Protection | - Must have a DDoS mitigation solution, either on-premises or on-demand cloud-based | -| Quality of service | Maintain network performance within the following parameters:
* Network utilization levels must remain below 80% on any link during peak hours
* Packet loss must not exceed 0.1% (1 in 1000) on any network segment
* P95 round-trip time (RTT) within the data center should not exceed 4ms
* P95 jitter within the datacenter should not exceed 3ms | -| Testing and Maintenance | - Regular failover testing of all redundant components (minimum semi-annually)
- Annual full-scale disaster recovery test
- Maintenance windows for network updates and patches, with minimal service disruption scheduled at least 1 week in advance | -| Capacity Planning | - Maintain a minimum of 40% spare network capacity for future growth
- Regular network performance audits and capacity forecasting | +| Requirement | Specification | +| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| Internet Connectivity | - Minimum of two diverse and redundant internet circuits from separate providers
- Each connection should be capable of supporting 100% of the data center's bandwidth requirements
- BGP routing implemented for automatic failover between circuit providers
- 100 Gbps minimum total bandwidth capacity | +| Core Infrastructure | - Redundant core switches in a high-availability configuration (e.g., stacking, VSS, or equivalent) | +| Distribution Layer | - Redundant distribution switches with multi-chassis link aggregation (MLAG) or equivalent technology
- Minimum 100 Gbps uplinks to core switches | +| Access Layer | - Redundant top-of-rack switches in each cabinet
- Minimum 100 Gbps server connections for high-performance compute nodes | +| DDoS Protection | - Must have a DDoS mitigation solution, either on-premises or on-demand cloud-based | +| Quality of service | Maintain network performance within the following parameters:
* Network utilization levels must remain below 80% on any link during peak hours
* Packet loss must not exceed 0.1% (1 in 1000) on any network segment
* P95 round-trip time (RTT) within the data center should not exceed 4ms
* P95 jitter within the datacenter should not exceed 3ms | +| Testing and Maintenance | - Regular failover testing of all redundant components (minimum semi-annually)
- Annual full-scale disaster recovery test
- Maintenance windows for network updates and patches, with minimal service disruption scheduled at least 1 week in advance | +| Capacity Planning | - Maintain a minimum of 40% spare network capacity for future growth
- Regular network performance audits and capacity forecasting | ## 5. Compliance Requirements @@ -206,12 +206,12 @@ To qualify as a RunPod secure cloud partner, the parent organization must adhere Additionally, partners must comply with the following operational standards: -| Requirement | Description | -| --- | --- | -| Data Center Tier | Abide by Tier III+ Data Center Standards | -| Security | 24/7 on-site security and technical staff | +| Requirement | Description | +| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| Data Center Tier | Abide by Tier III+ Data Center Standards | +| Security | 24/7 on-site security and technical staff | | Physical security | RunPod servers must be held in an isolated secure rack or cage in an area that is not accessible to any non-partner or approved DC personnel. Physical access to this area must be tracked and logged. | -| Maintenance | All maintenance resulting in disruption or downtime must be scheduled at least 1 week in advance. Large disruptions must be coordinated with RunPod at least 1 month in advance. | +| Maintenance | All maintenance resulting in disruption or downtime must be scheduled at least 1 week in advance. Large disruptions must be coordinated with RunPod at least 1 month in advance. | RunPod will review evidence of: @@ -228,4 +228,4 @@ For detailed information on maintenance scheduling, power system management, and ### Release log -- 2025-11-01: Initial release. \ No newline at end of file +- 2025-11-01: Initial release. diff --git a/docs/integrations/overview.md b/docs/integrations/overview.md index 52f7f42..76eda44 100644 --- a/docs/integrations/overview.md +++ b/docs/integrations/overview.md @@ -1,11 +1,11 @@ +--- +title: Integrations --- -title: Integrations ---- - -import DocCardList from '@theme/DocCardList'; - -# Integrations - -RunPod integrates with various tools that enable you to automate interactions, such as managing containers with infrastructure-as-code and interacting with serverless endpoints without using the WebUI or API. These integrations provide flexibility and automation for streamlining complex workflows and scaling your operations. - - + +import DocCardList from '@theme/DocCardList'; + +# Integrations + +RunPod integrates with various tools that enable you to automate interactions, such as managing containers with infrastructure-as-code and interacting with serverless endpoints without using the WebUI or API. These integrations provide flexibility and automation for streamlining complex workflows and scaling your operations. + + diff --git a/docs/pods/configuration/expose-ports.md b/docs/pods/configuration/expose-ports.md index c9f5e0b..70a7a3d 100644 --- a/docs/pods/configuration/expose-ports.md +++ b/docs/pods/configuration/expose-ports.md @@ -42,7 +42,7 @@ It's crucial to be aware of the following behavior: - If your service does not respond within 100 seconds of a request, the connection will be closed. - In such cases, the user will receive a `524` error code. -This timeout limit is particularly important for long-running operations or services that might take more than 100 seconds to respond. +This timeout limit is particularly important for long-running operations or services that might take more than 100 seconds to respond. 
Make sure to design your applications with this limitation in mind, potentially implementing progress updates or chunked responses for longer operations. ### Through TCP Public IP @@ -79,4 +79,4 @@ In this case, I have requested two symmetrical ports and they ended up being 100 ```text RUNPOD_TCP_PORT_70001=10031 RUNPOD_TCP_PORT_70000=10030 -``` \ No newline at end of file +``` diff --git a/docs/references/troubleshooting/leaked-api-keys.md b/docs/references/troubleshooting/leaked-api-keys.md index d332c31..1d428cf 100644 --- a/docs/references/troubleshooting/leaked-api-keys.md +++ b/docs/references/troubleshooting/leaked-api-keys.md @@ -18,4 +18,4 @@ To disable an API key: To delete an API key: 1. From the console, select **Settings**. -2. Under **API Keys**, select the trash can icon and select **Revoke Key**. \ No newline at end of file +2. Under **API Keys**, select the trash can icon and select **Revoke Key**. diff --git a/docs/serverless/workers/handlers/overview.md b/docs/serverless/workers/handlers/overview.md index faa674b..420ef0c 100644 --- a/docs/serverless/workers/handlers/overview.md +++ b/docs/serverless/workers/handlers/overview.md @@ -71,7 +71,7 @@ def handler(job): runpod.serverless.start({"handler": handler}) # Required. ``` -You must return something as output when your worker is done processing the job. +You must return something as output when your worker is done processing the job. This can directly be the output, or it can be links to cloud storage where the artifacts are saved. Keep in mind that the input and output payloads are limited to 2 MB each. @@ -79,7 +79,6 @@ Keep in mind that the input and output payloads are limited to 2 MB each. Keep setup processes and functions outside of your handler function. For example, if you are running models make sure they are loaded into VRAM prior to calling `serverless.start` with your handler function. -
Example @@ -125,16 +124,16 @@ def handler(event): runpod.serverless.start({"handler": handler}) ``` - + The following is an example of the input command. ```command - python your_handler.py --test_input '{"input": {"prompt": "The quick brown fox jumps"}}' +python your_handler.py --test_input '{"input": {"prompt": "The quick brown fox jumps"}}' ``` - +
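As a companion to the example above, the following sketch shows the "setup outside the handler" pattern with a model loaded from the Model Cache path introduced later in this patch. It is a minimal illustration: the model name is a placeholder, and it assumes the `transformers` library and a CUDA device are available in the worker image.

```python
import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model once at module scope so it is already in VRAM before the first job.
# The path follows the Model Cache layout described in docs/storage/model-caching.md;
# "facebook/opt-350m" is only an illustrative placeholder.
MODEL_PATH = "/runpod/cache/model/facebook/opt-350m/main"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda")


def handler(job):
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    return {"output": tokenizer.decode(output_ids[0], skip_special_tokens=True)}


runpod.serverless.start({"handler": handler})
```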
diff --git a/docs/serverless/workers/vllm/get-started.md b/docs/serverless/workers/vllm/get-started.md index b1517c1..37ea95b 100644 --- a/docs/serverless/workers/vllm/get-started.md +++ b/docs/serverless/workers/vllm/get-started.md @@ -142,7 +142,6 @@ chat_completion = client.chat.completions.create( print(chat_completion) ``` - diff --git a/docs/pods/storage/_category_.json b/docs/storage/_category_.json similarity index 100% rename from docs/pods/storage/_category_.json rename to docs/storage/_category_.json diff --git a/docs/pods/storage/_volume.md b/docs/storage/_volume.md similarity index 100% rename from docs/pods/storage/_volume.md rename to docs/storage/_volume.md diff --git a/docs/pods/storage/create-network-volumes.md b/docs/storage/create-network-volumes.md similarity index 100% rename from docs/pods/storage/create-network-volumes.md rename to docs/storage/create-network-volumes.md diff --git a/docs/storage/model-caching.md b/docs/storage/model-caching.md new file mode 100644 index 0000000..ac25fef --- /dev/null +++ b/docs/storage/model-caching.md @@ -0,0 +1,169 @@ +--- +title: Model caching +description: "Model caching allows you to quickly switch out machine learning models in your code." +sidebar_position: 4 +--- + +Model caching allows you to dynamically load and switch between machine learning models in your applications without rebuilding your container images or changing your code. +It automatically handles model and dataset downloads and makes them available to your application. + +:::note + +Model caching currently supports models and datasets from [Hugging Face](https://huggingface.co/). + +::: + +## Benefits + +- Faster Development: Switch models instantly without rebuilding containers +- Better Performance: Optimized cold start times and caching +- Easy Integration: Works with popular ML frameworks like PyTorch and Transformers + +You can cache your models for both Pods and Serverless. + +## Get started with Serverless + +With model caching you can preload your models. +This means you don't need to bake the model into your Docker image or wait for the Worker to download it from Hugging Face. + +1. Log in to the [RunPod Serverless console](https://www.runpod.io/console/serverless). +2. Select **+ New Endpoint**. +3. Provide the following: + 1. Endpoint name. + 2. Select your GPU configuration. + 3. Configure the number of Workers. + 4. (optional) Select **FlashBoot**. + 5. (optional) Select a template. + 6. Enter the name of your Docker image. + - For example, `<username>/<repo>:<tag>`. + 7. Specify enough memory for your Docker image. +4. Add your Hugging Face model. + 1. Add the name of the model you want to use (up to five per endpoint). + 2. (optional) Add your Hugging Face API Key for gated or private models. +5. Select **Deploy**. + +The Model Cache will automatically download the model and make it available to your code. + +## Get started with Pods + +With model caching you can preload your models. +This means you don't need to bake the model into your Docker image or download it while your Pod is starting up. +RunPod handles all of this for you. + +1. Navigate to [Pods](https://www.runpod.io/console/pods) and select **+ Deploy**. +2. Choose between **GPU** and **CPU**. +3. Customize your instance by setting up the following: + 1. (optional) Specify a Network volume. + 2. Select an instance type. For example, **A40**. + 3. (optional) Provide a template. For example, **RunPod Pytorch**. + 4. (GPU only) Specify your compute count. + 5. Add your Hugging Face model. +4.
Review your configuration and select **Deploy On-Demand**. + +The Model Cache will automatically download the model and make it available to your code on your Pod. + +## How to interact with your models + +The model path is as follows: + +``` +/runpod/cache/model/$MODEL_NAME +``` + +You can set this path in your code. +For example: + +```python +from transformers import AutoModel + +# path to your model +model = AutoModel.from_pretrained("/runpod/cache/model/$MODEL_NAME/main") +``` + +Now when this code executes in a Pod or a Serverless Worker, the model is already available to you. + +## Environment variables + +Hugging Face models and datasets are configured using environment variables. + +For public models and datasets, you only need to specify what to download. +For private resources, you must provide authentication credentials. + +### Model and dataset selection + +- `RUNPOD_HUGGINGFACE_MODEL`\ + Specifies which models to download. Accepts a comma-separated list of models in the format `user/model[:branch]`. + +- `RUNPOD_HUGGINGFACE_DATASET`\ + Specifies which datasets to download. Accepts a comma-separated list of datasets in the format `user/dataset[:branch]`. + +### Authentication (optional) + +Both variables must be provided together for private resource access: + +- `RUNPOD_HUGGINGFACE_TOKEN`\ + Your Hugging Face authentication token. + +- `RUNPOD_HUGGINGFACE_USER`\ + Your Hugging Face username. + +### Basic usage + +Download a single model or dataset from the default (`main`) branch: + +```bash +# Download a model +RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large" + +# Download a dataset +RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0" +``` + +### Specifying branches + +Access specific branches by appending `:branch-name`: + +```bash +# Download from a specific branch +RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large:experimental" +``` + +### Multiple resources + +Download multiple models or datasets by separating them with commas: + +```bash +# Download multiple models +RUNPOD_HUGGINGFACE_MODEL="openai/whisper-large,google/flan-t5-base" + +# Download multiple datasets with different branches +RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0,huggingface/dataset-metrics:dev" +``` + +### Private resources + +Access private resources by providing authentication: + +```bash +RUNPOD_HUGGINGFACE_USER="your-username" +RUNPOD_HUGGINGFACE_TOKEN="hf_..." +RUNPOD_HUGGINGFACE_MODEL="your-org/private-model" +``` + +## Example configurations + +```bash +# Single public model +RUNPOD_HUGGINGFACE_MODEL="facebook/opt-350m" + +# Multiple models with different branches +RUNPOD_HUGGINGFACE_MODEL="facebook/opt-350m:main, google/flan-t5-base:experimental" + +# Private model with authentication +RUNPOD_HUGGINGFACE_USER="your-username" +RUNPOD_HUGGINGFACE_TOKEN="hf_..." +RUNPOD_HUGGINGFACE_MODEL="your-org/private-model" + +# Multiple datasets +RUNPOD_HUGGINGFACE_DATASET="mozilla-foundation/common_voice_11_0, huggingface/dataset-metrics" +``` diff --git a/docs/pods/storage/sync-volumes.md b/docs/storage/sync-volumes.md similarity index 88% rename from docs/pods/storage/sync-volumes.md rename to docs/storage/sync-volumes.md index ea55c1d..b6c343f 100644 --- a/docs/pods/storage/sync-volumes.md +++ b/docs/storage/sync-volumes.md @@ -4,4 +4,4 @@ sidebar_position: 9 description: "Sync your volume to a cloud provider by clicking 'Cloud Sync' on your My Pods page, then follow provider-specific instructions from the dropdown menu."
--- -You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page. For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](../configuration/export-data.md). +You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page. For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](/pods/configuration/export-data). diff --git a/docs/pods/storage/transfer-files.md b/docs/storage/transfer-files.md similarity index 96% rename from docs/pods/storage/transfer-files.md rename to docs/storage/transfer-files.md index 002897b..8f55c6b 100644 --- a/docs/pods/storage/transfer-files.md +++ b/docs/storage/transfer-files.md @@ -8,7 +8,7 @@ Learn to transfer files to and from RunPod. ## Prerequisites -- If you intend to use `runpodctl`, make sure it's installed on your machine, see [install runpodctl](../../runpodctl/install-runpodctl.md) +- If you intend to use `runpodctl`, make sure it's installed on your machine; see [install runpodctl](/runpodctl/install-runpodctl). - If you intend to use `scp`, make sure your Pod is configured to use real SSH. For more information, see [use SSH](/pods/configuration/use-ssh). @@ -17,7 +17,7 @@ Learn to transfer files to and from RunPod. - Note the public IP address and external port from the SSH over exposed TCP command (you'll need these for the SCP/rsync commands). -## Transferring with [runpodctl](../../runpodctl/overview.md#data-transfer) +## Transferring with [runpodctl](/runpodctl/overview#data-transfer) The RunPod CLI (runpodctl) provides simple commands for transferring data between your machine and RunPod. **It’s preinstalled on all RunPod Pods** and uses one-time codes for secure authentication, so no API keys are required. @@ -154,4 +154,4 @@ total size is 119 speedup is 0.90 ## Sync a volume to a cloud provider -You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page, For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](../configuration/export-data.md). +You can sync your volume to a cloud provider by clicking the **Cloud Sync** option under your **My Pods** page. For detailed instructions on connecting to AWS S3, Google Cloud Storage, Azure, Backblaze, Dropbox, and configuring these services, please refer to this [configuration guide](/pods/configuration/export-data). diff --git a/docs/pods/storage/types.md b/docs/storage/types.md similarity index 100% rename from docs/pods/storage/types.md rename to docs/storage/types.md diff --git a/sidebars.js b/sidebars.js index 7e4f1db..611cc77 100644 --- a/sidebars.js +++ b/sidebars.js @@ -35,6 +35,16 @@ module.exports = { }, ], }, + { + type: "category", + label: "Storage", + items: [ + { + type: "autogenerated", + dirName: "storage", + }, + ], + }, { type: "category", label: "runpodctl",