Commit

Merge branch 'skypilot-org:master' into master
hyoxt121 authored Feb 14, 2025
2 parents 660ead7 + 7170a91 commit 9fdabac
Showing 93 changed files with 1,481 additions and 1,091 deletions.
12 changes: 10 additions & 2 deletions .buildkite/generate_pipeline.py
@@ -39,7 +39,12 @@
QUEUE_GENERIC_CLOUD = 'generic_cloud'
QUEUE_GENERIC_CLOUD_SERVE = 'generic_cloud_serve'
QUEUE_KUBERNETES = 'kubernetes'
QUEUE_EKS = 'eks'
QUEUE_GKE = 'gke'
# We use KUBE_BACKEND to specify the queue for kubernetes tests mark as
# resource_heavy. It can be either EKS or GKE.
QUEUE_KUBE_BACKEND = os.getenv('KUBE_BACKEND', QUEUE_EKS).lower()
assert QUEUE_KUBE_BACKEND in [QUEUE_EKS, QUEUE_GKE]
# Only aws, gcp, azure, and kubernetes are supported for now.
# Other clouds do not have credentials.
CLOUD_QUEUE_MAP = {
@@ -174,7 +179,9 @@ def _extract_marked_tests(
for function_name, marks in function_name_marks_map.items():
clouds_to_include = []
is_serve_test = 'serve' in marks
run_on_gke = 'requires_gke' in marks
run_on_cloud_kube_backend = ('resource_heavy' in marks and
'kubernetes' in default_clouds_to_run)

for mark in marks:
if mark not in PYTEST_TO_CLOUD_KEYWORD:
# This mark does not specify a cloud, so we skip it.
@@ -210,7 +217,8 @@ def _extract_marked_tests(
param_list += [None
] * (len(final_clouds_to_include) - len(param_list))
function_cloud_map[function_name] = (final_clouds_to_include, [
QUEUE_GKE if run_on_gke else cloud_queue_map[cloud]
QUEUE_KUBE_BACKEND
if run_on_cloud_kube_backend else cloud_queue_map[cloud]
for cloud in final_clouds_to_include
], param_list)
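
The queue-routing change above can be sketched in isolation as follows. This is a minimal sketch, not the file's actual code: `resolve_kube_backend` and `pick_queues` are hypothetical helper names introduced for illustration, and the sketch raises `ValueError` where the diff uses `assert`.

```python
import os

QUEUE_EKS = 'eks'
QUEUE_GKE = 'gke'


def resolve_kube_backend(env=None):
    """Return the queue named by KUBE_BACKEND, defaulting to EKS.

    Hypothetical helper: the diff computes this at module import time.
    """
    env = os.environ if env is None else env
    backend = env.get('KUBE_BACKEND', QUEUE_EKS).lower()
    if backend not in (QUEUE_EKS, QUEUE_GKE):
        raise ValueError(f'Unsupported KUBE_BACKEND: {backend!r}')
    return backend


def pick_queues(marks, clouds, cloud_queue_map, default_clouds_to_run,
                kube_backend):
    """Map each cloud to a queue, mirroring the list comprehension above.

    A test marked `resource_heavy` that runs on Kubernetes by default is
    routed to the cloud Kubernetes backend queue for every cloud it targets.
    """
    heavy_on_kube = ('resource_heavy' in marks and
                     'kubernetes' in default_clouds_to_run)
    return [
        kube_backend if heavy_on_kube else cloud_queue_map[cloud]
        for cloud in clouds
    ]
```

An explicit exception is used here only because `assert` statements are skipped when Python runs with `-O`; the behavior is otherwise intended to match the diff.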

2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -10,7 +10,7 @@ all contributions to the project, including but not limited to:
* Documentation
* Tutorials, blog posts and talks on SkyPilot

## Contributing Code
## Contributing code

We use GitHub to track issues and features. For new contributors, we recommend looking at issues labeled ["good first issue"](https://github.com/sky-proj/sky/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22+).

6 changes: 4 additions & 2 deletions README.md
@@ -26,7 +26,9 @@

----
:fire: *News* :fire:
- [Jan 2025] Launch and Serve **[DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1)** and **[Janus](https://github.com/deepseek-ai/DeepSeek-Janus)** on Kubernetes or Any Cloud: [**R1 example**](./llm/deepseek-r1/) and [**Janus example**](./llm/deepseek-janus/)
- [Feb 2025] Run and Serve DeepSeek-R1 671B using SkyPilot and SGLang with high throughput: [**example**](./llm/deepseek-r1/)
- [Jan 2025] Prepare and Serve Large-Scale Image Search with **Vector Database**: [**blog post**](https://blog.skypilot.co/large-scale-vector-database/) [**example**](./examples/vector_database/)
- [Jan 2025] Launch and Serve distilled models from **[DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1)** and **[Janus](https://github.com/deepseek-ai/DeepSeek-Janus)** on Kubernetes or Any Cloud: [**R1 example**](./llm/deepseek-r1-distilled/) and [**Janus example**](./llm/deepseek-janus/)
- [Oct 2024] :tada: **SkyPilot crossed 1M+ downloads** :tada:: Thank you to our community! [**Twitter/X**](https://x.com/skypilot_org/status/1844770841718067638)
- [Sep 2024] Point, Launch and Serve **Llama 3.2** on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
- [Sep 2024] Run and deploy [**Pixtral**](./llm/pixtral), the first open-source multimodal model from Mistral AI.
@@ -187,7 +189,7 @@ Runnable examples:
- [LocalGPT](./llm/localgpt)
- [Falcon](./llm/falcon)
- Add yours here & see more in [`llm/`](./llm)!
- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), [Cog](https://github.com/skypilot-org/skypilot/blob/master/examples/cog/), [Unsloth](https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml), [Ollama](https://github.com/skypilot-org/skypilot/blob/master/llm/ollama), [llm.c](https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2), [Airflow](./examples/airflow/training_workflow) and [many more (`examples/`)](./examples).
- Framework examples: [Vector Database](./examples/vector_database/), [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), [Cog](https://github.com/skypilot-org/skypilot/blob/master/examples/cog/), [Unsloth](https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml), [Ollama](https://github.com/skypilot-org/skypilot/blob/master/llm/ollama), [llm.c](https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2), [Airflow](./examples/airflow/training_workflow) and [many more (`examples/`)](./examples).

Case Studies and Integrations: [Community Spotlights](https://blog.skypilot.co/community/)

5 changes: 5 additions & 0 deletions docs/README.md
@@ -1,6 +1,11 @@
# Documentation
Sphinx docs based on ReadTheDocs.

## Styleguide

- Each page's title is in `Title Case <https://en.wikipedia.org/wiki/Title_case>`_.
- Each subsection's title is in `Sentence case <https://en.wikipedia.org/wiki/Sentence_case>`_.

## Build
```bash
pip install -r requirements-docs.txt
18 changes: 18 additions & 0 deletions docs/build.sh
@@ -1,5 +1,23 @@
#!/bin/bash

# Function to check if file exists and is less than 24 hours old
check_file_age() {
if [ -f "$1" ] && [ $(( $(date +%s) - $(stat -f %m "$1" 2>/dev/null || stat -c %Y "$1" 2>/dev/null) )) -lt 86400 ]; then
return 0 # File exists and is recent
fi
return 1 # File doesn't exist or is old
}

# Only run sky show-gpus commands if output files don't exist or are old
if ! check_file_age "source/compute/show-gpus-all.txt"; then
sky show-gpus -a > source/compute/show-gpus-all.txt
sed -i '' '/^tpu-v2-128/,$d' source/compute/show-gpus-all.txt && echo "... [omitted long outputs] ..." >> source/compute/show-gpus-all.txt
fi

if ! check_file_age "source/compute/show-gpus-h100-8.txt"; then
sky show-gpus H100:8 > source/compute/show-gpus-h100-8.txt
fi

rm -rf build docs

# MacOS and GNU `script` have different usages
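
For comparison, the freshness check and truncation step added in this hunk can be sketched in Python. This is a hedged sketch under stated assumptions, not part of the commit: `is_fresh` and `truncate_at_marker` are hypothetical names. Note that the `sed -i ''` form used in the script is BSD/macOS syntax and is generally not accepted by GNU sed, which is one reason to consider a portable alternative.

```python
import os
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # 86400, the same threshold as the script


def is_fresh(path, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Return True if `path` exists and was modified within the window."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Missing or unreadable file: treat as stale so it is regenerated.
        return False
    return (now - mtime) < max_age_seconds


def truncate_at_marker(text, marker='tpu-v2-128',
                       note='... [omitted long outputs] ...'):
    """Drop everything from the first line starting with `marker` onward,
    then append `note`, like the sed + echo pair in the script."""
    kept = []
    for line in text.splitlines():
        if line.startswith(marker):
            break
        kept.append(line)
    return '\n'.join(kept + [note]) + '\n'
```

Using `os.path.getmtime` avoids the `stat -f %m` versus `stat -c %Y` fallback that the shell version needs to cover both macOS and Linux.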
3 changes: 3 additions & 0 deletions docs/source/_gallery_original/index.rst
@@ -34,6 +34,8 @@ Contents
:maxdepth: 1
:caption: LLM Models

DeepSeek-R1 <llms/deepseek-r1>
DeepSeek-R1 Distilled <llms/deepseek-r1-distilled>
Vision Llama 3.2 (Meta) <llms/llama-3_2>
Llama 3.1 (Meta) <llms/llama-3_1>
Llama 3 (Meta) <llms/llama-3>
@@ -50,6 +52,7 @@ Contents
:maxdepth: 1
:caption: Applications

Image Vector Database <applications/vector_database>
Tabby: Coding Assistant <applications/tabby>
LocalGPT: Chat with PDF <applications/localgpt>

1 change: 1 addition & 0 deletions docs/source/_gallery_original/llms/deepseek-r1.md
5 changes: 2 additions & 3 deletions docs/source/_static/custom.js
@@ -27,10 +27,9 @@ document.addEventListener('DOMContentLoaded', () => {
// New items:
const newItems = [
{ selector: '.toctree-l1 > a', text: 'Many Parallel Jobs' },
{ selector: '.toctree-l1 > a', text: 'Admin Policy Enforcement' },
{ selector: '.toctree-l1 > a', text: 'Using Existing Machines' },
{ selector: '.toctree-l1 > a', text: 'Admin Policies' },
{ selector: '.toctree-l2 > a', text: 'Multiple Kubernetes Clusters' },
{ selector: '.toctree-l1 > a', text: 'HTTPS Encryption' },
{ selector: '.toctree-l2 > a', text: 'HTTPS Encryption' },
];
newItems.forEach(({ selector, text }) => {
document.querySelectorAll(selector).forEach((el) => {
6 changes: 3 additions & 3 deletions docs/source/cloud-setup/cloud-permissions/aws.rst
@@ -6,15 +6,15 @@ AWS

.. _cloud-permissions-aws-user-creation:

Minimal Permissions
Minimal permissions
-----------------------

Minimizing AWS permissions should be set up in two places:

1. **User Account**: the user account is the individual account of an user created by the administrator.
2. **IAM role**: the IAM role is assigned to all EC2 instances created by SkyPilot, which is used by the instances to access AWS resources, e.g., read/write S3 buckets or create other EC2 nodes. The IAM role is shared by all users under the same organization/root account. (If a user account has the permission to create IAM roles, SkyPilot can automatically create the role.)

User Account
User account
~~~~~~~~~~~~~~~~~~

AWS accounts can be attached with a policy that limits the permissions of the account. Follow these steps to create an AWS user with the minimum permissions required by SkyPilot:
@@ -195,7 +195,7 @@ With the steps above you are almost ready to have the users in your organization
2. Alternatively, you can create the ``skypilot-v1`` IAM role manually. The following section describes how to create the IAM role manually.


IAM Role Creation
IAM role creation
~~~~~~~~~~~~~~~~~~

.. note::
16 changes: 8 additions & 8 deletions docs/source/cloud-setup/cloud-permissions/gcp.rst
@@ -12,7 +12,7 @@ Generally, the administrator can choose among three "levels" of permissions, fro

.. _gcp-medium-permissions:

Medium Permissions
Medium permissions
-----------------------

The easiest way to grant permissions to a user access your GCP project without the ``Owner`` role is to add the following roles to the user principals:
@@ -41,7 +41,7 @@ You can grant those accesses via GCP's `IAM & Admin console <https://console.clo

.. _gcp-minimal-permissions:

Minimal Permissions
Minimal permissions
-----------------------

The :ref:`Medium Permissions <gcp-medium-permissions>` assigns admin permissions for some GCP services to the user. If you would like to grant finer-grained and more minimal permissions to your users in your organization / project, you can create a custom role by following the steps below:
@@ -178,7 +178,7 @@ User

.. _gcp-service-account-creation:

Service Account
Service account
~~~~~~~~~~~~~~~~~~~
.. note::

@@ -210,7 +210,7 @@ Medium Permissions roles as described in the previous sections.

.. _gcp-minimum-firewall-rules:

Firewall Rules
Firewall rules
~~~~~~~~~~~~~~~~~~~

By default, users do not need to set up any special firewall rules to start
@@ -286,7 +286,7 @@ The custom VPC should contain the :ref:`required firewall rules <gcp-minimum-fir
.. _gcp-use-internal-ips:


Using Internal IPs
Using internal IPs
-----------------------
For security reason, users may only want to use internal IPs for SkyPilot instances.
To do so, you can use SkyPilot's global config file ``~/.sky/config.yaml`` to specify the ``gcp.use_internal_ips`` and ``gcp.ssh_proxy_command`` fields (to see the detailed syntax, see :ref:`config-yaml`):
@@ -302,7 +302,7 @@ To do so, you can use SkyPilot's global config file ``~/.sky/config.yaml`` to sp
The ``gcp.ssh_proxy_command`` field is optional. If SkyPilot is run on a machine that can directly access the internal IPs of the instances, it can be omitted. Otherwise, it should be set to a command that can be used to proxy SSH connections to the internal IPs of the instances.


Cloud NAT Setup
Cloud NAT setup
~~~~~~~~~~~~~~~~

Instances created with internal IPs only on GCP cannot access public internet by default. To make sure SkyPilot can install the dependencies correctly on the instances,
@@ -340,8 +340,8 @@ If proxy is not needed, but the regions need to be limited, you can set the ``gc
us-east1: null
Force Enable Exteral IPs
~~~~~~~~~~~~~~~~~~~~~~~~
Force enable external IPs
~~~~~~~~~~~~~~~~~~~~~~~~~

An alternative to setting up cloud NAT for instances that need to access the public internet but are in a VPC and communicated with via their internal IP is to force them to be created with an external IP address.

6 changes: 3 additions & 3 deletions docs/source/cloud-setup/cloud-permissions/kubernetes.rst
@@ -53,7 +53,7 @@ Below are the permissions required by SkyPilot and an example service account YA

.. _k8s-permissions:

Minimum Permissions Required for SkyPilot
Minimum permissions required for SkyPilot
-----------------------------------------

SkyPilot requires permissions equivalent to the following roles to be able to manage the resources in the Kubernetes cluster:
@@ -120,7 +120,7 @@ Permissions for ``sky show-gpus``
If this role is not granted to the service account, ``sky show-gpus`` will still work but it will only show the total GPUs on the nodes, not the number of free GPUs.


Permissions for Object Store Mounting
Permissions for object store mounting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your tasks use object store mounting (e.g., S3, GCS, etc.), SkyPilot will need to run a DaemonSet to expose the FUSE device as a Kubernetes resource to SkyPilot pods.
@@ -177,7 +177,7 @@ If your tasks use :ref:`Ingress <kubernetes-ingress>` for exposing ports, you wi
.. _k8s-sa-example:

Example using Custom Service Account
Example using custom service account
------------------------------------

To create a service account that has all necessary permissions for SkyPilot (including for accessing object stores), you can use the following YAML.
4 changes: 2 additions & 2 deletions docs/source/cloud-setup/cloud-permissions/vsphere.rst
@@ -7,7 +7,7 @@ This document is provided for users who use VMware vSphere provider and helps th

.. _cloud-prepare-vsphere-tags:

Prepare Category & Tag
Prepare category & tag
~~~~~~~~~~~~~~~~~~~~~~~

The Categories and Tags is needed when using the vSphere provider, please follow bellow steps to create them.
@@ -79,7 +79,7 @@ The Categories and Tags is needed when using the vSphere provider, please follow

.. _cloud-prepare-vsphere-storage-policy:

Create VM Storage Policies
Create VM storage policies
~~~~~~~~~~~~~~~~~~~~~~~~~~

The vSphere provider depends on the VM Storage Policies to place the VM. A Shared Datastore is recommended.
22 changes: 11 additions & 11 deletions docs/source/cloud-setup/policy.rst
@@ -14,11 +14,11 @@ Example usage:
- :ref:`use-spot-for-gpu-policy`
- :ref:`enforce-autostop-policy`
- :ref:`dynamic-kubernetes-contexts-update-policy`


To implement and use an admin policy:

- Admins writes a simple Python package with a policy class that implements SkyPilot's ``sky.AdminPolicy`` interface;
- Admins writes a simple Python package with a policy class that implements SkyPilot's ``sky.AdminPolicy`` interface;
- Admins distributes this package to users;
- Users simply set the ``admin_policy`` field in the SkyPilot config file ``~/.sky/config.yaml`` for the policy to go into effect.

@@ -117,7 +117,7 @@ The ``sky.Config`` and ``sky.RequestOptions`` classes are defined as follows:

The ``sky.AdminPolicy`` should be idempotent. In other words, it should be safe to apply the policy multiple times to the same user request.
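
The idempotency requirement can be illustrated with a standalone sketch. Everything below is an assumption made for illustration: `FakeRequest` and `apply_policy` are hypothetical stand-ins and do not use SkyPilot's real types or the ``sky.AdminPolicy`` interface. The point is only that applying the policy a second time leaves the request unchanged.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_AUTOSTOP_MINUTES = 60  # hypothetical organization-wide limit


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a user request; not a SkyPilot type."""
    labels: Dict[str, str] = field(default_factory=dict)
    autostop_minutes: Optional[int] = None


def apply_policy(request: FakeRequest) -> FakeRequest:
    """Return a mutated copy of `request`; safe to apply repeatedly."""
    mutated = copy.deepcopy(request)
    # Setting a key to a fixed value is idempotent; appending would not be.
    mutated.labels['managed-by'] = 'admin-policy'
    # Clamping to an upper bound is idempotent as well.
    if mutated.autostop_minutes is None:
        mutated.autostop_minutes = MAX_AUTOSTOP_MINUTES
    else:
        mutated.autostop_minutes = min(mutated.autostop_minutes,
                                       MAX_AUTOSTOP_MINUTES)
    return mutated
```

A quick check is to compare `apply_policy(apply_policy(r))` with `apply_policy(r)`; a policy that, for example, appended a suffix to a name on every call would fail that comparison.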

Example Policies
Example policies
----------------

We have provided a few example policies in `examples/admin_policy/example_policy <https://github.com/skypilot-org/skypilot/tree/master/examples/admin_policy/example_policy>`_. You can test these policies by installing the example policy package in your Python environment.
@@ -128,8 +128,8 @@ We have provided a few example policies in `examples/admin_policy/example_policy
cd skypilot
pip install examples/admin_policy/example_policy
Reject All
~~~~~~~~~~
Reject all tasks
~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
:language: python
@@ -142,7 +142,7 @@ Reject All

.. _kubernetes-labels-policy:

Add Labels for all Tasks on Kubernetes
Add labels for all tasks on Kubernetes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
@@ -156,8 +156,8 @@ Add Labels for all Tasks on Kubernetes


.. _disable-public-ip-policy:
Always Disable Public IP for AWS Tasks

Always disable public IP for AWS tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
@@ -171,7 +171,7 @@ Always Disable Public IP for AWS Tasks

.. _use-spot-for-gpu-policy:

Use Spot for all GPU Tasks
Use spot for all GPU tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~

..
@@ -186,7 +186,7 @@ Use Spot for all GPU Tasks

.. _enforce-autostop-policy:

Enforce Autostop for all Tasks
Enforce autostop for all tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
@@ -201,7 +201,7 @@ Enforce Autostop for all Tasks

.. _dynamic-kubernetes-contexts-update-policy:

Dynamically Update Kubernetes Contexts to Use
Dynamically update Kubernetes contexts to use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../../examples/admin_policy/example_policy/example_policy/skypilot_policy.py
25 changes: 25 additions & 0 deletions docs/source/compute/cloud-vm.rst
@@ -0,0 +1,25 @@
.. _cloud-vm:

Using Cloud VMs
=====================

SkyPilot supports launching cloud instances (virtual machines, or VMs) on all major cloud providers.
You can get started with :ref:`quickstart`.

See :ref:`concept-cloud-vms` for an overview.


.. Administrator Guides
.. ~~~~~~~~~~~~~~~~~~~~~
.. For administrators, the following optional guides may be helpful:
.. The following guides are optional and may be helpful for administrators:
.. - :ref:`cloud-permissions`
.. - :ref:`cloud-auth`
.. - :ref:`quota`
.. - :ref:`cloud-permissions`: Set up specific IAM roles, permissions, or service accounts for SkyPilot to use.
.. - :ref:`cloud-auth`: Guides for different authentication methods for the clouds.
.. - :ref:`quota`: Guides for requesting quota increases.