diff --git a/ray_cluster_launchers/Readme.md b/ray_cluster_launchers/Readme.md
new file mode 100644
index 0000000..4e7ad73
--- /dev/null
+++ b/ray_cluster_launchers/Readme.md
@@ -0,0 +1,155 @@
+# Launching a Ray Cluster on AWS, Azure, and GCP
+
+
+
+## Preparation - install Ray CLI
+Use pip to install the Ray CLI in your local environment:
+```
+# install ray
+pip install -U ray[default]
+```
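+
+To confirm the installation, you can check the CLI version and, optionally, smoke-test a local single-node Ray instance before touching any cloud resources:
+```
+# check the installed version
+ray --version
+
+# start, inspect, and stop a local Ray instance
+ray start --head
+ray status
+ray stop
+```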
+
+
+
+
+
+
+
+## Configure the Ray Cluster Launcher .yaml Files for AWS, Azure, and GCP
+
+All launcher template .yaml files are modified versions of Ray's official cluster config files:
+
+[aws-example-full.yaml](https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/aws/example-full.yaml), [azure-example-full.yaml](https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/azure/example-full.yaml), and [gcp-example-full.yaml](https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/gcp/example-full.yaml)
+
+
+
+### A. Configure Ray Cluster on AWS at Emory
+
+
+1. Install and Configure [Emory TKI CLI](https://it.emory.edu/tki/)
+
+2. Go to AWS Console and login
+
+3. Go to `EC2` > `Security Groups`, create a security group for the Ray cluster, and set `GroupName` at [line 50](./aws-ray-cluster-launcher-template.yaml#L50)
+
+4. Go to `EC2` > `Key Pairs` and create a key pair for the Ray cluster. Point `ssh_private_key` to the downloaded key at [line 59](./aws-ray-cluster-launcher-template.yaml#L59), and set `KeyName` at [line 84](./aws-ray-cluster-launcher-template.yaml#L84) and [line 118](./aws-ray-cluster-launcher-template.yaml#L118).
+
+5. Go to `VPC` > `Subnets`, create a subnet for the cluster, and set `SubnetIds` for the Ray head and worker nodes at [line 77](./aws-ray-cluster-launcher-template.yaml#L77) and [line 111](./aws-ray-cluster-launcher-template.yaml#L111)
+
+6. Log in with the AWS CLI
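+
+Once logged in, a quick sanity check (assuming the AWS CLI is on your PATH and the template above has been filled in) is to confirm the active identity and then launch the cluster:
+```
+# confirm which AWS account/role the CLI is using
+aws sts get-caller-identity
+
+# create or update the cluster from the AWS template
+ray up aws-ray-cluster-launcher-template.yaml
+```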
+
+### B. Configure Ray Cluster on Azure
+
+1. Install and configure [the Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)
+
+ ```
+ # Install azure cli and bundle.
+ pip install azure-cli azure-identity azure-mgmt azure-mgmt-network
+
+ # Login to azure. This will redirect you to your web browser.
+ az login
+ ```
+
+
+2. Use `ssh-keygen -t rsa -b 4096 -f <path/to/key>` to generate a new SSH key pair for the Ray cluster launcher. The Azure Ray cluster launcher will later use this key to control the head and worker nodes.
+    ```
+    # generate the ssh key pair
+    ssh-keygen -t rsa -b 4096 -f <path/to/key>
+    ```
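+
+    For example (the key path and comment below are placeholders, not required values; `-N ""` sets an empty passphrase so the launcher can use the key non-interactively):
+    ```
+    # writes ~/.ssh/ray-azure-key and ~/.ssh/ray-azure-key.pub
+    ssh-keygen -t rsa -b 4096 -f ~/.ssh/ray-azure-key -N "" -C "ray-cluster"
+    ```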
+
+
+3. Modify the Ray cluster launcher file for Azure:
+    - On [line 64 and line 66](./azure-ray-cluster-launcher-template.yaml#L64), point `ssh_private_key` and `ssh_public_key` to the key pair you generated locally.
+    - On [line 119](./azure-ray-cluster-launcher-template.yaml#L119), mount the SSH public key into the VMs via `file_mounts`.
+
+
+
+### C. Configure Ray Cluster on GCP
+
+1. Log in to the GCP Console and create a GCP project, noting its project ID. Modify `project_id` to use your project ID on [line 42](./gcp-ray-cluster-launcher-template.yaml#L42).
+
+
+
+2. Go to **APIs and Services** panel to Enable the following APIs on GCP Console:
+ - Cloud Resource Manager API
+ - Compute Engine API
+ - Cloud OS Login API
+ - Identity and Access Management (IAM) API
+
+
+
+3. Generate an SSH key for your GCP project:
+    ```
+    ssh-keygen -t rsa -b 2048 -f <path/to/key> -C <username>
+    ```
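+
+    For example (key path and comment are placeholders; GCP uses the key's comment as the login username, so set it to the username you will connect as):
+    ```
+    # writes ~/.ssh/ray-gcp-key and ~/.ssh/ray-gcp-key.pub
+    ssh-keygen -t rsa -b 2048 -f ~/.ssh/ray-gcp-key -N "" -C <username>
+    ```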
+
+
+
+4. Go to the **Metadata** panel and click the **SSH KEYS** tab to upload the public SSH key to the GCP project. All instances in the project inherit these SSH keys.
+
+
+
+5. Modify `ssh_private_key` to point to the SSH private key on [line 59](./gcp-ray-cluster-launcher-template.yaml#L59). Set `KeyName` for the head and worker nodes on [line 77](./gcp-ray-cluster-launcher-template.yaml#L77) and [line 113](./gcp-ray-cluster-launcher-template.yaml#L113).
+
+
+
+6. Install and configure [the gcloud CLI](https://cloud.google.com/sdk/docs/install)
+    ```
+    # install prerequisites
+    sudo apt-get install apt-transport-https ca-certificates gnupg curl
+
+    # add the Google Cloud apt repository and its signing key
+    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
+    echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee /etc/apt/sources.list.d/google-cloud-sdk.list
+
+    # install the gcloud CLI
+    sudo apt-get update && sudo apt-get install -y google-cloud-cli
+
+    # initialize and configure gcloud
+    gcloud init
+    ```
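+
+    After `gcloud init`, you can verify that the CLI is pointed at the right account and project:
+    ```
+    # show the active account
+    gcloud auth list
+
+    # show the active project (should match project_id in the template)
+    gcloud config get-value project
+    ```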
+
+
+
+GCP References:
+[How to add SSH keys to VMs](https://cloud.google.com/compute/docs/connect/add-ssh-keys#:~:text=existing%20SSH%20keys-,To%20add%20a%20public%20SSH%20key%20to,metadata%2C%20use%20the%20google_compute_project_metadata%20resource.&text=AAAAC3NzaC1lZDI1NTE5AAAAILg6UtHDNyMNAh0GjaytsJdrUxjtLy3APXqZfNZhvCeT%20test%20EOF%20%7D%20%7D-,If%20there%20are%20existing%20SSH%20keys%20in%20project%20metadata%2C%20you,the%20the%20Compute%20Engine%20API.) (step 5)
+
+
+
+
+
+
+
+## Start and Test Ray with the Ray cluster launcher
+It works by running the following commands from your local machine, replacing `<cluster-config>.yaml` with the launcher template for your platform (e.g. `aws-ray-cluster-launcher-template.yaml`):
+```
+# Create or update the cluster.
+ray up <cluster-config>.yaml
+
+# Get a remote shell on the head node.
+ray attach <cluster-config>.yaml
+
+# Try running a Ray program.
+python -c 'import ray; ray.init()'
+exit
+
+# Tear down the cluster.
+ray down <cluster-config>.yaml
+```
+
+![Test screenshot](./images/test_screenshot.png)
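+
+Beyond `ray attach`, you can run one-off commands on the cluster and watch the autoscaler without opening a shell. The commands below are a sketch; replace `<cluster-config>.yaml` with the launcher template you used:
+```
+# run a command on the head node without attaching
+ray exec <cluster-config>.yaml 'python -c "import ray; ray.init(); print(ray.cluster_resources())"'
+
+# tail the autoscaler logs
+ray monitor <cluster-config>.yaml
+```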
+
+**After the Ray cluster comes up successfully, users should be able to see the running cluster in each platform's console.**
+
+**For AWS at Emory:**
+![AWS screenshot](./images/aws_instances.png)
+
+
+
+
+**For Azure portal:**
+![azure screenshot](./images/azure_portal.png)
+
+
+
+**For GCP Console:**
+![GCP screenshot](./images/gcp_vms.png)
diff --git a/ray_cluster_launchers/aws-ray-cluster-launcher-template.yaml b/ray_cluster_launchers/aws-ray-cluster-launcher-template.yaml
new file mode 100644
index 0000000..3773a84
--- /dev/null
+++ b/ray_cluster_launchers/aws-ray-cluster-launcher-template.yaml
@@ -0,0 +1,199 @@
+# A unique identifier for the head node and workers of this cluster.
+cluster_name: aws-ray-cluster
+
+# The maximum number of worker nodes to launch in addition to the head
+# node.
+max_workers: 2
+
+# The autoscaler will scale up the cluster faster with higher upscaling speed.
+# E.g., if the task requires adding more nodes then autoscaler will gradually
+# scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
+# This number should be > 0.
+upscaling_speed: 1.0
+
+# This executes all commands on all nodes in the docker container,
+# and opens all the necessary ports to support the Ray cluster.
+# Empty string means disabled.
+docker:
+ image: "rayproject/ray-ml:latest-gpu" # You can change this to latest-cpu if you don't need GPU support and want a faster startup
+ # image: rayproject/ray:latest-cpu # use this one if you don't need ML dependencies, it's faster to pull
+ container_name: "ray_container"
+ # If true, pulls latest version of image. Otherwise, `docker run` will only pull the image
+ # if no cached version is present.
+ pull_before_run: True
+ run_options: # Extra options to pass into "docker run"
+ - --ulimit nofile=65536:65536
+
+ # Example of running a GPU head with CPU workers
+ # head_image: "rayproject/ray-ml:latest-gpu"
+ # Allow Ray to automatically detect GPUs
+
+ # worker_image: "rayproject/ray-ml:latest-cpu"
+ # worker_run_options: []
+
+# If a node is idle for this many minutes, it will be removed.
+idle_timeout_minutes: 5
+
+# Cloud-provider specific configuration.
+provider:
+ type: aws
+ region: us-east-1
+ # Availability zone(s), comma-separated, that nodes may be launched in.
+ # Nodes will be launched in the first listed availability zone and will
+ # be tried in the subsequent availability zones if launching fails.
+ # availability_zone: us-east-1a,us-east-1b
+ # Whether to allow node reuse. If set to False, nodes will be terminated
+ # instead of stopped.
+ cache_stopped_nodes: False # If not present, the default is True.
+ use_internal_ips: True
+ security_group:
+ GroupName:
+
+
+# How Ray will authenticate with newly launched nodes.
+auth:
+ ssh_user:
+# By default Ray creates a new private keypair, but you can also use your own.
+# If you do so, make sure to also set "KeyName" in the head and worker node
+# configurations below.
+ ssh_private_key:
+
+# Tell the autoscaler the allowed node types and the resources they provide.
+# The key is the name of the node type, which is just for debugging purposes.
+# The node config specifies the launch config and physical instance type.
+available_node_types:
+ head_node:
+ # The node type's CPU and GPU resources are auto-detected based on AWS instance type.
+ # If desired, you can override the autodetected CPU and GPU resources advertised to the autoscaler.
+ # You can also set custom resources.
+ # For example, to mark a node type as having 1 CPU, 1 GPU, and 5 units of a resource called "custom", set
+ # resources: {"CPU": 1, "GPU": 1, "custom": 5}
+ # resources: {}
+ # Provider-specific config for this node type, e.g. instance type. By default
+ # Ray will auto-configure unspecified fields such as SubnetId and KeyName.
+ # For more documentation on available fields, see:
+ # http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
+ node_config:
+ SubnetIds:
+ -
+ InstanceType: m5.large
+      # Default AMI for us-east-1.
+ # Check https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/aws/config.py
+ # for default images for other zones.
+ ImageId: ami-07caf09b362be10b8
+ KeyName:
+ # SecurityGroups: [public-ecg-group]
+ # You can provision additional disk space with a conf as follows
+ BlockDeviceMappings:
+ - DeviceName: /dev/xvda
+ Ebs:
+ VolumeSize: 150
+ VolumeType: gp3
+ # Additional options in the boto docs.
+ worker_nodes:
+ # The minimum number of worker nodes of this type to launch.
+ # This number should be >= 0.
+ min_workers: 1
+ # The maximum number of worker nodes of this type to launch.
+ # This takes precedence over min_workers.
+ max_workers: 2
+ # The node type's CPU and GPU resources are auto-detected based on AWS instance type.
+ # If desired, you can override the autodetected CPU and GPU resources advertised to the autoscaler.
+ # You can also set custom resources.
+ # For example, to mark a node type as having 1 CPU, 1 GPU, and 5 units of a resource called "custom", set
+ # resources: {"CPU": 1, "GPU": 1, "custom": 5}
+ # resources: {}
+ # Provider-specific config for this node type, e.g. instance type. By default
+ # Ray will auto-configure unspecified fields such as SubnetId and KeyName.
+ # For more documentation on available fields, see:
+ # http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
+ node_config:
+ SubnetIds:
+ -
+ InstanceType: m5.large
+      # Default AMI for us-east-1.
+ # Check https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/aws/config.py
+ # for default images for other zones.
+ ImageId: ami-07caf09b362be10b8
+ KeyName:
+ # SecurityGroups: [public-ecg-group]
+ # - public-ecg-group
+ # Run workers on spot by default. Comment this out to use on-demand.
+ # NOTE: If relying on spot instances, it is best to specify multiple different instance
+ # types to avoid interruption when one instance type is experiencing heightened demand.
+ # Demand information can be found at https://aws.amazon.com/ec2/spot/instance-advisor/
+ BlockDeviceMappings:
+ - DeviceName: /dev/xvda
+ Ebs:
+ VolumeSize: 150
+ VolumeType: gp3
+ # InstanceMarketOptions:
+ # MarketType: spot
+ # Additional options can be found in the boto docs, e.g.
+ # SpotOptions:
+ # MaxPrice: MAX_HOURLY_PRICE
+ # Additional options in the boto docs.
+
+# Specify the node type of the head node (as configured above).
+head_node_type: head_node
+
+# Files or directories to copy to the head and worker nodes. The format is a
+# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
+file_mounts: {
+# "/path1/on/remote/machine": "/path1/on/local/machine",
+# "/path2/on/remote/machine": "/path2/on/local/machine",
+}
+
+# Files or directories to copy from the head node to the worker nodes. The format is a
+# list of paths. The same path on the head node will be copied to the worker node.
+# This behavior is a subset of the file_mounts behavior. In the vast majority of cases
+# you should just use file_mounts. Only use this if you know what you're doing!
+cluster_synced_files: []
+
+# Whether changes to directories in file_mounts or cluster_synced_files in the head node
+# should sync to the worker node continuously
+file_mounts_sync_continuously: False
+
+# Patterns for files to exclude when running rsync up or rsync down
+rsync_exclude:
+ - "**/.git"
+ - "**/.git/**"
+
+# Pattern files to use for filtering out files when running rsync up or rsync down. The file is searched for
+# in the source directory and recursively through all subdirectories. For example, if .gitignore is provided
+# as a value, the behavior will match git's behavior for finding and using .gitignore files.
+rsync_filter:
+ - ".gitignore"
+
+# List of commands that will be run before `setup_commands`. If docker is
+# enabled, these commands will run outside the container and before docker
+# is setup.
+initialization_commands: []
+
+# List of shell commands to run to set up nodes.
+setup_commands:
+ - sleep 4
+ - sudo yum install -y python3-pip python-is-python3
+ - pip3 install ray[default] boto3 torch
+ # Note: if you're developing Ray, you probably want to create a Docker image that
+ # has your Ray repo pre-cloned. Then, you can replace the pip installs
+ # below with a git checkout (and possibly a recompile).
+ # To run the nightly version of ray (as opposed to the latest), either use a rayproject docker image
+ # that has the "nightly" (e.g. "rayproject/ray-ml:nightly-gpu") or uncomment the following line:
+ # - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl"
+
+# Custom commands that will be run on the head node after common setup.
+head_setup_commands: []
+
+# Custom commands that will be run on worker nodes after common setup.
+worker_setup_commands: []
+
+# Command to start ray on the head node. You don't need to change this.
+head_start_ray_commands:
+ - ray stop
+ - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host=0.0.0.0
+
+# Command to start ray on worker nodes. You don't need to change this.
+worker_start_ray_commands:
+ - ray stop
+ - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
diff --git a/ray_cluster_launchers/azure-ray-cluster-launcher-template.yaml b/ray_cluster_launchers/azure-ray-cluster-launcher-template.yaml
new file mode 100644
index 0000000..d64f3d4
--- /dev/null
+++ b/ray_cluster_launchers/azure-ray-cluster-launcher-template.yaml
@@ -0,0 +1,182 @@
+# A unique identifier for the head node and workers of this cluster.
+cluster_name: default
+
+# The maximum number of worker nodes to launch in addition to the head
+# node.
+max_workers: 2
+
+# The autoscaler will scale up the cluster faster with higher upscaling speed.
+# E.g., if the task requires adding more nodes then autoscaler will gradually
+# scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
+# This number should be > 0.
+upscaling_speed: 1.0
+
+# This executes all commands on all nodes in the docker container,
+# and opens all the necessary ports to support the Ray cluster.
+# Empty object means disabled.
+docker:
+ image: "rayproject/ray-ml:latest-gpu" # You can change this to latest-cpu if you don't need GPU support and want a faster startup
+ # image: rayproject/ray:latest-gpu # use this one if you don't need ML dependencies, it's faster to pull
+ container_name: "ray_container"
+ # If true, pulls latest version of image. Otherwise, `docker run` will only pull the image
+ # if no cached version is present.
+ pull_before_run: True
+ run_options: # Extra options to pass into "docker run"
+ - --ulimit nofile=65536:65536
+
+ # Example of running a GPU head with CPU workers
+ # head_image: "rayproject/ray-ml:latest-gpu"
+ # Allow Ray to automatically detect GPUs
+
+ # worker_image: "rayproject/ray-ml:latest-cpu"
+ # worker_run_options: []
+
+# If a node is idle for this many minutes, it will be removed.
+idle_timeout_minutes: 5
+
+# Cloud-provider specific configuration.
+provider:
+ type: azure
+ # https://azure.microsoft.com/en-us/global-infrastructure/locations
+ location: westus2
+ resource_group: ray-cluster
+ # set subscription id otherwise the default from az cli will be used
+ # subscription_id: 00000000-0000-0000-0000-000000000000
+ # set unique subnet mask or a random mask will be used
+ # subnet_mask: 10.0.0.0/16
+ # set unique id for resources in this cluster
+ # if not set a default id will be generated based on the resource group and cluster name
+ # unique_id: RAY1
+ # set managed identity name and resource group
+ # if not set, a default user-assigned identity will be generated in the resource group specified above
+ # msi_name: ray-cluster-msi
+ # msi_resource_group: other-rg
+ # Set provisioning and use of public/private IPs for head and worker nodes. If both options below are true,
+ # only the head node will have a public IP address provisioned.
+ # use_internal_ips: True
+ # use_external_head_ip: True
+
+# How Ray will authenticate with newly launched nodes.
+auth:
+ ssh_user: ubuntu
+ # you must specify paths to matching private and public key pair files
+ # use `ssh-keygen -t rsa -b 4096` to generate a new ssh key pair
+ ssh_private_key:
+ # changes to this should match what is specified in file_mounts
+ ssh_public_key:
+
+# More specific customization to node configurations can be made using the ARM template azure-vm-template.json file
+# See documentation here: https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/2019-03-01/virtualmachines
+# Changes to the local file will be used during deployment of the head node, however worker nodes deployment occurs
+# on the head node, so changes to the template must be included in the wheel file used in setup_commands section below
+
+# Tell the autoscaler the allowed node types and the resources they provide.
+# The key is the name of the node type, which is just for debugging purposes.
+# The node config specifies the launch config and physical instance type.
+available_node_types:
+ ray.head.default:
+ # The resources provided by this node type.
+ resources: {"CPU": 2}
+ # Provider-specific config, e.g. instance type.
+ node_config:
+ azure_arm_parameters:
+ vmSize: Standard_D2s_v3
+ # List images https://docs.microsoft.com/en-us/azure/virtual-machines/linux/cli-ps-findimage
+ imagePublisher: microsoft-dsvm
+ imageOffer: ubuntu-1804
+ imageSku: 1804-gen2
+ imageVersion: latest
+
+ ray.worker.default:
+ # The minimum number of worker nodes of this type to launch.
+ # This number should be >= 0.
+ min_workers: 0
+ # The maximum number of worker nodes of this type to launch.
+ # This takes precedence over min_workers.
+ max_workers: 2
+ # The resources provided by this node type.
+ resources: {"CPU": 2}
+ # Provider-specific config, e.g. instance type.
+ node_config:
+ azure_arm_parameters:
+ vmSize: Standard_D2s_v3
+ # List images https://docs.microsoft.com/en-us/azure/virtual-machines/linux/cli-ps-findimage
+ imagePublisher: microsoft-dsvm
+ imageOffer: ubuntu-1804
+ imageSku: 1804-gen2
+ imageVersion: latest
+ # optionally set priority to use Spot instances
+ priority: Spot
+ # set a maximum price for spot instances if desired
+ # billingProfile:
+ # maxPrice: -1
+
+# Specify the node type of the head node (as configured above).
+head_node_type: ray.head.default
+
+# Files or directories to copy to the head and worker nodes. The format is a
+# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
+file_mounts: {
+# "/path1/on/remote/machine": "/path1/on/local/machine",
+# "/path2/on/remote/machine": "/path2/on/local/machine",
+ "" : ""
+}
+
+# Files or directories to copy from the head node to the worker nodes. The format is a
+# list of paths. The same path on the head node will be copied to the worker node.
+# This behavior is a subset of the file_mounts behavior. In the vast majority of cases
+# you should just use file_mounts. Only use this if you know what you're doing!
+cluster_synced_files: []
+
+# Whether changes to directories in file_mounts or cluster_synced_files in the head node
+# should sync to the worker node continuously
+file_mounts_sync_continuously: False
+
+# Patterns for files to exclude when running rsync up or rsync down
+rsync_exclude:
+ - "**/.git"
+ - "**/.git/**"
+
+# Pattern files to use for filtering out files when running rsync up or rsync down. The file is searched for
+# in the source directory and recursively through all subdirectories. For example, if .gitignore is provided
+# as a value, the behavior will match git's behavior for finding and using .gitignore files.
+rsync_filter:
+ - ".gitignore"
+
+# List of commands that will be run before `setup_commands`. If docker is
+# enabled, these commands will run outside the container and before docker
+# is setup.
+initialization_commands:
+ # enable docker setup
+ - sudo usermod -aG docker $USER || true
+ - sleep 10 # delay to avoid docker permission denied errors
+ # get rid of annoying Ubuntu message
+ - touch ~/.sudo_as_admin_successful
+
+# List of shell commands to run to set up nodes.
+# NOTE: rayproject/ray-ml:latest has ray latest bundled
+setup_commands: []
+ # Note: if you're developing Ray, you probably want to create a Docker image that
+ # has your Ray repo pre-cloned. Then, you can replace the pip installs
+ # below with a git checkout (and possibly a recompile).
+ # To run the nightly version of ray (as opposed to the latest), either use a rayproject docker image
+ # that has the "nightly" (e.g. "rayproject/ray-ml:nightly-gpu") or uncomment the following line:
+ # - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp38-cp38-manylinux2014_x86_64.whl"
+
+# Custom commands that will be run on the head node after common setup.
+# NOTE: rayproject/ray-ml:latest has azure packages bundled
+head_setup_commands: []
+ # - pip install -U azure-cli-core==2.22.0 azure-mgmt-compute==14.0.0 azure-mgmt-msi==1.0.0 azure-mgmt-network==10.2.0 azure-mgmt-resource==13.0.0
+
+# Custom commands that will be run on worker nodes after common setup.
+worker_setup_commands: []
+
+# Command to start ray on the head node. You don't need to change this.
+head_start_ray_commands:
+ - ray stop
+ - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml
+
+# Command to start ray on worker nodes. You don't need to change this.
+worker_start_ray_commands:
+ - ray stop
+ - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
diff --git a/ray_cluster_launchers/gcp-ray-cluster-launcher-template.yaml b/ray_cluster_launchers/gcp-ray-cluster-launcher-template.yaml
new file mode 100644
index 0000000..4c21ee5
--- /dev/null
+++ b/ray_cluster_launchers/gcp-ray-cluster-launcher-template.yaml
@@ -0,0 +1,205 @@
+# A unique identifier for the head node and workers of this cluster.
+cluster_name: gcp-ray-cluster
+
+# The maximum number of worker nodes to launch in addition to the head
+# node.
+max_workers: 2
+
+# The autoscaler will scale up the cluster faster with higher upscaling speed.
+# E.g., if the task requires adding more nodes then autoscaler will gradually
+# scale up the cluster in chunks of upscaling_speed*currently_running_nodes.
+# This number should be > 0.
+upscaling_speed: 1.0
+
+# This executes all commands on all nodes in the docker container,
+# and opens all the necessary ports to support the Ray cluster.
+# Empty string means disabled.
+docker:
+ image: "rayproject/ray-ml:latest-gpu" # You can change this to latest-cpu if you don't need GPU support and want a faster startup
+ # image: rayproject/ray:latest-gpu # use this one if you don't need ML dependencies, it's faster to pull
+ container_name: "ray_container"
+ # If true, pulls latest version of image. Otherwise, `docker run` will only pull the image
+ # if no cached version is present.
+ pull_before_run: True
+ run_options: # Extra options to pass into "docker run"
+ - --ulimit nofile=65536:65536
+
+ # Example of running a GPU head with CPU workers
+ # head_image: "rayproject/ray-ml:latest-gpu"
+ # Allow Ray to automatically detect GPUs
+
+ # worker_image: "rayproject/ray-ml:latest-cpu"
+ # worker_run_options: []
+
+# If a node is idle for this many minutes, it will be removed.
+idle_timeout_minutes: 5
+
+# Cloud-provider specific configuration.
+provider:
+ type: gcp
+ region: us-west1
+ availability_zone: us-west1-a
+ project_id: # Globally unique project id
+
+# How Ray will authenticate with newly launched nodes.
+
+###############################################################
+#
+# 1. need to enable the following gcp services & APIs
+# - Cloud Resource Manager API
+# - Compute Engine API
+# - Cloud OS Login API
+# - Identity and Access Management (IAM) API
+#
+# 2. use `ssh-keygen -t rsa -b 2048 -f <path/to/key> -C <username>` to generate a new ssh key pair
+#
+###############################################################
+auth:
+ ssh_user:
+ ssh_private_key:
+# If you do so, make sure to also set "KeyName" in the head and worker node
+# configurations below. This requires that you have added the key into the
+# project wide meta-data.
+# ssh_private_key: /path/to/your/key.pem
+
+# Tell the autoscaler the allowed node types and the resources they provide.
+# The key is the name of the node type, which is just for debugging purposes.
+# The node config specifies the launch config and physical instance type.
+available_node_types:
+ ray_head_default:
+ # The resources provided by this node type.
+ resources: {"CPU": 2}
+ # Provider-specific config for the head node, e.g. instance type. By default
+ # Ray will auto-configure unspecified fields such as subnets and ssh-keys.
+ # For more documentation on available fields, see:
+ # https://cloud.google.com/compute/docs/reference/rest/v1/instances/insert
+ node_config:
+ KeyName:
+ machineType: n1-standard-2
+ disks:
+ - boot: true
+ autoDelete: true
+ type: PERSISTENT
+ initializeParams:
+ diskSizeGb: 50
+ # See https://cloud.google.com/compute/docs/images for more images
+ sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu
+
+ # Additional options can be found in in the compute docs at
+ # https://cloud.google.com/compute/docs/reference/rest/v1/instances/insert
+
+ # If the network interface is specified as below in both head and worker
+ # nodes, the manual network config is used. Otherwise an existing subnet is
+ # used. To use a shared subnet, ask the subnet owner to grant permission
+ # for 'compute.subnetworks.use' to the ray autoscaler account...
+ # networkInterfaces:
+ # - kind: compute#networkInterface
+ # subnetwork: path/to/subnet
+ # aliasIpRanges: []
+ ray_worker_small:
+ # The minimum number of worker nodes of this type to launch.
+ # This number should be >= 0.
+ min_workers: 1
+ # The maximum number of worker nodes of this type to launch.
+ # This takes precedence over min_workers.
+ max_workers: 2
+ # The resources provided by this node type.
+ resources: {"CPU": 2}
+ # Provider-specific config for the head node, e.g. instance type. By default
+ # Ray will auto-configure unspecified fields such as subnets and ssh-keys.
+ # For more documentation on available fields, see:
+ # https://cloud.google.com/compute/docs/reference/rest/v1/instances/insert
+ node_config:
+ KeyName:
+ machineType: n1-standard-2
+ disks:
+ - boot: true
+ autoDelete: true
+ type: PERSISTENT
+ initializeParams:
+ diskSizeGb: 50
+ # See https://cloud.google.com/compute/docs/images for more images
+ sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu
+      # Run workers on preemptible instances by default.
+ # Comment this out to use on-demand.
+ scheduling:
+ - preemptible: true
+ # Un-Comment this to launch workers with the Service Account of the Head Node
+ # serviceAccounts:
+ # - email: ray-autoscaler-sa-v1@.iam.gserviceaccount.com
+ # scopes:
+ # - https://www.googleapis.com/auth/cloud-platform
+
+ # Additional options can be found in in the compute docs at
+ # https://cloud.google.com/compute/docs/reference/rest/v1/instances/insert
+
+# Specify the node type of the head node (as configured above).
+head_node_type: ray_head_default
+
+# Files or directories to copy to the head and worker nodes. The format is a
+# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
+file_mounts: {
+# "/path1/on/remote/machine": "/path1/on/local/machine",
+# "/path2/on/remote/machine": "/path2/on/local/machine",
+}
+
+# Files or directories to copy from the head node to the worker nodes. The format is a
+# list of paths. The same path on the head node will be copied to the worker node.
+# This behavior is a subset of the file_mounts behavior. In the vast majority of cases
+# you should just use file_mounts. Only use this if you know what you're doing!
+cluster_synced_files: []
+
+# Whether changes to directories in file_mounts or cluster_synced_files in the head node
+# should sync to the worker node continuously
+file_mounts_sync_continuously: False
+
+# Patterns for files to exclude when running rsync up or rsync down
+rsync_exclude:
+ - "**/.git"
+ - "**/.git/**"
+
+# Pattern files to use for filtering out files when running rsync up or rsync down. The file is searched for
+# in the source directory and recursively through all subdirectories. For example, if .gitignore is provided
+# as a value, the behavior will match git's behavior for finding and using .gitignore files.
+rsync_filter:
+ - ".gitignore"
+
+# List of commands that will be run before `setup_commands`. If docker is
+# enabled, these commands will run outside the container and before docker
+# is setup.
+initialization_commands: []
+
+# List of shell commands to run to set up nodes.
+setup_commands: []
+ # Note: if you're developing Ray, you probably want to create a Docker image that
+ # has your Ray repo pre-cloned. Then, you can replace the pip installs
+ # below with a git checkout (and possibly a recompile).
+ # To run the nightly version of ray (as opposed to the latest), either use a rayproject docker image
+ # that has the "nightly" (e.g. "rayproject/ray-ml:nightly-gpu") or uncomment the following line:
+ # - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl"
+
+
+# Custom commands that will be run on the head node after common setup.
+head_setup_commands:
+ - pip install google-api-python-client==1.7.8
+
+# Custom commands that will be run on worker nodes after common setup.
+worker_setup_commands: []
+
+# Command to start ray on the head node. You don't need to change this.
+head_start_ray_commands:
+ - ray stop
+ - >-
+ ray start
+ --head
+ --port=6379
+ --object-manager-port=8076
+ --autoscaling-config=~/ray_bootstrap_config.yaml
+
+# Command to start ray on worker nodes. You don't need to change this.
+worker_start_ray_commands:
+ - ray stop
+ - >-
+ ray start
+ --address=$RAY_HEAD_IP:6379
+ --object-manager-port=8076
diff --git a/ray_cluster_launchers/images/aws_instances.png b/ray_cluster_launchers/images/aws_instances.png
new file mode 100644
index 0000000..790b869
Binary files /dev/null and b/ray_cluster_launchers/images/aws_instances.png differ
diff --git a/ray_cluster_launchers/images/azure_portal.png b/ray_cluster_launchers/images/azure_portal.png
new file mode 100644
index 0000000..146e5cc
Binary files /dev/null and b/ray_cluster_launchers/images/azure_portal.png differ
diff --git a/ray_cluster_launchers/images/gcp_vms.png b/ray_cluster_launchers/images/gcp_vms.png
new file mode 100644
index 0000000..03cc644
Binary files /dev/null and b/ray_cluster_launchers/images/gcp_vms.png differ
diff --git a/ray_cluster_launchers/images/test_screenshot.png b/ray_cluster_launchers/images/test_screenshot.png
new file mode 100644
index 0000000..edb26a1
Binary files /dev/null and b/ray_cluster_launchers/images/test_screenshot.png differ