---
title: "g. Create Ray Cluster"
date: 2022-08-19
weight: 80
---
To create a Ray cluster, we need a `.yaml` file with the necessary configuration. A complete list of configuration options can be found in the Ray documentation.

Copy the following configuration to `cluster.yaml`:
```yaml
cluster_name: workshop

# The node config specifies the launch config and physical instance type.
available_node_types:
    ray.head:
        node_config:
            SubnetIds: [SUBNET]
            ImageId: AMI_ID
            IamInstanceProfile:
                Arn: RAY_HEAD_IAM_ROLE_ARN
            InstanceType: c5.2xlarge
    ray.worker.gpu:
        min_workers: 2
        max_workers: 2
        node_config:
            SubnetIds: [SUBNET]
            ImageId: AMI_ID
            IamInstanceProfile:
                Arn: RAY_WORKER_IAM_ROLE_ARN
            InstanceType: g4dn.2xlarge

head_node_type: ray.head

# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-west-2
    # Availability zone(s), comma-separated, that nodes may be launched in.
    availability_zone: us-west-2a
    cache_stopped_nodes: False # If not present, the default is True.
    security_group:
        GroupName: ray-cluster-sg
    cloudwatch:
        agent:
            config: "cloudwatch-agent-config.json"

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: ubuntu

# List of shell commands to run to set up nodes.
setup_commands:
    - FSXL_MOUNT_COMMAND

# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379
```
| Placeholder | Replace With |
|---|---|
| SUBNET | subnet-xxxxxxxxxxxxxxxxx (a public subnet in the availability_zone specified in the .yaml file) |
| RAY_HEAD_IAM_ROLE_ARN | arn:aws:iam::xxxxxxxxxxxx:instance-profile/ray-head |
| RAY_WORKER_IAM_ROLE_ARN | arn:aws:iam::xxxxxxxxxxxx:instance-profile/ray-worker |
| FSXL_MOUNT_COMMAND | the sudo mount command from the FSxL console (see below) |
Use the ARNs for the instance profiles obtained in the last step of section b.
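If you no longer have those values handy, you can look them up again with the AWS CLI. A minimal sketch, assuming the instance profiles are named `ray-head` and `ray-worker` as in section b and the CLI is configured for `us-west-2`:

```bash
# Instance profile ARNs (names assumed from section b).
aws iam get-instance-profile --instance-profile-name ray-head \
    --query 'InstanceProfile.Arn' --output text
aws iam get-instance-profile --instance-profile-name ray-worker \
    --query 'InstanceProfile.Arn' --output text

# Candidate subnets in us-west-2a for the SUBNET placeholder;
# pick a public one (MapPublicIpOnLaunch is True).
aws ec2 describe-subnets \
    --filters "Name=availability-zone,Values=us-west-2a" \
    --query 'Subnets[].[SubnetId,MapPublicIpOnLaunch]' --output table
```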
To get the mount command for FSxL, navigate to the ray-fsx file system we created in section c and click Attach. This opens an information pane. Copy the command from step 3 under Attach instructions and add it to the `.yaml` file in place of FSXL_MOUNT_COMMAND. It looks something like this:

```bash
sudo mount -t lustre -o noatime,flock fs-xxxxxxxxxxxxxxxxx.fsx.us-west-2.amazonaws.com@tcp:/xxxxxxxx /fsx
```
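Note that `mount` requires the mount point to exist. If the AMI you use does not already contain a `/fsx` directory, one option is to create it in the same setup command. A hedged sketch of what the `setup_commands` entry might then look like (the file system ID and mount name are placeholders):

```yaml
setup_commands:
    # Create the mount point if needed, then mount FSx for Lustre.
    - sudo mkdir -p /fsx && sudo mount -t lustre -o noatime,flock fs-xxxxxxxxxxxxxxxxx.fsx.us-west-2.amazonaws.com@tcp:/xxxxxxxx /fsx
```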
Before we can launch the cluster, we also have to install Ray in Cloud9:

```bash
pip install boto3 ray[default]
```
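To confirm the installation, you can check the CLI version:

```bash
ray --version
```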
Now we're ready to spin up the cluster:

```bash
ray up -y cluster.yaml
```
The command exits once the head node is set up; the worker nodes are launched after that. Launching all the nodes takes 5-10 minutes in total.
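If you want to watch the autoscaler bring the workers up, you can tail its logs from Cloud9 with Ray's monitor subcommand:

```bash
ray monitor cluster.yaml
```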
To check the status of the cluster, we can log in to the head node using the following command:

```bash
ray attach cluster.yaml
```
And, from inside the head node, execute:

```bash
ray status
```
You should see output like this:
```text
======== Autoscaler status: 2022-08-27 20:48:19.055954 ========
Node status
---------------------------------------------------------------
Healthy:
 2 ray.worker.gpu
 1 ray.head
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0.0/24.0 CPU
 0.0/2.0 GPU
 0.0/2.0 accelerator_type:T4
 0.00/53.336 GiB memory
 0.00/22.405 GiB object_store_memory

Demands:
 (no resource demands)
```
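As an optional smoke test, you can also submit a trivial GPU task from the head node and check that it lands on the workers. A minimal sketch (the `gpu_hostname` function is ours, not part of the workshop):

```bash
python - <<'EOF'
import socket
import ray

# Connect to the running cluster from the head node.
ray.init(address="auto")

@ray.remote(num_gpus=1)
def gpu_hostname():
    # Runs on a node with a free GPU, i.e. one of the g4dn workers.
    return socket.gethostname()

# With two GPU workers, the two tasks should report two different hosts.
print(ray.get([gpu_hostname.remote() for _ in range(2)]))
EOF
```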