diff --git a/docs/docs/concepts/fleets.md b/docs/docs/concepts/fleets.md
index 225cfb777..336a48370 100644
--- a/docs/docs/concepts/fleets.md
+++ b/docs/docs/concepts/fleets.md
@@ -38,23 +38,19 @@ Define a fleet configuration as a YAML file in your project directory. The file
-#### Placement
+#### Placement { #cloud-placement }
To ensure instances are interconnected (e.g., for
[distributed tasks](tasks.md#distributed-tasks)), set `placement` to `cluster`.
This ensures all instances are provisioned in the same backend and region with optimal inter-node connectivity.
??? info "AWS"
- `dstack` automatically enables [Elastic Fabric Adapter :material-arrow-top-right-thin:{ .external }](https://aws.amazon.com/hpc/efa/){:target="_blank"}
- for the instance types that support it:
- `p5.48xlarge`, `p4d.24xlarge`, `g4dn.12xlarge`, `g4dn.16xlarge`, `g4dn.8xlarge`, `g4dn.metal`,
- `g5.12xlarge`, `g5.16xlarge`, `g5.24xlarge`, `g5.48xlarge`, `g5.8xlarge`, `g6.12xlarge`,
- `g6.16xlarge`, `g6.24xlarge`, `g6.48xlarge`, `g6.8xlarge`, and `gr6.8xlarge`.
-
+ `dstack` automatically enables the Elastic Fabric Adapter for all
+ [EFA-capable instance types :material-arrow-top-right-thin:{ .external }](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types){:target="_blank"}.
Currently, only one EFA interface is enabled per instance, regardless of its maximum capacity.
This will change once [this issue :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues/1804){:target="_blank"} is resolved.
-> The `cluster` placement is supported only for `aws`, `azure`, `gcp`, and `oci`
+> The `cluster` placement is supported only for `aws`, `azure`, `gcp`, `oci`, and `vultr`
> backends.
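+
+For example, a minimal cloud fleet with `cluster` placement might look like the following
+sketch (the fleet name and resources are illustrative):
+
+```yaml
+type: fleet
+name: my-cluster-fleet
+
+# The number of interconnected instances
+nodes: 2
+# Provision all instances in the same backend and region
+placement: cluster
+
+resources:
+  gpu: 24GB
+```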
#### Resources
@@ -245,7 +241,7 @@ Define a fleet configuration as a YAML file in your project directory. The file
3. The specified user should have passwordless `sudo` access.
-#### Placement
+#### Placement { #ssh-placement }
If the hosts are interconnected (i.e. share the same network), set `placement` to `cluster`.
This is required if you'd like to use the fleet for [distributed tasks](tasks.md#distributed-tasks).
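+
+For example, an SSH fleet with `cluster` placement might look like the following sketch
+(the user, key path, and host addresses are placeholders):
+
+```yaml
+type: fleet
+name: my-ssh-fleet
+
+# The hosts share the same network
+placement: cluster
+
+ssh_config:
+  user: ubuntu
+  identity_file: ~/.ssh/id_rsa
+  hosts:
+    - 3.255.177.51
+    - 3.255.177.52
+```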
diff --git a/docs/docs/concepts/tasks.md b/docs/docs/concepts/tasks.md
index 16c612b2b..f3ef35230 100644
--- a/docs/docs/concepts/tasks.md
+++ b/docs/docs/concepts/tasks.md
@@ -71,7 +71,7 @@ application.
By default, a task runs on a single node.
However, you can run it on a cluster of nodes by specifying `nodes`.
-
+
```yaml
type: task
@@ -81,33 +81,59 @@ name: train-distrib
# The size of the cluster
nodes: 2
-python: "3.10"
+python: "3.12"
-# Commands of the task
+# Commands to run on each node
commands:
+ - git clone https://github.com/pytorch/examples.git
+ - cd examples/distributed/ddp-tutorial-series
- pip install -r requirements.txt
- torchrun
- --nproc_per_node=$DSTACK_GPUS_PER_NODE
- --node_rank=$DSTACK_NODE_RANK
+ --nproc-per-node=$DSTACK_GPUS_PER_NODE
+ --node-rank=$DSTACK_NODE_RANK
--nnodes=$DSTACK_NODES_NUM
- --master_addr=$DSTACK_MASTER_NODE_IP
- --master_port=8008 resnet_ddp.py
- --num_epochs 20
+ --master-addr=$DSTACK_MASTER_NODE_IP
+ --master-port=12345
+ multinode.py 50 10
resources:
gpu: 24GB
+ # Uncomment if using multiple GPUs
+ #shm_size: 24GB
```
-All you need to do is pass the corresponding environment variables such as
-`DSTACK_GPUS_PER_NODE`, `DSTACK_NODE_RANK`, `DSTACK_NODES_NUM`,
-`DSTACK_MASTER_NODE_IP`, and `DSTACK_GPUS_NUM` (see [System environment variables](#system-environment-variables)).
+Nodes can communicate using their private IP addresses.
+Use `DSTACK_MASTER_NODE_IP`, `DSTACK_NODE_RANK`, and other
+[System environment variables](#system-environment-variables)
+to discover IP addresses and other details.
+
+??? info "Network interface"
+ Distributed frameworks usually detect the correct network interface automatically,
+ but sometimes you need to specify it explicitly.
+
+ For example, with PyTorch and the NCCL backend, you may need
+ to add these commands to tell NCCL to use the private interface:
+
+ ```yaml
+ commands:
+ - apt-get install -y iproute2
+ - >
+ if [[ $DSTACK_NODE_RANK == 0 ]]; then
+ export NCCL_SOCKET_IFNAME=$(ip -4 -o addr show | fgrep $DSTACK_MASTER_NODE_IP | awk '{print $2}')
+ else
+ export NCCL_SOCKET_IFNAME=$(ip route get $DSTACK_MASTER_NODE_IP | sed -E 's/.*?dev (\S+) .*/\1/;t;d')
+ fi
+ # ... The rest of the commands
+ ```
!!! info "Fleets"
- To ensure all nodes are provisioned into a cluster placement group and to enable the highest level of inter-node
- connectivity (incl. support for [EFA :material-arrow-top-right-thin:{ .external }](https://aws.amazon.com/hpc/efa/){:target="_blank"}),
- create a [fleet](fleets.md) via a configuration before running a disstributed task.
+ Distributed tasks can only run on fleets with
+ [cluster placement](fleets.md#cloud-placement).
+ While `dstack` can provision such fleets automatically, it is
+ recommended to create them via a fleet configuration
+ to ensure the highest level of inter-node connectivity.
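+
+    For example, a fleet configuration along the lines of the sketch below (the name and
+    resources are illustrative) provisions two interconnected nodes for the task above and
+    can be created with `dstack apply` before running the task:
+
+    ```yaml
+    type: fleet
+    name: my-cluster-fleet
+
+    nodes: 2
+    placement: cluster
+
+    resources:
+      gpu: 24GB
+    ```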
`dstack` is easy to use with `accelerate`, `torchrun`, Ray, Spark, and any other distributed frameworks.
@@ -303,7 +329,7 @@ If you don't assign a value to an environment variable (see `HF_TOKEN` above),
| `DSTACK_NODES_NUM` | The number of nodes in the run |
| `DSTACK_GPUS_PER_NODE` | The number of GPUs per node |
| `DSTACK_NODE_RANK` | The rank of the node |
- | `DSTACK_MASTER_NODE_IP` | The internal IP address the master node |
+ | `DSTACK_MASTER_NODE_IP` | The internal IP address of the master node |
| `DSTACK_NODES_IPS` | The list of internal IP addresses of all nodes delimited by "\n" |
### Spot policy
diff --git a/docs/docs/reference/environment-variables.md b/docs/docs/reference/environment-variables.md
index e94c5cf44..319bbbe0d 100644
--- a/docs/docs/reference/environment-variables.md
+++ b/docs/docs/reference/environment-variables.md
@@ -45,31 +45,33 @@ tasks, and services:
- `DSTACK_NODES_NUM`{ #DSTACK_NODES_NUM } – The number of nodes in the run
- `DSTACK_GPUS_PER_NODE`{ #DSTACK_GPUS_PER_NODE } – The number of GPUs per node
- `DSTACK_NODE_RANK`{ #DSTACK_NODE_RANK } – The rank of the node
-- `DSTACK_NODE_RANK`{ #DSTACK_NODE_RANK } – The internal IP address the master node.
+- `DSTACK_MASTER_NODE_IP`{ #DSTACK_MASTER_NODE_IP } – The internal IP address of the master node.
- Below is an example of using `DSTACK_NODES_NUM`, `DSTACK_GPUS_PER_NODE`, `DSTACK_NODE_RANK`, and `DSTACK_NODE_RANK`
+ Below is an example of using `DSTACK_NODES_NUM`, `DSTACK_GPUS_PER_NODE`, `DSTACK_NODE_RANK`, and `DSTACK_MASTER_NODE_IP`
for distributed training:
```yaml
- type: task
- name: train-distrib
-
- # The number of instances in the cluster
- nodes: 2
-
- python: "3.10"
- commands:
- - pip install -r requirements.txt
- - torchrun
- --nproc_per_node=$DSTACK_GPUS_PER_NODE
- --node_rank=$DSTACK_NODE_RANK
- --nnodes=$DSTACK_NODES_NUM
- --master_addr=$DSTACK_MASTER_NODE_IP
- --master_port=8008
- resnet_ddp.py --num_epochs 20
-
- resources:
- gpu: 24GB
+ type: task
+ name: train-distrib
+
+ nodes: 2
+ python: "3.12"
+
+ commands:
+ - git clone https://github.com/pytorch/examples.git
+ - cd examples/distributed/ddp-tutorial-series
+ - pip install -r requirements.txt
+ - torchrun
+ --nproc-per-node=$DSTACK_GPUS_PER_NODE
+ --node-rank=$DSTACK_NODE_RANK
+ --nnodes=$DSTACK_NODES_NUM
+ --master-addr=$DSTACK_MASTER_NODE_IP
+ --master-port=12345
+ multinode.py 50 10
+
+ resources:
+ gpu: 24GB
+ shm_size: 24GB
```
- `DSTACK_NODES_IPS`{ #DSTACK_NODES_IPS } – The list of internal IP addresses of all nodes delimited by `"\n"`.
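+
+  A minimal sketch of consuming `DSTACK_NODES_IPS` in a task's `commands` (the `echo` is
+  only for illustration):
+
+  ```yaml
+  commands:
+    - |
+      # The value is newline-delimited, so plain word splitting iterates over the IPs
+      for ip in $DSTACK_NODES_IPS; do
+        echo "node: $ip"
+      done
+  ```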