added amd_configure.md to references (#3275)
Co-authored-by: HAI <[email protected]>
Co-authored-by: Yineng Zhang <[email protected]>
Co-authored-by: zhaochenyang20 <[email protected]>
4 people authored Feb 7, 2025
1 parent c1f5f99 commit 0a6f18f
Showing 2 changed files with 101 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -59,6 +59,7 @@ The core features include:
references/benchmark_and_profiling.md
references/accuracy_evaluation.md
references/custom_chat_template.md
references/amd_configure.md
references/deepseek.md
references/multi_node.md
references/modelscope.md
100 changes: 100 additions & 0 deletions docs/references/amd_configure.md
@@ -0,0 +1,100 @@
# AMD Configuration and Setup for SGLang

## Introduction

This document describes how to set up an AMD-based environment for [SGLang](https://github.com/sgl-project/sglang). If you encounter issues or have questions, please [open an issue](https://github.com/sgl-project/sglang/issues) on the SGLang repository.

## System Configuration

When using AMD GPUs such as the MI300X, a few system-level optimizations help ensure stable performance. This guide uses the MI300X as an example; AMD provides official documentation for MI300X optimization and system tuning:

- [AMD MI300X Tuning Guides](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html)
- [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/vllm-benchmark.html)
- [AMD Instinct MI300X System Optimization](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html)
- [AMD Instinct MI300X Workload Optimization](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference-optimization/workload.html)

**NOTE:** We strongly recommend reading these documents in their entirety to fully utilize your system.

Below are a few key settings to confirm or enable:

### Update GRUB Settings

In `/etc/default/grub`, append the following to `GRUB_CMDLINE_LINUX`:

```text
pci=realloc=off iommu=pt
```

Afterward, run `sudo update-grub` (or your distro’s equivalent) and reboot.
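After the reboot, you can confirm that the new parameters are active by inspecting the running kernel's command line (a minimal check, not AMD-specific):

```bash
# Both flags should appear in the output after the reboot
cat /proc/cmdline
```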

### Disable NUMA Auto-Balancing

```bash
sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
```

You can automate or verify this change using [this helpful script](https://github.com/ROCm/triton/blob/rocm_env/scripts/amd/env_check.sh).
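To double-check the change, and optionally make it persist across reboots, the sketch below assumes a distribution that reads drop-in files from `/etc/sysctl.d` (the file name is arbitrary):

```bash
# Should print 0 once auto-balancing is disabled
cat /proc/sys/kernel/numa_balancing

# Optional: persist the setting across reboots
echo 'kernel.numa_balancing = 0' | sudo tee /etc/sysctl.d/99-disable-numa-balancing.conf
sudo sysctl --system
```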

Again, please go through the entire documentation to confirm your system is using the recommended configuration.

## Installing SGLang

For general installation instructions, see the official [SGLang Installation Docs](https://docs.sglang.ai/start/install.html). Below are the AMD-specific steps summarized for convenience.

### Install from Source

```bash
git clone https://github.com/sgl-project/sglang.git
cd sglang

pip install --upgrade pip
pip install sgl-kernel --force-reinstall --no-deps
pip install -e "python[all_hip]"
```
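To sanity-check the result, the commands below assume the ROCm command-line tools are on your `PATH`; they only confirm that the GPUs are visible and that the package installed cleanly:

```bash
# GPUs should be listed by ROCm
rocm-smi

# The package should be installed and importable
pip show sglang
python3 -c "import sglang"
```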

### Install Using Docker (Recommended)

1. Build the docker image.

```bash
docker build -t sglang_image -f Dockerfile.rocm .
```
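If the build succeeds, the image should appear in your local image list (a quick check only):

```bash
# Shows the freshly built image (repository, tag, size)
docker images sglang_image
```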

2. Create a convenient alias.

```bash
alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri \
--ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v $HOME/dockerx:/dockerx \
-v /data:/data'
```
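Before launching the server, you can use the alias for a quick check that the container sees the GPUs; this assumes `rocm-smi` is available inside the image, which is the case for ROCm base images:

```bash
# The MI300X devices should be listed here
drun sglang_image rocm-smi
```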

3. Launch the server.

**NOTE:** Replace `<secret>` below with your [Hugging Face Hub token](https://huggingface.co/docs/hub/en/security-tokens).

```bash
drun -p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
sglang_image \
python3 -m sglang.launch_server \
--model-path NousResearch/Meta-Llama-3.1-8B \
--host 0.0.0.0 \
--port 30000
```

4. To verify the setup, you can run a benchmark in another terminal or refer to the [OpenAI-compatible API docs](https://docs.sglang.ai/backend/openai_api_completions.html) to send requests to the engine.

```bash
drun sglang_image \
python3 -m sglang.bench_serving \
--backend sglang \
--dataset-name random \
--num-prompts 4000 \
--random-input 128 \
--random-output 128
```
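For a single smoke-test request instead of a full benchmark, you can also call the OpenAI-compatible completions endpoint described in the linked docs (a minimal sketch; the `model` value should match the path passed to `--model-path`):

```bash
curl http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "NousResearch/Meta-Llama-3.1-8B",
    "prompt": "The capital of France is",
    "max_tokens": 16,
    "temperature": 0
  }'
```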

With your AMD system properly configured and SGLang installed, you can now fully leverage AMD hardware to power SGLang’s machine learning capabilities.
