Signed-off-by: l1b0k <[email protected]>
Showing 2 changed files with 240 additions and 0 deletions.

# Network Bandwidth Limitation Using Terway QoS

## Introduction

terway-qos was created to address network bandwidth contention in mixed-deployment (co-location) scenarios. It supports limiting bandwidth per Pod and per business type.

Compared with other solutions, terway-qos has the following advantages:

- Limits bandwidth by business type, accommodating mixed deployment of various business types.
- Supports dynamic adjustment of Pod bandwidth limits.
- Provides whole-machine bandwidth limiting, with support for multiple network interfaces.
- Supports bandwidth limits for container networks and HostNetwork Pods.

Terway QoS defines three bandwidth priority levels, which map to Koordinator's default QoS as follows. You can set the QoS priority for a Pod using the familiar Koordinator configuration.

| Koordinator QoS | Kubernetes QoS       | Terway Net QoS |
| :-------------- | :------------------- | :------------- |
| SYSTEM          | --                   | L0             |
| LSE             | Guaranteed           | L1             |
| LSR             | Guaranteed           | L1             |
| LS              | Guaranteed/Burstable | L1             |
| BE              | BestEffort           | L2             |
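
A minimal sketch of assigning a Koordinator QoS class to a Pod, assuming Koordinator's `koord.sh/qosClass` Pod label (the label name is not stated in this document, so treat it as an assumption and check the Koordinator docs):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: offline-job
  labels:
    # Assumption: Koordinator reads the QoS class from this label.
    # BE maps to Terway Net QoS L2 according to the table above.
    koord.sh/qosClass: BE
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "infinity"]
```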

## Configuration Parameters

### Setting Whole-Machine Bandwidth Limits

In mixed-deployment scenarios, we want online business traffic to be guaranteed its maximum bandwidth so that it avoids contention, while offline business traffic can still use all available bandwidth when the machine is idle.

Users can therefore define three priority levels for business traffic: L0, L1, and L2, in decreasing order of priority.

Definition of a contention scenario: the total traffic of `L0 + L1 + L2` exceeds the whole-machine bandwidth.

- L0's maximum bandwidth adjusts dynamically based on the real-time traffic of L1 and L2. It can be as high as the whole-machine bandwidth and as low as `whole-machine bandwidth - L1 minimum bandwidth - L2 minimum bandwidth`.
- Under all circumstances, the bandwidth of L1 and L2 does not exceed their respective upper limits.
- In a contention scenario, the bandwidth of L1 and L2 does not fall below their respective lower limits.
- In a contention scenario, bandwidth is limited in the order L2, L1, L0.

For example, with the configuration below (a 600M whole-machine cap, a 50M minimum for `LS`/L1 and a 10M minimum for `BE`/L2), L0 can use anywhere from 540M up to the full 600M.

Since Terway QoS has only three priority levels, whole-machine bandwidth limits can only be set for the `LS` and `BE` classes; the remaining `L0` share is derived from the machine's total bandwidth cap.

Here is an example configuration:

```yaml
resource-qos-config: |
  {
    "clusterStrategy": {
      "policies": {"netQOSPolicy": "terway-qos"},
      "lsClass": {
        "networkQOS": {
          "enable": true,
          "ingressRequest": "50M",
          "ingressLimit": "100M",
          "egressRequest": "50M",
          "egressLimit": "100M"
        }
      },
      "beClass": {
        "networkQOS": {
          "enable": true,
          "ingressRequest": "10M",
          "ingressLimit": "200M",
          "egressRequest": "10M",
          "egressLimit": "200M"
        }
      }
    }
  }
system-config: |-
  {
    "clusterStrategy": {
      "totalNetworkBandwidth": "600M"
    }
  }
```

Please note:

- `clusterStrategy.policies.netQOSPolicy` must be set to `terway-qos`.

> Unit: `bps`
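
For orientation, here is a sketch of how the two keys above might be delivered, assuming they live in Koordinator's `slo-controller-config` ConfigMap in the `koordinator-system` namespace (both the ConfigMap name and the namespace are assumptions; adjust them to your installation):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: slo-controller-config   # assumed ConfigMap name
  namespace: koordinator-system # assumed namespace
data:
  # Per-class limits, same values as the example above.
  resource-qos-config: |
    {
      "clusterStrategy": {
        "policies": {"netQOSPolicy": "terway-qos"},
        "lsClass": {"networkQOS": {"enable": true, "ingressRequest": "50M", "ingressLimit": "100M", "egressRequest": "50M", "egressLimit": "100M"}},
        "beClass": {"networkQOS": {"enable": true, "ingressRequest": "10M", "ingressLimit": "200M", "egressRequest": "10M", "egressLimit": "200M"}}
      }
    }
  # Whole-machine bandwidth cap used to derive the L0 share.
  system-config: |-
    {
      "clusterStrategy": {
        "totalNetworkBandwidth": "600M"
      }
    }
```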

### Pod Bandwidth Limitation

To specify bandwidth limits for a Pod:

| Key                       | Value                                           |
| :------------------------ | :---------------------------------------------- |
| koordinator.sh/networkQOS | '{"IngressLimit": "10M", "EgressLimit": "20M"}' |

> Unit: `bps`
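
A minimal sketch of applying the key above, assuming it is set as a Pod annotation (the table does not state whether it is a label or an annotation):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-limited
  annotations:
    # Assumption: Terway QoS reads per-Pod limits from this annotation; values are in bps.
    koordinator.sh/networkQOS: '{"IngressLimit": "10M", "EgressLimit": "20M"}'
spec:
  containers:
    - name: app
      image: nginx
```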

## Kernel Version

- Kernel 4.19 and above is supported.
- Tested on 5.10 (Alinux3).

On kernels 5.1 and above, EDT is used for rate limiting in the egress direction. On other kernel versions, and for the ingress direction, a token bucket is used for rate limiting.

### Limitation

On kernels below 5.0, priority configuration for `HostNetwork` Pods is not supported.

## Deploying Koordinator

1. Ensure that Koordinator is installed and that its version is 1.5 or higher.
2. The koordlet must be configured with RuntimeHook.

```sh
helm install koordinator koordinator-sh/koordinator --version 1.5.0 --set koordlet.features="TerwayQoS=true\,BECPUEvict=true\,BEMemoryEvict=true\,CgroupReconcile=true\,Accelerators=true"
```

## Deploying Terway QoS

Run the following command to install Terway QoS. Once started, it attaches tc eBPF programs to the ingress and egress directions of the host network interface.

```sh
helm install -nkube-system terway-qos --set qos.qosConfigSource=file oci://registry-1.docker.io/l1b0k/terway-qos --version 0.3.1
```

...saurus-plugin-content-docs/current/best-practices/network-qos-with-terwayqos.md (121 additions, 0 deletions)

# Network Bandwidth Limitation Using Terway QoS

## Introduction

terway-qos was created to solve container network bandwidth contention in mixed-deployment (co-location) scenarios. It supports limiting bandwidth per Pod and per business type.

Compared with other solutions, terway-qos has the following advantages:

1. Limits bandwidth by business type, supporting mixed deployment of multiple business types.
2. Supports dynamic adjustment of Pod bandwidth limits.
3. Whole-machine bandwidth limiting, with support for multiple network interfaces.
4. Supports bandwidth limits for container networks and HostNetwork Pods.

Terway QoS defines three bandwidth priority levels, which map to Koordinator's default QoS as follows. You can set the QoS priority for a Pod using the familiar Koordinator configuration.

| Koordinator QoS | Kubernetes QoS       | Terway Net QoS |
| :-------------- | :------------------- | :------------- |
| SYSTEM          | --                   | L0             |
| LSE             | Guaranteed           | L1             |
| LSR             | Guaranteed           | L1             |
| LS              | Guaranteed/Burstable | L1             |
| BE              | BestEffort           | L2             |

## Configuration Parameters

### Setting Whole-Machine Bandwidth Limits

In mixed-deployment scenarios, we want online business traffic to be guaranteed its maximum bandwidth so that it avoids contention, while offline business traffic can still use as much of the full bandwidth as possible when the machine is idle.

Users can therefore define three priority levels for business traffic: L0, L1, and L2, in decreasing order of priority.

Definition of a contention scenario: the total traffic of `L0 + L1 + L2` exceeds the whole-machine bandwidth.

- L0's maximum bandwidth adjusts dynamically based on the real-time traffic of L1 and L2. It can be as high as the whole-machine bandwidth and as low as `whole-machine bandwidth - L1 minimum bandwidth - L2 minimum bandwidth`.
- Under all circumstances, the bandwidth of L1 and L2 does not exceed their respective upper limits.
- In a contention scenario, the bandwidth of L1 and L2 does not fall below their respective lower limits.
- In a contention scenario, bandwidth is limited in the order L2, L1, L0.

Since Terway QoS has only three priority levels, whole-machine bandwidth limits can only be set for the `LS` and `BE` classes; the remaining `L0` share is calculated from the machine's total bandwidth cap.

Here is an example configuration:

```yaml
resource-qos-config: |
  {
    "clusterStrategy": {
      "policies": {"netQOSPolicy": "terway-qos"},
      "lsClass": {
        "networkQOS": {
          "enable": true,
          "ingressRequest": "50M",
          "ingressLimit": "100M",
          "egressRequest": "50M",
          "egressLimit": "100M"
        }
      },
      "beClass": {
        "networkQOS": {
          "enable": true,
          "ingressRequest": "10M",
          "ingressLimit": "200M",
          "egressRequest": "10M",
          "egressLimit": "200M"
        }
      }
    }
  }
system-config: |-
  {
    "clusterStrategy": {
      "totalNetworkBandwidth": "600M"
    }
  }
```

Please note:

- `clusterStrategy.policies.netQOSPolicy` must be set to `terway-qos`.

> Unit: `bps`

### Pod Bandwidth Limitation

To specify bandwidth limits for a Pod:

| Key                       | Value                                           |
| :------------------------ | :---------------------------------------------- |
| koordinator.sh/networkQOS | '{"IngressLimit": "10M", "EgressLimit": "20M"}' |

> Unit: `bps`
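
If the workload is managed by a controller such as a Deployment, the key above goes on the Pod template rather than on the controller object itself. A minimal sketch, again assuming the key is applied as a Pod annotation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
      annotations:
        # Assumption: the per-Pod limit annotation shown above; values are in bps.
        koordinator.sh/networkQOS: '{"IngressLimit": "10M", "EgressLimit": "20M"}'
    spec:
      containers:
        - name: web
          image: nginx
```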

## Kernel Version

- Kernel 4.19 and above is supported.
- Tested on 5.10 (Alinux3).

On kernels 5.1 and above, EDT is used for rate limiting in the egress direction. On other kernel versions, and for the ingress direction, a token bucket is used for rate limiting.

### Limitation

On kernels below 5.0, priority configuration for `HostNetwork` Pods is not supported.

## Deploying Koordinator

1. Ensure that Koordinator is installed and that its version is 1.5 or higher.
2. The koordlet must be configured with RuntimeHook.

```sh
helm install koordinator koordinator-sh/koordinator --version 1.5.0 --set koordlet.features="TerwayQoS=true\,BECPUEvict=true\,BEMemoryEvict=true\,CgroupReconcile=true\,Accelerators=true"
```

## Deploying Terway QoS

Run the following command to install Terway QoS. Once started, it attaches tc eBPF programs to the ingress and egress directions of the host network interface.

```sh
helm install -nkube-system terway-qos --set qos.qosConfigSource=file oci://registry-1.docker.io/l1b0k/terway-qos --version 0.3.0
```