Add descheduler/runtimeproxy chapter, and adjust the description of some scheduling chapters (koordinator-sh#135)

Signed-off-by: Fansong Zeng <[email protected]>
hormes authored Jun 5, 2023
1 parent 7863f48 commit 9589c92
Showing 3 changed files with 41 additions and 13 deletions.
25 changes: 19 additions & 6 deletions docs/architecture/overview.md
@@ -5,16 +5,26 @@ Koordinator adds co-location capabilities on top of the original kubernetes, and

![Architecture](/img/architecture.png)

## Koordinator Scheduler
## Koord-Scheduler

The Koordinator Scheduler is deployed as a ```Deployment```, which is used to enhance the resource scheduling capabilities of kubernetes in colocation scenarios, including:
The Koordinator Scheduler is deployed as a `Deployment` and enhances the resource scheduling capabilities of Kubernetes for QoS-aware scheduling, differentiated SLO management, and job scheduling. Specifically, it includes:

- More scenario support, including elastic quota scheduling, resource overcommitment, resource reservation, gang scheduling, heterogeneous resource scheduling.
- Better performance, including dynamic index optimization, equivalence class scheduling, random relaxation algorithm optimization.
- Safer descheduling, including workload availability awareness, deterministic pod migration, fine grained flow control, and modification audit support.
- QoS-aware scheduling, including load-aware scheduling to make node load more balanced, resource overcommitment to run more computing workloads with low priority.
- Differentiated SLO management, including fine-grained CPU orchestration and different QoS policies (cfs/LLC/memory bw/net bw/blkio) for different workloads.
- Job scheduling, including elastic quota scheduling, gang scheduling, heterogeneous resource scheduling, to support big-data and AI workloads.

To better support different workloads, the scheduler also provides a series of general capability enhancements:
- Reservation, the ability to reserve node resources for specific pods or workloads, which is widely used in descheduling, resource preemption and fragmentation optimization.
- Node reservation, the ability to reserve node resources for workloads outside of Kubernetes, which is typically used for non-containerized workloads.
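
To make the load-aware scheduling item above more concrete, here is a minimal Python sketch of choosing a node by estimated load. It is an illustration under stated assumptions, not the Koord-Scheduler algorithm: the function `pick_least_loaded_node`, its parameters, and the 0.8 utilization cap are hypothetical names and values invented for this example.

```python
from typing import Dict, Optional


def pick_least_loaded_node(
    node_cpu_utilization: Dict[str, float],
    pod_cpu_fraction: float,
    max_utilization: float = 0.8,  # hypothetical cap, not a Koordinator default
) -> Optional[str]:
    """Pick the node with the lowest estimated utilization after placement.

    Nodes whose estimated utilization would exceed `max_utilization` are
    filtered out. Returns None when no node qualifies.
    """
    candidates = {
        name: util + pod_cpu_fraction
        for name, util in node_cpu_utilization.items()
        if util + pod_cpu_fraction <= max_utilization
    }
    if not candidates:
        return None
    # Lowest estimated utilization wins; ties are broken by node name.
    return min(sorted(candidates), key=lambda name: candidates[name])


utilization = {"node-a": 0.70, "node-b": 0.35, "node-c": 0.50}
chosen = pick_least_loaded_node(utilization, pod_cpu_fraction=0.20)
print(chosen)
```

A real scheduler would combine a filter and score step like this with many other plugins (quota, gang, device constraints); the sketch only shows the load-balancing idea in isolation.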

## Koordinator Manager
## Koord-Descheduler

The Koordinator Descheduler is deployed as a ```Deployment```, which is an enhanced version of the community descheduler:

- Framework, a descheduling framework with better scalability, determinism and security. See the [details](../designs/descheduler-framework).
- Load-aware descheduling, a descheduling plugin for node load rebalancing, which supports a user-defined CPU load level for nodes to avoid hotspot nodes.
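
As a rough illustration of the load-aware descheduling idea above, the following Python sketch flags nodes whose CPU utilization exceeds a user-defined level. It is not Koordinator's implementation: the `NodeLoad` type, the `find_hotspot_nodes` function, and the 65% threshold are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeLoad:
    """Hypothetical snapshot of a node's CPU usage (not a Koordinator type)."""
    name: str
    cpu_used_millicores: int
    cpu_capacity_millicores: int

    @property
    def cpu_utilization(self) -> float:
        # Fraction of CPU capacity currently in use, in the range [0, 1].
        return self.cpu_used_millicores / self.cpu_capacity_millicores


def find_hotspot_nodes(nodes: List[NodeLoad], high_threshold: float) -> List[str]:
    """Return names of nodes above the threshold, most loaded first.

    A real descheduler would then select pods on these nodes to migrate.
    """
    hot = [n for n in nodes if n.cpu_utilization > high_threshold]
    hot.sort(key=lambda n: n.cpu_utilization, reverse=True)
    return [n.name for n in hot]


nodes = [
    NodeLoad("node-a", 7000, 8000),  # 87.5% utilized
    NodeLoad("node-b", 2000, 8000),  # 25% utilized
    NodeLoad("node-c", 6000, 8000),  # 75% utilized
]
hotspots = find_hotspot_nodes(nodes, high_threshold=0.65)
print(hotspots)
```

Sorting the most loaded nodes first is one possible design choice; it lets a rebalancing loop relieve the worst hotspot before the others.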

## Koord-Manager

The Koordinator Manager is deployed as a `Deployment`, usually consisting of two instances, one leader and one backup. The Koordinator Manager consists of several controllers and webhooks, which are used to orchestrate co-located workloads and support resource overcommitment scheduling and SLO management.

@@ -35,6 +45,9 @@ Inside Koordlet, it mainly includes the following modules:
- QoS Manager, which dynamically adjusts the water level of node colocation based on resource profiling, interference detection results and SLO configuration, suppressing Pods that affect service quality.
- Resource Tuning, container resource tuning for co-located scenarios, optimize the container's CPU Throttle, OOM, etc., to improve the quality of service operation.

## Koord-RuntimeProxy

The Koord-RuntimeProxy is deployed as a `systemd service` on Kubernetes nodes. It is designed to intercept CRI requests and apply resource management policies, such as setting different cgroup parameters according to pod priority in hybrid workload orchestration scenarios, or applying new isolation policies for the latest Linux kernels and CPU architectures.
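
The interception described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Koord-RuntimeProxy code: the request is a plain dictionary rather than the real CRI protobuf message, and the names `intercept_create_container`, `forward_to_runtime`, `pod_priority_class`, and the CPU share values are all hypothetical.

```python
import copy

# Hypothetical policy table: CPU shares per priority class.
# The class names and values are illustrative assumptions only.
CPU_SHARES_BY_PRIORITY = {
    "prod": 1024,
    "batch": 2,
}


def intercept_create_container(request: dict) -> dict:
    """Return a copy of the request with cgroup parameters set by priority.

    `request` is a simplified stand-in for a CRI container creation request.
    Requests with an unknown priority class are passed through unchanged.
    """
    patched = copy.deepcopy(request)  # never mutate the caller's request
    priority = patched.get("pod_priority_class", "")
    if priority in CPU_SHARES_BY_PRIORITY:
        resources = patched.setdefault("linux_resources", {})
        resources["cpu_shares"] = CPU_SHARES_BY_PRIORITY[priority]
    return patched


def forward_to_runtime(request: dict) -> dict:
    """Placeholder for the backend runtime; it simply echoes the request."""
    return {"accepted": True, "request": request}


original = {
    "name": "worker",
    "pod_priority_class": "batch",
    "linux_resources": {"cpu_shares": 1024},
}
response = forward_to_runtime(intercept_create_container(original))
print(response["request"]["linux_resources"]["cpu_shares"])
```

The point of the proxy pattern is that neither the caller nor the backend runtime needs to know about the policy: the request is rewritten in flight and then forwarded as usual.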

## What's Next

@@ -5,17 +5,29 @@ Koordinator adds co-location capabilities on top of the original Kubernetes capabilities, and

![Architecture](/img/architecture.png)

## Koordinator Scheduler
## Koord-Scheduler

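The Koordinator Scheduler is deployed as a Deployment and is used to enhance the resource scheduling capabilities of Kubernetes in co-location scenarios, including:
Koord-Scheduler is deployed in the cluster as a Deployment and is used to enhance the resource scheduling capabilities of Kubernetes for QoS-aware scheduling, differentiated SLO, and job scheduling scenarios. Specifically, it includes:

- More scenario support, including elastic quota scheduling, resource overcommitment, resource reservation, gang scheduling, and heterogeneous resource scheduling
- Better performance, including dynamic index optimization, equivalence class scheduling, and random algorithm optimization
- Safer descheduling, including workload awareness, deterministic pod migration, fine-grained flow control, and change audit support
- QoS-aware scheduling, including load-aware scheduling to balance load across nodes better, and resource overcommitment to run more low-priority workloads
- Differentiated SLO, including fine-grained CPU orchestration and different QoS isolation policies (cfs, LLC, memory bandwidth, network bandwidth, disk IO) for different workloads
- Job scheduling, including elastic quota management, gang scheduling, and heterogeneous resource scheduling, to better support running big-data and AI workloads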

## Koordinator Manager
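To better support different types of workloads, Koord-Scheduler also includes some general capability enhancements:

The Koordinator Manager is deployed as a Deployment, usually consisting of two instances, one leader and one backup. The Koordinator Manager consists of several controllers and webhooks, which are used to orchestrate workloads in co-location scenarios and to support resource overcommitment and SLO management.
- Reservation, which supports reserving node resources for specific Pods or workloads. The resource reservation feature is widely used in descheduling, resource preemption, and node fragmentation optimization.
- Node Reservation, which supports reserving node resources for workloads outside of Kubernetes. It is typically used when nodes also run non-containerized workloads.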

## Koord-Descheduler

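Koord-Descheduler is deployed in the cluster as a Deployment. It is an enhanced version of the upstream Kubernetes community descheduler and currently includes:

- Descheduling framework: Koord-Descheduler redesigns the descheduling framework, with many enhancements in scalability, resource determinism, and safety. See the [details](../designs/descheduler-framework).
- Load-aware descheduling: a load-aware descheduling plugin built on the new framework. It lets users configure a safe watermark for nodes, which drives the descheduler to continuously optimize cluster orchestration and thereby avoid local node hotspots in the cluster.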

## Koord-Manager

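Koord-Manager is deployed as a Deployment, usually consisting of two instances, one leader and one backup. The Koordinator Manager consists of several controllers and webhooks, which are used to orchestrate workloads in co-location scenarios and to support resource overcommitment and SLO management.

Currently, three components are provided: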

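@@ -35,6 +47,9 @@ Koordlet is deployed in the Kubernetes cluster as a DaemonSet and is used to support …
- QoS Manager, which dynamically adjusts the water level of co-located nodes based on resource profiling, interference detection results, and SLO configuration, suppressing Pods that affect service quality.
- Resource Tuning, container resource tuning for co-location scenarios, which optimizes the container's CPU Throttle, OOM, etc., to improve the quality of service operation.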

## Koord-RuntimeProxy

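Koord-RuntimeProxy is deployed as a systemd service on the nodes of the Kubernetes cluster and proxies CRI requests between the Kubelet and containerd/docker. The proxy is designed to support fine-grained resource management policies, such as setting different cgroup parameters for Pods of different QoS classes, including technical features such as kernel cfs quota and resctrl, to improve the runtime quality of Pods.

## What's Next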

Expand Down
Binary file modified static/img/architecture.png