A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc.), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable.
When looking at the architecture of the system, we'll break it down into services that run on the worker node and services that compose the cluster-level control plane.
The Kubernetes node has the services necessary to run application containers and be managed from the master systems.
Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers.
The Kubelet manages pods and their containers, their images, their volumes, etc.
Each node also runs a simple network proxy and load balancer (see the services FAQ for more details). The proxy reflects services (see the services doc for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends.
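The round-robin forwarding described above can be illustrated with a minimal sketch. This is not the proxy's actual code: the class name `RoundRobinBackends` and the backend addresses are hypothetical, and real forwarding would also copy bytes between sockets, which is omitted here.

```python
class RoundRobinBackends:
    """Hypothetical sketch: hands out backends in strict rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._next = 0  # index of the backend to use for the next connection

    def pick(self):
        """Return the backend for the next incoming connection."""
        backend = self._backends[self._next]
        self._next = (self._next + 1) % len(self._backends)
        return backend


# Made-up backend addresses, for illustration only.
proxy = RoundRobinBackends(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
picks = [proxy.pick() for _ in range(4)]
```

After three connections the rotation wraps around, so the fourth connection goes to the first backend again.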
Service endpoints are currently found via DNS or through environment variables (both Docker-links-compatible and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy.
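As a rough illustration of the Kubernetes-style variables, the sketch below builds the {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT pair for one service. The helper name `service_env`, the upper-casing and dash-to-underscore conversion, and the address used are assumptions made for the example, not a statement of how the kubelet generates them.

```python
def service_env(service_name, host, port):
    """Hypothetical helper: build the two Kubernetes-style variables for a service.

    Assumption: the {FOO} prefix is the service name upper-cased, with dashes
    replaced by underscores so that it is a valid environment variable name.
    """
    prefix = service_name.upper().replace("-", "_")
    return {
        f"{prefix}_SERVICE_HOST": host,
        f"{prefix}_SERVICE_PORT": str(port),  # environment values are strings
    }


# Made-up service name and address, for illustration only.
env = service_env("redis-master", "10.0.0.11", 6379)
```

A container could then read `REDIS_MASTER_SERVICE_HOST` and `REDIS_MASTER_SERVICE_PORT` from its environment to reach the service through the proxy.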
The Kubernetes control plane is split into a set of components. Currently they all run on a single master node, but that is expected to change soon in order to support high-availability clusters. These components work together to provide a unified view of the cluster.
All persistent master state is stored in an instance of etcd. This provides a great way to store configuration data reliably. With watch support, coordinating components can be notified very quickly of changes.
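To make the watch idea concrete, here is a small in-memory sketch of a key-value store that notifies registered callbacks when a matching key changes. It is only an analogy: `WatchableStore`, its methods, and the key layout are hypothetical and do not mirror the etcd client API.

```python
class WatchableStore:
    """Hypothetical in-memory stand-in for a store with watch support."""

    def __init__(self):
        self._data = {}
        self._watchers = {}  # key prefix -> list of callbacks

    def watch(self, prefix, callback):
        """Register callback(key, value) for every write under prefix."""
        self._watchers.setdefault(prefix, []).append(callback)

    def put(self, key, value):
        """Store the value, then notify every watcher whose prefix matches."""
        self._data[key] = value
        for prefix, callbacks in self._watchers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, value)

    def get(self, key):
        return self._data.get(key)


store = WatchableStore()
events = []
# A component interested only in pods watches the (assumed) /pods/ prefix.
store.watch("/pods/", lambda key, value: events.append((key, value)))
store.put("/pods/web-1", {"node": None})
store.put("/nodes/node-a", {"ready": True})  # not under /pods/, so no event
```

The watcher learns about the pod write as soon as it happens, without polling, and is not woken for unrelated keys.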
The apiserver serves up the Kubernetes API. It is intended to be a CRUD-y server, with most/all business logic implemented in separate components or in plug-ins. It mainly processes REST operations, validates them, and updates the corresponding objects in etcd (and eventually other stores).
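The validate-then-store flow can be sketched as follows. This is a toy under stated assumptions, not the apiserver's design: `TinyApiServer`, `ValidationError`, the required fields, and the `/kind/name` key layout are all hypothetical, and a plain dict stands in for etcd.

```python
class ValidationError(Exception):
    """Raised when a submitted object fails validation."""


class TinyApiServer:
    """Hypothetical CRUD-style server: validate, then persist. No business logic."""

    def __init__(self, store):
        self._store = store  # a dict standing in for etcd

    @staticmethod
    def _validate(obj):
        # Assumed minimal rules; real validation is far more thorough.
        if not obj.get("kind"):
            raise ValidationError("kind is required")
        if not obj.get("name"):
            raise ValidationError("name is required")

    @staticmethod
    def _key(kind, name):
        return f"/{kind}/{name}"  # assumed key layout

    def create(self, obj):
        self._validate(obj)
        key = self._key(obj["kind"], obj["name"])
        if key in self._store:
            raise ValidationError(f"{key} already exists")
        self._store[key] = dict(obj)
        return key

    def read(self, kind, name):
        return self._store.get(self._key(kind, name))

    def delete(self, kind, name):
        """Return True if the object existed and was removed."""
        return self._store.pop(self._key(kind, name), None) is not None


backing = {}
api = TinyApiServer(backing)
created_key = api.create({"kind": "pods", "name": "web-1"})
try:
    api.create({"kind": "pods"})  # missing name, so validation rejects it
    rejected = False
except ValidationError:
    rejected = True
```

Keeping the server this thin is what lets scheduling, replication, and similar logic live in separate components that only read and write objects.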
The scheduler binds unscheduled pods to nodes via the /binding API. The scheduler is pluggable, and we expect to support multiple cluster schedulers and even user-provided schedulers in the future.
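A minimal sketch of that binding step with a swappable policy is shown below. The functions `schedule` and `least_loaded`, the pod and node shapes, and the least-loaded policy are assumptions for illustration; the returned pairs merely stand in for requests that would be sent to the /binding API.

```python
def least_loaded(pod, load):
    """Example policy (an assumption): pick the node running the fewest pods.

    Ties are broken by node name so the result is deterministic.
    """
    if not load:
        return None
    return min(sorted(load), key=load.get)


def schedule(pods, load, policy=least_loaded):
    """Return (pod name, node name) bindings for every unscheduled pod.

    `load` maps node name to its current pod count and is updated in place.
    `policy` is the pluggable part: any callable taking (pod, load).
    """
    bindings = []
    for pod in pods:
        if pod["node"] is not None:
            continue  # already bound to a node
        node = policy(pod, load)
        if node is None:
            continue  # no node available; the pod stays unscheduled
        bindings.append((pod["name"], node))
        load[node] += 1
    return bindings


# Made-up pods and nodes, for illustration only.
pods = [
    {"name": "web-1", "node": None},
    {"name": "web-2", "node": "node-a"},
    {"name": "web-3", "node": None},
]
load = {"node-a": 1, "node-b": 0}
bindings = schedule(pods, load)
```

Swapping in a different `policy` callable changes placement decisions without touching the loop that finds unscheduled pods and emits bindings.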
All other cluster-level functions are currently performed by the Controller Manager. For instance, Endpoints objects are created and updated by the endpoints controller, and nodes are discovered, managed, and monitored by the node controller. These could eventually be split into separate components to make them independently pluggable.
The replicationcontroller is a mechanism that is layered on top of the simple pod API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented.
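Because it is layered on the pod API, the core of such a mechanism can be expressed as a comparison between a desired and an observed pod count. The sketch below assumes the replicationcontroller's job is to keep a requested number of pod replicas running, which this section does not spell out; the function `reconcile` and its return shape are hypothetical.

```python
def reconcile(desired_replicas, running_pods):
    """Hypothetical reconcile step: decide which pod API calls are needed.

    Returns how many pods to create and which running pods to delete so that
    the number of running pods matches desired_replicas.
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        return {"create": diff, "delete": []}
    # Zero or negative diff: delete the surplus (an empty list when diff is 0).
    return {"create": 0, "delete": running_pods[:-diff]}


# Made-up pod names, for illustration only.
scale_up = reconcile(3, ["web-1"])
scale_down = reconcile(1, ["web-1", "web-2", "web-3"])
steady = reconcile(2, ["web-1", "web-2"])
```

Running this step repeatedly against the observed state is what makes the mechanism self-correcting: a pod that disappears simply shows up as a shortfall on the next pass.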