The container orchestrator landscape (LWN.net)

Docker and other container engines can greatly simplify many aspects of deploying a server-side application, but numerous applications consist of more than one container. Managing a group of containers only gets harder as additional applications and services are deployed; this has led to the development of a class of tools called container orchestrators. The best-known of these by far is Kubernetes; the history of container orchestration can be divided into what came before it and what came after.

The convenience offered by containers comes with some trade-offs; someone who adheres strictly to Docker's idea that each service should have its own container will end up running a large number of them. Even a simple web interface to a database might require running separate containers for the database server and the application; it might also include a separate container for a web server to handle serving static files, a proxy server to terminate SSL/TLS connections, a key-value store to serve as a cache, or even a second application container to handle background jobs and scheduled tasks.

An administrator who is responsible for several such applications will quickly find themselves wishing for a tool to make their job easier; this is where container orchestrators step in. A container orchestrator is a tool that can manage a group of multiple containers as a single unit. Instead of operating on a single server, orchestrators allow combining multiple servers into a cluster, and automatically distribute container workloads among the cluster nodes.

Docker Compose and Swarm

Docker Compose is not quite an orchestrator, but it was Docker's first attempt to create a tool to make it easier to manage applications that are made out of several containers. It consumes a YAML-formatted file, which is almost always named docker-compose.yml. Compose reads this file and uses the Docker API to create the resources that it declares; Compose also adds labels to all of the resources, so that they can be managed as a group after they are created. In effect, it is an alternative to the Docker command-line interface (CLI) that operates on groups of containers. Three types of resources can be defined in a Compose file:

  • services contains declarations of containers to be launched. Each entry in services is equivalent to a docker run command.
  • networks declares networks that can be attached to the containers defined in the Compose file. Each entry in networks is equivalent to a docker network create command.
  • volumes defines named volumes that can be attached to the containers. In Docker parlance, a volume is persistent storage that is mounted into the container. Named volumes are managed by the Docker daemon. Each entry in volumes is equivalent to a docker volume create command.
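The correspondence between Compose entries and Docker CLI commands can be sketched in a few lines of Python. This is only an illustration, not how Compose is implemented (Compose talks to the Docker API directly and also applies labels and name prefixes that are omitted here). The `compose` dictionary is a hypothetical, hand-written stand-in for a parsed Compose file, and the image names are placeholders.

```python
import shlex

# Hypothetical, simplified stand-in for a parsed docker-compose.yml.
# Only a handful of keys are modeled; real Compose files support many more.
compose = {
    "services": {
        "db": {
            "image": "postgres:16",
            "volumes": ["dbdata:/var/lib/postgresql/data"],
            "networks": ["backend"],
        },
        "app": {"image": "example/app:latest", "networks": ["backend"]},
    },
    "networks": {"backend": {}},
    "volumes": {"dbdata": {}},
}


def equivalent_commands(spec):
    """Return docker CLI argument lists that roughly correspond to each entry."""
    commands = []
    # Each entry in "networks" maps to a "docker network create".
    for name in spec.get("networks", {}):
        commands.append(["docker", "network", "create", name])
    # Each entry in "volumes" maps to a "docker volume create".
    for name in spec.get("volumes", {}):
        commands.append(["docker", "volume", "create", name])
    # Each entry in "services" maps to a "docker run".
    for name, service in spec.get("services", {}).items():
        cmd = ["docker", "run", "--detach", "--name", name]
        for network in service.get("networks", []):
            cmd += ["--network", network]
        for volume in service.get("volumes", []):
            cmd += ["--volume", volume]
        cmd.append(service["image"])
        commands.append(cmd)
    return commands


for cmd in equivalent_commands(compose):
    print(shlex.join(cmd))
```

Networks and volumes are emitted first because the containers that reference them cannot start until they exist.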

Networks and volumes can be directly connected to networks and filesystems on the host that Docker is running on, or they can be provided by a plugin. Network plugins allow things like connecting containers to VPNs; a volume plugin might allow storing a volume on an NFS server or an object storage service.

Compose provides a much more convenient way to manage an application that consists of multiple containers, but, at least in its original incarnation, it only worked with a single host; all of the containers that it created were run on the same machine. To extend its reach across multiple hosts, Docker introduced Swarm mode in 2016. This is actually the second product from Docker to bear the name "Swarm" — a product from 2014 implemented a completely different approach to running containers across multiple hosts, but it is no longer maintained. It was replaced by SwarmKit, which provides the underpinnings of the current version of Docker Swarm.

Swarm mode is included in Docker; no additional software is required. Creating a cluster is a simple matter of running docker swarm init on an initial node, and then docker swarm join on each additional node to be added. Swarm clusters contain two types of nodes. Manager nodes provide an API to launch containers on the cluster, and communicate with each other using a protocol based on the Raft Consensus Algorithm in order to synchronize the state of the cluster across all managers. Worker nodes do the actual work of running containers. It is unclear how large these clusters can be; Docker's documentation says that a cluster should have no more than 7 manager nodes but does not specify a limit on the number of worker nodes. Bridging container networks across nodes is built-in, but sharing storage between nodes is not; third-party volume plugins need to be used to provide shared persistent storage across nodes.
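Raft-based systems need a majority of their voting members to agree before the cluster state can change, which is why the number of manager nodes matters more than the number of workers. The arithmetic is simple enough to sketch; the function names below are made up for this illustration and are not part of any Docker API.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that still forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return managers - quorum(managers)


for n in (1, 3, 5, 7):
    print(f"{n} managers: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this arithmetic, a seven-manager cluster keeps a majority with up to three managers lost, while an even-sized cluster tolerates no more failures than the odd-sized cluster one node smaller.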

Services are deployed on a swarm using Compose files. Swarm extended the Compose format by adding a deploy key to each service that specifies how many instances of the service should be running and which nodes they should run on. Unfortunately, this led to a divergence between Compose and Swarm, which caused some confusion because options like CPU and memory quotas needed to be specified in different ways depending on which tool was being used. During this period of divergence, a file intended for Swarm was referred to as a "stack file" instead of a Compose file in an attempt to disambiguate the two; thankfully, these differences appear to have been smoothed over in the current versions of Swarm and Compose, and any references to a stack file being distinct from a Compose file seem to have largely been scoured from the Internet. The Compose format now has an open specification and its own GitHub organization providing reference implementations.

There is some level of uncertainty about the future of Swarm. It once formed the backbone of a service called Docker Cloud, but the service was suddenly shut down in 2018. It was also touted as a key feature of Docker's Enterprise Edition, but that product has since been sold to another company and is now marketed as Mirantis Kubernetes Engine. Meanwhile, recent versions of Compose have gained the ability to deploy containers to services hosted by Amazon and Microsoft. There has been no deprecation announcement, but there also hasn't been any announcement of any other type in recent memory; searching for the word "Swarm" on Docker's website only turns up passing mentions.

Kubernetes

Kubernetes (sometimes known as k8s) is a project inspired by an internal Google tool called Borg. Kubernetes manages resources and coordinates running workloads on clusters of up to thousands of nodes; it dominates container orchestration like Google dominates search. Google wanted to collaborate with Docker on Kubernetes development in 2014, but Docker decided to go its own way with Swarm. Instead, Kubernetes grew up under the auspices of the Cloud Native Computing Foundation (CNCF). By 2017, Kubernetes had grown so popular that Docker announced that it would be integrated into Docker's own product.

Aside from its popularity, Kubernetes is primarily known for its complexity. Setting up a new cluster by hand is an involved task, which requires the administrator to select and configure several third-party components in addition to Kubernetes itself. Much like the Linux kernel needs to be combined with additional software to make a complete operating system, Kubernetes is only an orchestrator and needs to be combined with additional software to make a complete cluster. It needs a container engine to run its containers; it also needs plugins for networking and persistent volumes.

Kubernetes distributions exist to fill this gap. Like a Linux distribution, a Kubernetes distribution bundles Kubernetes with an installer and a curated selection of third-party components. Different distributions exist to fill different niches; seemingly every tech company of a certain size has its own distribution and/or hosted offering to cater to enterprises. The minikube project offers an easier on-ramp for developers looking for a local environment to experiment with. Unlike their Linux counterparts, Kubernetes distributions are certified for conformance by the CNCF; each distribution must implement the same baseline of functionality in order to obtain the certification, which allows them to use the "Certified Kubernetes" badge.

A Kubernetes cluster contains several software components. Every node in the cluster runs an agent called the kubelet to maintain membership in the cluster and accept work from it, a container engine, and kube-proxy to enable network communication with containers running on other nodes.

The components that maintain the state of the cluster and make decisions about resource allocations are collectively referred to as the control plane — these include a distributed key-value store called etcd, a scheduler that assigns work to cluster nodes, and one or more controller processes that react to changes in the state of the cluster and trigger any actions needed to make the actual state match the desired state. Users and cluster nodes interact with the control plane through the Kubernetes API server. To effect changes, users set the desired state of the cluster through the API server, while the kubelet reports the actual state of each cluster node to the controller processes.
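The idea of controllers driving the actual state toward the desired state can be illustrated with a toy reconciliation function. This is a sketch of the general pattern only, under the assumption that state can be reduced to replica counts per workload; real Kubernetes controllers watch API objects and are considerably more involved. All of the names here are hypothetical.

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Compare desired and actual replica counts and list corrective actions."""
    actions = []
    # Look at every workload that appears in either view of the cluster.
    for name in sorted(desired.keys() | actual.keys()):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
    return actions


# What the user asked for, versus what the nodes report is running.
desired_state = {"web": 3, "worker": 2}
actual_state = {"web": 1, "worker": 2, "legacy": 1}

print(reconcile(desired_state, actual_state))
```

A controller would run a comparison like this repeatedly, acting on whatever difference it finds each time rather than remembering what it did previously.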

Kubernetes runs containers inside an abstraction called a pod, which can contain one or more containers, although running containers for more than one service in a pod is discouraged. Instead, a pod will generally have a single main container that provides a service, and possibly one or more "sidecar" containers that collect metrics or logs from the service running in the main container. All of the containers in a pod will be scheduled together on the same machine, and will share a network namespace — containers running within the same pod can communicate with each other over the loopback interface. Each pod receives its own unique IP address within the cluster. Containers running in different pods can communicate with each other using their cluster IP addresses.

A pod specifies a set of containers to run, but the definition of a pod says nothing about where to run those containers, or how long to run them for — without this information, Kubernetes will start the containers somewhere on the cluster, but will not restart them when they exit, and may abruptly terminate them if the control plane decides the resources they are using are needed by another workload. For this reason, pods are rarely used alone; instead, the definition of a pod is usually wrapped in a Deployment object, which is used to define a persistent service. Like Compose and Swarm, the objects managed by Kubernetes are declared in YAML; for Kubernetes, the YAML declarations are submitted to the cluster using the kubectl tool.
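The relationship between a pod definition and the Deployment that wraps it can be shown by building the object in Python and printing it as JSON, which kubectl accepts as well as YAML. The field names follow the `apps/v1` Deployment API; the helper function, the `app` label, and the image tag are assumptions made for this sketch.

```python
import json


def wrap_in_deployment(name, pod_spec, replicas=2):
    """Wrap a pod spec in a minimal Deployment object."""
    labels = {"app": name}  # hypothetical label tying pods to the Deployment
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": pod_spec,
            },
        },
    }


# A pod with a single main container; the image is a placeholder.
pod_spec = {"containers": [{"name": "web", "image": "nginx:1.27"}]}

print(json.dumps(wrap_in_deployment("web", pod_spec), indent=2))
```

The pod spec itself is unchanged; the Deployment only adds the information the bare pod lacks, such as how many copies to keep running.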

In addition to pods and Deployments, Kubernetes can manage many other types of objects, like load balancers and authorization policies. The list of supported APIs is continually evolving, and will vary depending on which version of Kubernetes and which distribution a cluster is running. Custom resources can be used to add APIs to a cluster to manage additional types of objects. KubeVirt adds APIs to enable Kubernetes to run virtual machines, for example. The complete list of APIs supported by a particular cluster can be discovered with the kubectl api-versions command.

Unlike Compose, each of these objects is declared in a separate YAML document, although multiple YAML documents can be inlined in the same file by separating them with "---", as seen in the Kubernetes documentation. A complex application might consist of many objects with their definitions spread across multiple files; keeping all of these definitions in sync with each other when maintaining such an application can be quite a chore. In order to make this easier, some Kubernetes administrators have turned to templating tools like Jsonnet.
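To make the "---" convention concrete, the following sketch splits a file containing several YAML documents into its individual documents. It is deliberately naive: it only recognizes a separator that sits alone on a line, and a real YAML parser handles cases that this does not. The manifests shown are truncated placeholders rather than complete object definitions.

```python
def split_documents(stream: str) -> list[str]:
    """Split a multi-document YAML stream on lines consisting of '---'."""
    docs, current = [], []
    for line in stream.splitlines():
        if line.rstrip() == "---":
            # Close the current document, skipping empty ones.
            if any(part.strip() for part in current):
                docs.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        docs.append("\n".join(current).strip("\n"))
    return docs


# Two truncated, illustrative objects in one file.
manifests = """\
apiVersion: v1
kind: Service
metadata:
  name: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
"""

for number, doc in enumerate(split_documents(manifests), start=1):
    print(f"document {number}: {doc.splitlines()[1]}")
```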

Helm takes the templating approach a step further. Like Kubernetes, development of Helm takes place under the aegis of the CNCF; it is billed as "the package manager for Kubernetes". Helm generates YAML configurations for Kubernetes from a collection of templates and variable declarations called a chart. Its template language is distinct from the Jinja templates used by Ansible but looks fairly similar to them; people who are familiar with Ansible Roles will likely feel at home with Helm Charts.
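The general shape of the chart idea, templates combined with default values that can be overridden at install time, can be imitated with Python's `string.Template`. To be clear, this is not Helm's template language and none of these names come from Helm; it is only an analogy, and the rendered manifest is a truncated fragment.

```python
from string import Template

# Hypothetical "chart": one template plus a set of default values.
TEMPLATE = Template(
    "apiVersion: apps/v1\n"
    "kind: Deployment\n"
    "metadata:\n"
    "  name: $name\n"
    "spec:\n"
    "  replicas: $replicas\n"
)
DEFAULTS = {"name": "web", "replicas": 1}


def render(overrides=None):
    """Merge user overrides into the defaults and fill in the template."""
    values = {**DEFAULTS, **(overrides or {})}
    return TEMPLATE.substitute(values)


# Deploy the same application with a different replica count.
print(render({"replicas": 3}))
```

Keeping the variable values separate from the templates is what lets one set of definitions serve many differently configured installations.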

Collections of Helm charts can be published in Helm repositories; Artifact Hub provides a large directory of public Helm repositories. Administrators can add these repositories to their Helm configuration and use the ready-made Helm charts to deploy prepackaged versions of popular applications to their cluster. Recent versions of Helm also support pushing and pulling charts to and from container registries, giving administrators the option to store charts in the same place that they store container images.

Kubernetes shows no signs of losing momentum any time soon. It is designed to manage any type of resource; this flexibility, as demonstrated by the KubeVirt virtual-machine controller, gives it the potential to remain relevant even if containerized workloads should eventually fall out of favor. Development proceeds at a healthy clip and new minor releases come out regularly. Releases are supported for a year; there doesn't seem to be a long-term support version available. Upgrading a cluster is supported, but some prefer to bring up a new cluster and migrate their services over to it.

Nomad

Nomad is an orchestrator from HashiCorp, which is marketed as a simpler alternative to Kubernetes. Nomad is an open source project, like Docker and Kubernetes. It consists of a single binary called nomad, which can be used to start a daemon called the agent and also serves as a CLI to communicate with an agent. Depending on how it is configured, the agent process can run in one of two modes. Agents running in server mode accept jobs and allocate cluster resources for them. Agents running in client mode contact the servers to receive jobs, run them, and report their status back to the servers. The agent can also run in development mode, where it takes on the role of both client and server to form a single-node cluster that can be used for testing purposes.

Creating a Nomad cluster can be quite simple. In Nomad's most basic mode of operation, the initial server agent must be started, then additional nodes can be added to the cluster using the nomad server join command. HashiCorp also provides Consul, which is a general-purpose service mesh and discovery tool. While it can be used standalone, Nomad is probably at its best when used in combination with Consul. The Nomad agent can use Consul to automatically discover and join a cluster, and can also perform health checks, serve DNS records, and provide HTTPS proxies to services running on the cluster.

Nomad supports complex cluster topologies. Each cluster is divided into one or more "data centers". Like Swarm, server agents within a single data center communicate with each other using a protocol based on Raft; this protocol has tight latency requirements, but multiple data centers may be linked together using a gossip protocol that allows information to propagate through the cluster without each server having to maintain a direct connection to every other. Data centers linked together in this way can act as one cluster from a user's perspective. This architecture gives Nomad an advantage when scaled up to enormous clusters. Kubernetes officially supports up to 5,000 nodes and 300,000 containers, whereas Nomad's documentation cites examples of clusters containing over 10,000 nodes and 2,000,000 containers.

Like Kubernetes, Nomad doesn't include a container engine or runtime. It uses task drivers to run jobs. Task drivers that use Docker and Podman to run containers are included; community-supported drivers are available for other container engines. Also like Kubernetes, Nomad's ambitions are not limited to containers; there are also task drivers for other types of workloads, including a fork/exec driver that simply runs a command on the host, a QEMU driver for running virtual machines, and a Java driver for launching Java applications. Community-supported task drivers connect Nomad to other types of workloads.

Unlike Docker or Kubernetes, Nomad eschews YAML in favor of HashiCorp Configuration Language (HCL), which was originally created for another HashiCorp project for provisioning cloud resources called Terraform. HCL is used across the HashiCorp product line, although it has limited adoption elsewhere. Documents written in HCL can easily be converted to JSON, but it aims to provide a syntax that is more finger-friendly than JSON and less error-prone than YAML.

HashiCorp's equivalent to Helm is called Nomad Pack. Like Helm, Nomad Pack processes a directory full of templates and variable declarations to generate job configurations. Nomad also has a community registry of pre-packaged applications, but the selection is much smaller than what is available for Helm at Artifact Hub.

Nomad does not have the same level of popularity as Kubernetes. Like Swarm, its development appears to be primarily driven by its creators; although it has been deployed by many large companies, HashiCorp is still very much the center of the community around Nomad. At this point, it seems unlikely the project has gained enough momentum to have a life independent from its corporate parent. Users can perhaps find assurance in the fact that HashiCorp is much more clearly committed to the development and promotion of Nomad than Docker is to Swarm.

Conclusion

Swarm, Kubernetes, and Nomad are not the only container orchestrators, but they are the three most viable. Apache Mesos can also be used to run containers, but it was nearly mothballed in 2021; DC/OS is based on Mesos, but much like Docker Enterprise Edition, the company that backed its development is now focused on Kubernetes. Most "other" container orchestration projects, like OpenShift and Rancher, are actually just enhanced (and certified) Kubernetes distributions, even if they don't have Kubernetes in their name.

Despite (or perhaps, because of) its complexity, Kubernetes currently enjoys the most popularity by far, but HashiCorp's successes with Nomad show that there is still room for alternatives. Some users remain loyal to the simplicity of Docker Swarm, but its future is uncertain. Other alternatives appear to be largely abandoned at this point. It would seem that the landscape has largely settled around these three players, but container orchestration is still a relatively immature area. Ten years ago, very little of this technology even existed, and things are still evolving quickly. There are likely many exciting new ideas and developments in container orchestration that are still to come.

[Special thanks to Guinevere Saenger for educating me with regard to some of the finer points of Kubernetes and providing some important corrections for this article.]
