Build a Production-Ready Docker Swarm cluster using Ansible. The goal is to rapidly bootstrap a Docker Swarm cluster with some essential services.

LombardiDaniel/swarm-ansible

Automated Docker Swarm bootstrap using Ansible

Build a Docker Swarm cluster using Ansible. The goal is to rapidly bootstrap a Docker Swarm cluster on machines running Debian or Ubuntu.

🚨 Disclaimer:

This repo aims to be a simple (yet somewhat production-ready) bootstrap of a docker swarm cluster using default configs, especially for the services in the available compose files. If you would like more configuration options (especially for security), please read their respective documentation and configure them manually. I advise you NOT to run any databases on this setup. Hosting databases is complicated; it is better to pay for a managed service or use a free tier (there are many).


✅ System Requirements

  • Control Node (the machine from which you will run the Ansible commands). I am using Ansible 2.15.1.
  • All swarm nodes (managers and workers) should have passwordless SSH access. This can be set up by passing your SSH public keys when you create your VMs. You can also check this Digital Ocean Guide to set up SSH key-based authentication on Linux machines.

🚀 Getting Started

💻 Hardware we will use

For this example, we will be using the Oracle Cloud Always Free tier. We can instantiate one machine per Fault Domain, ensuring our VMs are not all on the same physical hardware.

For this demo, we are using 3 machines (all of them free):

  • Ampere aarch64 Altra 4-core CPU w/ 24GB RAM
  • AMD x86_64 EPYC 1-core CPU w/ 1GB RAM
  • AMD x86_64 EPYC 1-core CPU w/ 1GB RAM

This gives us a total of 8 threads and 26 GB of RAM for free!

Since docker swarm is a distributed container orchestration tool, we need manager and worker nodes. Add the public IP of all manager nodes (in our case, just the left one) to your DNS, as traefik (the reverse proxy we will be using) must run on managers to retrieve information about the swarm.

You should maintain an odd number of managers in the swarm to support manager node failures. Having an odd number of managers ensures that during a network partition, there is a higher chance that the quorum remains available to process requests if the network is partitioned into two sets. Keeping the quorum is not guaranteed if you encounter more than two network partitions. Source: https://docs.docker.com/engine/swarm/admin_guide/#add-manager-nodes-for-fault-tolerance
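The arithmetic behind this recommendation can be sketched in a few lines of shell. Swarm managers need a majority to agree, so with N managers the quorum is N/2 + 1 (integer division) and the swarm survives the loss of (N - 1)/2 managers. The helper names quorum and tolerated_failures are made up for this illustration.

```shell
#!/bin/sh
# Majority arithmetic for swarm managers (helper names are illustrative).

# Managers that must stay reachable to keep quorum.
quorum() { echo $(( $1 / 2 + 1 )); }

# Managers that can be lost before quorum is lost.
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
  echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from 3 to 4 managers raises the quorum from 2 to 3 but still tolerates only one failure, which is why an even count buys nothing.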

Our architecture will look something like this (if you decide to bootstrap included services):

architecture

The DNS will point to our manager nodes (in this case, just the left one), and traefik will be exposed on ports 80 (HTTP) and 443 (HTTPS); we also set up redirection from port 80 to 443 via traefik. Note that the my-app instance accesses the database without the database being exposed directly to the web. The "my-app-net" is a docker network with the overlay driver, which means it exists on all nodes, allowing for inter-node communication.

🍴Preparation

The first thing we need to do after getting our hardware ready is create a hosts.ini file. You must have at least one manager node.

[node0]  # The node you will use to init the swarm (must be a manager)
1.2.3.4

[all]  # all nodes public ips
1.2.3.4
1.2.3.5
1.2.3.6

[managers]  # manager nodes public ips
1.2.3.4

[workers]  # worker nodes public ips
1.2.3.5
1.2.3.6

You will have to set the remote user by passing the -u REMOTE_USER flag in the ansible commands.
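Before running any playbook, it can help to sanity-check the inventory against the rules above. The following is only a sketch, not part of this repo: check_inventory is a hypothetical helper that assumes the group layout shown in the example (node0, all, managers, workers) and plain host lines.

```shell
#!/bin/sh
# Sanity-check a hosts.ini laid out like the example above.
# check_inventory is an illustrative helper, not something this repo ships.

check_inventory() {
  awk '
    { sub(/[#;].*/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }   # drop comments, trim
    $0 == "" { next }
    /^\[.*\]$/ { group = substr($0, 2, length($0) - 2); next }
    { member[group, $1] = 1; count[group]++; if (group == "node0") node0 = $1 }
    group == "managers" || group == "workers" { swarm[$1] = group }
    END {
      if (count["node0"] != 1) {
        print "node0 must contain exactly one host"; bad = 1
      } else if (!(("managers", node0) in member)) {
        print "node0 host " node0 " is not listed in [managers]"; bad = 1
      }
      for (h in swarm)
        if (!(("all", h) in member)) {
          print h " (" swarm[h] ") is missing from [all]"; bad = 1
        }
      exit bad + 0
    }
  ' "$1"
}

# Only runs when the inventory exists at the path used in this README.
INVENTORY="hosts/hosts.ini"
if [ -f "$INVENTORY" ]; then
  check_inventory "$INVENTORY" && echo "inventory looks consistent"
fi
```

The function prints one line per problem and returns non-zero if any were found, so it can gate the playbook runs.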

🐳 Creating the Cluster

After creating your hosts file and configuring the default remote user, we can start running the playbooks.

To set up the cluster and install dependencies:

NOTE: this command DISABLES the iptables firewall; do NOT host services on bare metal after this.

ansible-playbook -u REMOTE_USER -i hosts/hosts.ini playbooks/setup.yml

Then, to initialize the cluster:

ansible-playbook -u REMOTE_USER -i hosts/hosts.ini playbooks/bootstrap_swarm.yml

The playbook already takes care of the different swarm join tokens, so there is no need for extra configuration.

If anything goes wrong or you just want to dismantle the swarm, simply run:

ansible-playbook -u REMOTE_USER -i hosts/hosts.ini playbooks/dismantle_swarm.yml
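If you prefer a single entry point, the setup and bootstrap commands above can be chained so that the swarm is only initialized when setup succeeds. This is a sketch under assumptions: the REMOTE_USER default of ubuntu, the INVENTORY variable and the DRY_RUN switch are all hypothetical and not part of this repo. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Chain setup and bootstrap, stopping at the first failure.
# REMOTE_USER, INVENTORY and DRY_RUN are illustrative variables, not repo settings.
REMOTE_USER="${REMOTE_USER:-ubuntu}"        # assumed default; use your VM's user
INVENTORY="${INVENTORY:-hosts/hosts.ini}"
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print; set to 0 to execute

run_playbook() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "ansible-playbook -u $REMOTE_USER -i $INVENTORY playbooks/$1.yml"
  else
    ansible-playbook -u "$REMOTE_USER" -i "$INVENTORY" "playbooks/$1.yml"
  fi
}

# bootstrap_swarm only runs if setup exited successfully.
run_playbook setup && run_playbook bootstrap_swarm
```

Run it with DRY_RUN=0 once the printed commands look right.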

🚚 Base Services

If you also want to bootstrap some base services, you can use this section to do so. The services that will be installed here are:

  • Traefik - Reverse Proxy and Load Balancer to access the cluster services
  • Portainer - Container Orchestration web UI
  • Registry - Container (private) Registry for your docker images - Note that we are going to be using simple HttpAuth, check this for other options
  • SwarmCronjob - Simple cronjob solution
  • Shepherd - Update your services on image update
  • Dozzle - Simple logging and monitoring

To bootstrap these services, we'll need to do a tiny bit more configuring. To use traefik, we'll need a domain name (configure the DNS to point both example.com and *.example.com to the managers' IPs), and since in this example we use it to create SSL certificates, we need a maintainer email. To configure both, edit vars.yml:

domain_name: cloud.example.com # <- your domain
maintainer_email: you@example.com # <- your email
basic_auth_password: adminPass # <- registry and traefik http password (user is admin)

Since we deployed our own private container registry, we can ship our services with a simple CI/CD build such as GitHub Actions: just build and push to your registry, which will be available at https://registry.YOUR_DOMAIN_NAME/, and Shepherd will then update the service automatically. You can also check the registry UI at https://registry-ui.YOUR_DOMAIN_NAME/.

After configuring it, simply run:

ansible-playbook -u REMOTE_USER -i hosts/hosts.ini --extra-vars @vars.yml playbooks/bootstrap_essential_services.yml

Traefik will take a few moments to generate the TLS certificates but after that, you can access those services with their subdomain. For example:

Portainer: https://portainer.example.com

Remember that portainer gives you a limited amount of time to access it and create the admin user. If you miss that window, you'll need to restart the service (on a manager node: docker service update --force portainer_portainer).

🗒️ NOTEs:

  • The traefik http_pass config is admin:adminPass. To change it, take a look at composes/traefik/replace_pass.sh.
  • The portainer version we are running is the Community Edition (CE). You can run the Enterprise Edition (EE) for free on up to 3 nodes; it adds some pretty cool functionality, so check it out.

👟 Running your own services

Since we have a small cluster with very limited resources, it is pretty important to set resource limits in the compose files. To create the routes in traefik, you must add these labels to the deploy segment in the compose file. Take a look at the following example:

version: "3.7"

services:
  my-app:
    image: my-app:prod
    networks:
      - traefik-public # this allows the container to be accessed by traefik
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 50M
      replicas: 3
      mode: replicated # or "global" -> on all nodes
      labels:
        - shepherd.autodeploy=true # this will make shepherd watch for updates (on image tagged with "my-app:prod")
        - traefik.enable=true
        - traefik.http.services.${MY_SERVICE_NAME}.loadbalancer.server.port=${TARGET_PORT}
        - traefik.http.routers.${MY_SERVICE_NAME}.rule=Host(`${SUBDOMAIN_TO_REDIRECT}.${DOMAIN_NAME}`)
        - traefik.http.routers.${MY_SERVICE_NAME}.entrypoints=websecure
        - traefik.http.routers.${MY_SERVICE_NAME}.tls=true
        - traefik.http.routers.${MY_SERVICE_NAME}.tls.certresolver=leresolver
        # - traefik.http.routers.${MY_SERVICE_NAME}.service=${MY_SERVICE_NAME} # only needed if going to use more than 1 route (port) per service
        # - traefik.http.routers.${MY_SERVICE_NAME}.middlewares=admin-auth  # this line enables the admin-auth on the service

networks:
  traefik-public:
    driver: overlay
    external: true

Then deploy the stack from a manager node:

docker stack deploy -c compose.yml MY_STACK_NAME --with-registry-auth

In this yaml snippet, we have 4 vars:

  • MY_SERVICE_NAME: The name of the service in the compose file (i.e. "my-app", the name defined in the compose)
  • SUBDOMAIN_TO_REDIRECT: The subdomain used to redirect to that service
  • DOMAIN_NAME: Your domain name (can be used in conjunction with the subdomain to redirect from a whole new domain)
  • TARGET_PORT: The port where the service is running in its container
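One way to fill these in is to export them in the shell before deploying; to my knowledge, docker stack deploy substitutes ${...} references in the compose file from the calling shell's environment. The sketch below uses placeholder values (my-app, app, cloud.example.com, 8080), and missing_vars is a hypothetical helper that lists anything left unset.

```shell
#!/bin/sh
# Export the four variables used by the compose labels. All values are placeholders.
export MY_SERVICE_NAME="my-app"
export SUBDOMAIN_TO_REDIRECT="app"
export DOMAIN_NAME="cloud.example.com"
export TARGET_PORT="8080"

# Print the name of every required variable that is unset or empty.
missing_vars() {
  [ -n "${MY_SERVICE_NAME:-}" ]       || echo MY_SERVICE_NAME
  [ -n "${SUBDOMAIN_TO_REDIRECT:-}" ] || echo SUBDOMAIN_TO_REDIRECT
  [ -n "${DOMAIN_NAME:-}" ]           || echo DOMAIN_NAME
  [ -n "${TARGET_PORT:-}" ]           || echo TARGET_PORT
  return 0
}

if [ -z "$(missing_vars)" ]; then
  # Show the route traefik would be asked to create.
  printf 'router rule: Host(`%s.%s`) -> port %s\n' \
    "$SUBDOMAIN_TO_REDIRECT" "$DOMAIN_NAME" "$TARGET_PORT"
  # Then, on a manager node, run the deploy command shown above:
  #   docker stack deploy -c compose.yml MY_STACK_NAME --with-registry-auth
else
  echo "unset variables: $(missing_vars | tr '\n' ' ')" >&2
fi
```

Checking the variables first avoids deploying a stack whose router rule contains an empty host name.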

After this, check:

The user is admin and the password is the one you previously configured.

Take a look at the examples for:

  • Stateful apps running in the cluster (note the placement constraints in the compose).
  • A reverse proxy (L7) configuration. For L4, you'll have to run an NGINX (or your LB of preference) and map the ports host:container, routing them manually. Just remember to constrain the container to a single (manager) node so there is no chance of its IP changing.
  • Example GitHub Actions Workflow CI (with multi-arch, needed for heterogeneous clouds) file that ends up triggering the shepherd daemon to do the CD locally in the cluster. The image tags to push on that workflow are: repository-name:branch-name.
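To make the tag convention from the last item concrete, here is a small sketch of how a CI job might compose the image reference for the private registry. The image_ref helper and the sample values are hypothetical; the docker commands are left as comments because they need Docker and a running registry.

```shell
#!/bin/sh
# Compose an image reference of the form registry.DOMAIN/repository-name:branch-name.
# image_ref and the sample values below are illustrative only.
image_ref() {  # usage: image_ref DOMAIN REPOSITORY BRANCH
  echo "registry.$1/$2:$3"
}

IMAGE="$(image_ref cloud.example.com my-app prod)"
echo "$IMAGE"

# With Docker available, a CI job would then do something like:
#   docker login -u admin registry.cloud.example.com
#   docker build -t "$IMAGE" .
#   docker push "$IMAGE"
```

Once the push lands, any service labelled shepherd.autodeploy=true that uses that tag is picked up by Shepherd.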

🍪 Thanks

This repo was somewhat inspired by techno-tim's k3s-ansible. Check him out!
