This plugin provides a mechanism for peer node discovery in RabbitMQ clusters. It also supports a few opinionated features around cluster formation and "permanently unavailable" node detection.
Starting with RabbitMQ 3.7.0 (including previews and recent snapshot builds), this plugin was superseded by a new peer discovery subsystem built on the same ideas and supporting the same backends via separate plugins.
Nodes using this plugin will discover their peers on boot and (optionally) register with one of the supported backends:
- AWS EC2 instance tags
- AWS Autoscaling Groups
- Kubernetes
- DNS A records
- Consul
- etcd
If at least one peer node has been discovered, cluster formation proceeds as usual. Otherwise the node is considered to be the first one to come up and becomes the seed node.
To avoid a natural race condition around seed node "election" when a newly formed cluster first boots, peer discovery backends use either randomized delays or a locking mechanism.
Some backends support node health checks. Nodes not reporting their status periodically are considered to be in an errored state. If the user opts in, such nodes can be automatically removed from the cluster. This is useful for deployments that use AWS autoscaling groups or similar IaaS features, for example.
This plugin only covers cluster formation and does not change how RabbitMQ clusters operate once formed.
Note: This plugin is not a replacement for first-hand knowledge of how to manually create a RabbitMQ cluster. If you run into issues using the plugin, you should try to manually create the cluster in the same environment in which you are trying to use the plugin. For information on how to cluster RabbitMQ manually, please see the RabbitMQ documentation.
This plugin was originally developed by Gavin Roy at AWeber and is now co-maintained by several RabbitMQ core contributors. Parts of it were adopted into RabbitMQ core (as of 3.7.0).
There are two branches in this repository that target different RabbitMQ release series:
- `stable` targets RabbitMQ 3.6.x (current `stable` RabbitMQ branch)
- `master` is compatible with RabbitMQ 3.7.x (current `master` RabbitMQ branch), but this plugin was superseded by a new peer discovery subsystem built on the same ideas.
Please take this into account when building this plugin from source.
Please also note that key ideas of this plugin have been incorporated into the RabbitMQ `master` branch and will be included in `3.7.0`. This plugin will therefore become a collection of backends (e.g. AWS and etcd) rather than a wholesale alternative cluster formation implementation.
This plugin requires Erlang/OTP 18.3 or later. Also see the RabbitMQ Erlang version requirements guide.
Binary releases of autocluster can be found on the GitHub Releases page.
The most recent release is 0.10.0, which targets RabbitMQ 3.6.12 or later. See release notes for details.
This plugin is installed the same way as other RabbitMQ plugins.
- Place both `autocluster-{version}.ez` and the `rabbitmq_aws-{version}.ez` plugin files in the RabbitMQ plugins directory.
- Enable the plugin, e.g. with `rabbitmq-plugins enable autocluster --offline`.
- Configure the plugin.
- Start the node.
Alternatively, there is a pre-built Docker image available on DockerHub as pivotalrabbitmq/rabbitmq-autocluster.
Note that the plugin does not have a default backend configured. A little bit of configuration is therefore mandatory regardless of the backend used.
- General Settings
- AWS configuration
- Consul configuration
- DNS configuration
- etcd configuration
- K8S configuration
Configuration for the plugin can be set in two places: operating system environment variables or the `rabbitmq.config` file under the `autocluster` section.
The following settings are generic and used by most (or all) service discovery backends:
- Backend Type
- Which type of service discovery backend to use. One of `aws`, `consul`, `dns`, `etcd` or `k8s`.
- Startup Delay
- To prevent a race condition when creating a new cluster for the first time, the startup delay performs a random sleep that should cause nodes to start at slightly random offsets from each other. The setting lets you control the maximum value for the startup delay.
- Failure Mode
- What behavior to use when the node fails to cluster with an existing RabbitMQ cluster or during initialization of the autocluster plugin. The two valid options are `ignore` and `stop`.
- Log Level
- You can set the log level via the environment variable `AUTOCLUSTER_LOG_LEVEL` or the `autocluster.autocluster_log_level` key (see below).
- Longname (FQDN) Support
- This is a RabbitMQ environment variable setting that is used by the autocluster plugin as well. When set to `true` this will cause RabbitMQ and the autocluster plugin to use fully qualified names to identify nodes. For more information about the `RABBITMQ_USE_LONGNAME` environment variable, see the RabbitMQ documentation.
- Node Name
- Like long node name support, node name is a RabbitMQ server setting that can be used together with this plugin. The `RABBITMQ_NODENAME` environment variable explicitly sets the node name that is used to identify the node with RabbitMQ. The autocluster plugin will use this value when constructing the local part/name/prefix for all nodes in this cluster. For example, if `RABBITMQ_NODENAME` is set to `bunny@rabbit1`, `bunny` will be prefixed to all nodes discovered by the various backends. For more information about the `RABBITMQ_NODENAME` environment variable, see the RabbitMQ documentation. Note that some backends offer ways to dynamically compute the node name (e.g. AWS, Consul), while others assume that node names are preconfigured out-of-band and provided by the discovery service (e.g. DNS). In those cases it may or may not be possible (or recommended) to use `RABBITMQ_NODENAME`.
- Node Type
- Define the type of node to join the cluster as. One of `disc` or `ram`. See the RabbitMQ Clustering Guide for more information.
- Cluster Cleanup
- Enables a periodic check that removes any nodes that are not alive in the cluster and are no longer listed in the service discovery list. This is a destructive action that removes nodes from the cluster. Nodes that are flapping and get removed will be re-added as if they were new, and their database, including any persisted messages, will be gone. To use this feature, you must not only enable it with this flag, but also disable the "Cleanup Warn Only" flag. Added in v0.5.
Note: This is an experimental feature and should be used with caution.
- Cleanup Interval
- If cluster cleanup is enabled, this is the interval that specifies how often to look for dead nodes to remove (in seconds). Added in v0.5
- Cleanup Warn Only
- If set, the plugin will only warn about nodes that it would clean up and will not perform any destructive actions on the cluster. Added in v0.5.
- HTTP Proxy
- If set, the given HTTP URL will be used as a proxy to connect to the service discovery backend.
- HTTPS Proxy
- If set, the given HTTPS URL will be used as a proxy to connect to the service discovery backend.
- Proxy Exclusions
- List of host names which shouldn't use any proxy.
- When using environment variables, the NoProxy list must be provided as a comma separated string:
PROXY_EXCLUSIONS="localhost, 127.0.0.1"
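To make the interaction between the three cleanup settings described above more concrete, here is a minimal sketch of a `rabbitmq.config` fragment that turns on destructive cleanup. The setting keys are the ones listed in the general settings table; the backend (`consul`) and the interval of 90 seconds are assumptions picked for the example, not recommendations.

```erlang
[
  {autocluster, [
    {backend, consul},
    %% Enable the periodic check for dead nodes (default: false).
    {cluster_cleanup, true},
    %% Look for dead nodes every 90 seconds (default: 60).
    {cleanup_interval, 90},
    %% Warn-only mode is on by default; it has to be switched off
    %% before any node is actually removed from the cluster.
    {cleanup_warn_only, false}
  ]}
].
```

Leaving `cleanup_warn_only` at its default of `true` is a way to see which nodes would be removed before allowing the plugin to remove them.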
- Autoscaling
- Cluster based upon membership in an Autoscaling Group. Set to `true` to enable.
- EC2 Tags
- Filter the cluster node list with the specified tags. Use a comma delimiter for multiple tags when specifying as an environment variable.
- Use private IP
- Use the private IP address returned by autoscaling as hostname, instead of the private DNS name
- Explicitly configured in the `autocluster` configuration.
- Environment variables
- Configuration file
- EC2 Instance Metadata Service (for Region)
- Explicitly configured in the `autocluster` configuration.
- Environment variables
- Credentials file
- EC2 Instance Metadata Service
- Consul Scheme
- The URI scheme to use when connecting to Consul
- Consul Host
- The hostname to use when connecting to Consul's API
- Consul Port
- The port to use when connecting to Consul's API
- Consul ACL Token
- The Consul access token to use when registering the node with Consul (optional)
- Service Name
- The name of the service to register with Consul for automatic clustering
- Service Address
- An IP address or host name to use when registering the service. If this is specified, the value will automatically be appended to the service ID. This is useful when you are testing with a single Consul server instead of having an agent for every RabbitMQ node. (optional)
- Service Auto Address
- Use the hostname of the current machine (retrieved with `gethostname(2)`) for the service address when registering the service with Consul. If this is enabled, the hostname will automatically be appended to the service ID. This is useful when you are testing with a single Consul server instead of having an agent for every RabbitMQ node. (optional)
- Service Auto Address by NIC
- Use the IP address of the specified network interface controller (NIC) as the service address when registering with Consul. (optional)
- Service Port
- Used to set a port for the service in Consul, allowing for the automatic clustering service registration to double as a general RabbitMQ service registration.
Note: Set `CONSUL_SVC_PORT` to an empty value to disable port announcement and health checking. For example: `CONSUL_SVC_PORT=""`
- Consul Use Longname
- When node names, rather than FQDNs, are registered with Consul, this option appends `.node.` followed by the Consul domain (see below) to the node names retrieved from Consul.
- Consul Domain
- The domain suffix appended to peer node hostnames when long node names are used (see above).
- Service TTL
- Used to specify the Consul health check interval that is used to let Consul know that RabbitMQ is alive and healthy.
- Service Tags
- Used to specify the Consul service tags. If a cluster name is specified, the tags specified here are added to the cluster name tag.
- Service unregistration timeout
- How soon should Consul unregister a node that's failing its health check? The value is in seconds and cannot be lower than 60.
- Include nodes that fail Consul health checks?
- If set to `true`, nodes that fail their health checks with Consul will still be included into discovery results.
- The DNS Round-Robin A Record. Imagine having 3 nodes with IPs 10.0.0.2, 10.0.0.3 and 10.0.0.4.
- All the nodes have reverse lookup entries in your DNS server, so a reverse lookup of each IP address returns the node's hostname.
- Erlang will always receive lowercase DNS names. Be careful if you use your /etc/hosts file to resolve the other nodes in the cluster: if you use uppercase names there, RabbitMQ will get confused and the cluster will not form.
- etcd Scheme
- The URI scheme to use when connecting to etcd
- etcd Host
- The hostname to use when connecting to etcd's API
- etcd Port
- The port to connect to when using etcd's API
- etcd Key Prefix
- The prefix used when storing cluster membership keys in etcd
- etcd Node TTL
- Used to specify how long a node can be down before it is removed from etcd's list of RabbitMQ nodes in the cluster
- K8S Scheme
- The URI scheme to use when connecting to the Kubernetes API server
- K8S Host
- The hostname of the Kubernetes API server
- K8S Port
- The port to use when connecting to the Kubernetes API server
- K8S Token Path
- The token path of the Pod's service account
- K8S Cert Path
- The path of the service account authentication certificate with the k8s API server
- K8S Namespace Path
- The path of the service account namespace file
- K8S Service Name
- The RabbitMQ service name in Kubernetes
- K8S Address Type
- The address type, either `ip` or `hostname`
- K8S Hostname Suffix
- The suffix to append to the hostname
- erlang 17.5
- docker-machine
- docker-compose
- make
Make targets: `tests`, `run-broker`, `shell`, `dist`
You can configure the autocluster plugin via environment variables or in the `rabbitmq.config` file.
Note: RabbitMQ reads its own config file with environment variables, `rabbitmq-env.conf`, but you can't easily reuse it for `autocluster` configuration. If you absolutely want to do it, you should use `export VAR_NAME=var_value` instead of a plain assignment to `VAR_NAME`.
The following chart details each general setting, with the environment variable name, `rabbitmq.config` setting key and data type, and the default value if there is one.
Setting | Environment Variable | Setting Key | Type | Default |
---|---|---|---|---|
Backend Type | AUTOCLUSTER_TYPE | backend | atom | unconfigured |
Startup Delay | AUTOCLUSTER_DELAY | startup_delay | integer | 5 |
Failure Mode | AUTOCLUSTER_FAILURE | autocluster_failure | atom | ignore |
Log Level | AUTOCLUSTER_LOG_LEVEL | autocluster_log_level | atom | info |
Longname | RABBITMQ_USE_LONGNAME | | bool | false |
Node Name | RABBITMQ_NODENAME | | string | rabbit@$HOSTNAME |
Node Type | RABBITMQ_NODE_TYPE | node_type | atom | disc |
Cluster Cleanup | AUTOCLUSTER_CLEANUP | cluster_cleanup | bool | false |
Cleanup Interval | CLEANUP_INTERVAL | cleanup_interval | integer | 60 |
Cleanup Warn Only | CLEANUP_WARN_ONLY | cleanup_warn_only | bool | true |
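As an illustration of how these keys fit together, the following is a minimal sketch of an `autocluster` section that picks a backend, raises the maximum startup delay, and stops the node when clustering fails. The chosen values (`etcd`, `10`, `stop`) are assumptions made for the example rather than recommended settings.

```erlang
[
  {autocluster, [
    %% Which discovery backend to use; there is no default.
    {backend, etcd},
    %% Upper bound for the randomized startup delay (default: 5).
    {startup_delay, 10},
    %% Stop the node instead of ignoring a clustering failure (default: ignore).
    {autocluster_failure, stop},
    %% Join the cluster as a disc node (the default).
    {node_type, disc}
  ]}
].
```

Longname and node name have no setting key in the table, so they are left to the `RABBITMQ_USE_LONGNAME` and `RABBITMQ_NODENAME` environment variables here.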
To configure the logging level used by this plugin, use the `AUTOCLUSTER_LOG_LEVEL` environment variable or the `autocluster.autocluster_log_level` setting.
Here's a very minimalistic example that enables debug logging:
[
  {autocluster, [
    {autocluster_log_level, debug}
  ]}
].
Valid log levels are `debug`, `info`, `warning`, and `error`. For more information on RabbitMQ configuration please refer to RabbitMQ documentation.
The AWS backend for the autocluster plugin supports two node discovery methods: Autoscaling Group membership and EC2 tags.
The following settings impact the behavior of the AWS backend. See the AWS API Credentials section below for additional settings.
NOTE: If this is your first time setting up RabbitMQ with the autoscaling cluster and you are doing so for R&D purposes, you may want to check out the gavinmroy/alpine-rabbitmq-autocluster Docker Image repository for a working example of the plugin using a CloudFormation template that creates everything required for an Autoscaling Group based cluster.
Environment Variable | Setting Key | Type | Default |
---|---|---|---|
AWS_AUTOSCALING | aws_autoscaling | atom | false |
AWS_EC2_TAGS | aws_ec2_tags | [string()] | |
AWS_USE_PRIVATE_IP | aws_use_private_ip | atom | false |
Notes
If `aws_autoscaling` is enabled, the EC2 backend will dynamically determine the autoscaling group that the node is a member of and attempt to join the other nodes in the autoscaling group.
If `aws_autoscaling` is disabled, you must specify EC2 tags to use to filter the nodes that the backend should cluster with.
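As with the AWS CLI, the `autocluster` plugin configures the AWS API requests by attempting to resolve the values in a number of steps.
The configuration values are discovered in the following order: explicit settings in the `autocluster` configuration, environment variables, the configuration file, and the EC2 Instance Metadata Service (for Region).
The credentials values are discovered in the following order: explicit settings in the `autocluster` configuration, environment variables, the credentials file, and the EC2 Instance Metadata Service.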
The following settings and environment variables impact the configuration and credentials behavior. For more information see the Amazon AWS CLI documentation.
Environment Variable | Setting Key | Type | Default |
---|---|---|---|
AWS_ACCESS_KEY_ID | aws_access_key | string | |
AWS_SECRET_ACCESS_KEY | aws_secret_key | string | |
AWS_DEFAULT_REGION | aws_ec2_region | string | us-east-1 |
AWS_DEFAULT_PROFILE | N/A | string | |
AWS_CONFIG_FILE | N/A | string | |
AWS_SHARED_CREDENTIALS_FILE | N/A | string | |
If you intend to use the EC2 Instance Metadata Service along with an IAM Role that is assigned to EC2 instances, you will need a policy that allows the plugin to discover the node list. The following is an example of such a policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
The following configuration example enables the autoscaling based cluster discovery and sets the EC2 region to `us-west-2`:
[
  {autocluster, [
    {autocluster_log_level, debug},
    {backend, aws},
    {aws_autoscaling, true},
    {aws_ec2_region, "us-west-2"}
  ]}
].
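When nodes rely on an IAM role and the EC2 Instance Metadata Service, neither access keys nor a region need to appear in the configuration, since both are resolved in the discovery steps listed earlier. A minimal sketch under that assumption, for an Autoscaling Group based cluster whose nodes should be addressed by private IP rather than private DNS name:

```erlang
[
  {autocluster, [
    {backend, aws},
    {aws_autoscaling, true},
    %% Use the private IP address returned by autoscaling as the hostname
    %% instead of the private DNS name (default: false).
    {aws_use_private_ip, true}
    %% No aws_access_key, aws_secret_key or aws_ec2_region here: this sketch
    %% assumes they are resolved from the environment or instance metadata.
  ]}
].
```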
For non-autoscaling group based clusters, the following configuration demonstrates how to limit EC2 instances in the cluster to nodes with the tags `region=us-west-2` and `service=rabbitmq`. It also specifies the AWS access key and AWS secret key.
[
  {autocluster, [
    {autocluster_log_level, debug},
    {backend, aws},
    {aws_ec2_tags, [
      {"region", "us-west-2"},
      {"service", "rabbitmq"}
    ]},
    {aws_ec2_region, "us-east-1"},
    {aws_access_key, "AKIDEXAMPLE"},
    {aws_secret_key, "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY"}
  ]}
].
When using environment variables, the tags must be provided in JSON format:
AWS_EC2_TAGS="{\"region\": \"us-west-2\",\"service\": \"rabbitmq\"}"
The following is an example cloud-init that was tested with Ubuntu Trusty for use with an Autoscaling Group:
#cloud-config
apt_update: true
apt_upgrade: true
apt_sources:
  - source: deb https://apt.dockerproject.org/repo ubuntu-trusty main
    keyid: 58118E89F3A912897C070ADBF76221572C52609D
    filename: docker.list
packages:
  - docker-engine
runcmd:
  - docker run -d --name rabbitmq --net=host -p 4369:4369 -p 5672:5672 -p 15672:15672 -p 25672:25672 gavinmroy/rabbitmq-autocluster
The following settings impact the configuration of the Consul backend for the autocluster plugin:
Setting | Environment Variable | Setting Key | Type | Default |
---|---|---|---|---|
Consul Scheme | CONSUL_SCHEME | consul_scheme | string | http |
Consul Host | CONSUL_HOST | consul_host | string | localhost |
Consul Port | CONSUL_PORT | consul_port | integer | 8500 |
Consul ACL Token | CONSUL_ACL_TOKEN | consul_acl_token | string | |
Service Name | CONSUL_SVC | consul_svc | string | rabbitmq |
Service Address | CONSUL_SVC_ADDR | consul_svc_addr | string | |
Service Auto Address | CONSUL_SVC_ADDR_AUTO | consul_svc_addr_auto | boolean | false |
Service Auto Address by NIC | CONSUL_SVC_ADDR_NIC | consul_svc_addr_nic | string | |
Service Port | CONSUL_SVC_PORT | consul_svc_port | integer | 5672 |
Service TTL | CONSUL_SVC_TTL | consul_svc_ttl | integer | 30 |
Service Tags | CONSUL_SVC_TAGS | consul_svc_tags | list | [] |
Service unregistration timeout | CONSUL_DEREGISTER_AFTER | consul_deregister_after | integer | 60 |
Consul Use Longname | CONSUL_USE_LONGNAME | consul_use_longname | boolean | false |
Consul Domain | CONSUL_DOMAIN | consul_domain | string | consul |
Include nodes that fail Consul health checks? | CONSUL_INCLUDE_NODES_WITH_WARNINGS | consul_include_nodes_with_warnings | boolean | false |
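The health check related settings in this table can be combined in one `autocluster` section. The following is a minimal sketch; the TTL of 15 and the unregistration timeout of 120 seconds are assumptions chosen purely for illustration.

```erlang
[
  {autocluster, [
    {backend, consul},
    {consul_host, "localhost"},
    {consul_port, 8500},
    %% Health check interval used to tell Consul the node is alive (default: 30).
    {consul_svc_ttl, 15},
    %% Unregister a node that keeps failing its health check after 120 seconds;
    %% the value cannot be lower than 60.
    {consul_deregister_after, 120},
    %% Keep excluding nodes that fail their health checks (the default).
    {consul_include_nodes_with_warnings, false}
  ]}
].
```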
An example that configures an ACL token and contacts a local Consul agent:
[
  {rabbit, []},
  {autocluster, [
    {backend, consul},
    {consul_host, "localhost"},
    {consul_port, 8500},
    {consul_acl_token, "example-acl-token"},
    {consul_svc, "rabbitmq-test"},
    {cluster_name, "test"}
  ]}
].
The following example can be used to form a cluster of N nodes, one running on a development machine (`my-laptop.local`) and N - 1 running in VMs or containers with access to host networking.
Because `consul_use_longname` is enabled, node names retrieved from Consul get `.node.consul` appended (`consul` being the default Consul domain).
[
  {rabbit, []},
  {autocluster, [
    {backend, consul},
    {consul_host, "my-laptop.local"},
    {consul_port, 8500},
    {consul_use_longname, true},
    {consul_svc, "rabbitmq"},
    {consul_svc_addr_auto, true},
    {consul_svc_addr_nodename, true}
  ]}
].
In the following example, the service address reported to Consul is hardcoded to `hostname1.messaging.dev.local` instead of being computed automatically from the environment:
[
  {rabbit, []},
  {autocluster, [
    {backend, consul},
    {consul_host, "my-laptop.local"},
    {consul_port, 8500},
    {consul_use_longname, true},
    {consul_svc, "rabbitmq"},
    {consul_svc_addr_auto, false},
    {consul_svc_addr, "hostname1.messaging.dev.local"}
  ]}
].
The example demonstrates how to create a dynamic RabbitMQ cluster using:
The following setting applies only to the DNS backend:
DNS Hostname
The FQDN to use when the backend type is `dns` for looking up the RabbitMQ nodes to cluster via a DNS A record round-robin.
Environment Variable | AUTOCLUSTER_HOST |
---|---|
Setting Key | autocluster_host |
Data type | string |
Default Value | consul |
The following configuration example enables the DNS based cluster discovery and sets the autocluster_host variable to your DNS Round-Robin A record:
[
  {autocluster, [
    {backend, dns},
    {autocluster_host, "YOUR_ROUND_ROBIN_A_RECORD"}
  ]}
].
If you are having issues getting your RabbitMQ cluster formed, please check that Erlang can resolve both the round-robin A record and the reverse lookup entry of each node:
> inet_res:lookup("YOUR_ROUND_ROBIN_A_RECORD", in, a).
[{10,0,0,2},{10,0,0,3},{10,0,0,4}]
> inet_res:gethostbyaddr({10,0,0,2}).
{ok,{hostent,"YOUR_REVERSE_LOOKUP_ENTRY",[],
             inet,4,
             [{10,0,0,2}]}}
The following settings apply to the etcd backend only:
Setting | Environment Variable | Setting Key | Type | Default |
---|---|---|---|---|
etcd Scheme | ETCD_SCHEME | etcd_scheme | list | http |
etcd Host | ETCD_HOST | etcd_host | list | localhost |
etcd Port | ETCD_PORT | etcd_port | int | 2379 |
etcd Key Prefix | ETCD_PREFIX | etcd_prefix | list | rabbitmq |
etcd Node TTL | ETCD_TTL | etcd_ttl | integer | 30 |
NOTE The etcd backend supports etcd v2 and v3.
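For comparison with the other backends, here is a minimal sketch of an etcd configuration built only from the keys in the table above. The host name `etcd.example.local` and the key prefix `rabbitmq-staging` are hypothetical, and the settings typed as `list` are assumed to take Erlang strings.

```erlang
[
  {autocluster, [
    {backend, etcd},
    {etcd_scheme, "http"},
    %% Hypothetical etcd host; replace with your own.
    {etcd_host, "etcd.example.local"},
    {etcd_port, 2379},
    %% Prefix under which cluster membership keys are stored (default: rabbitmq).
    {etcd_prefix, "rabbitmq-staging"},
    %% How long a node can be down before it is removed from etcd's list (default: 30).
    {etcd_ttl, 30}
  ]}
].
```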
The following settings impact the configuration of the Kubernetes backend for the autocluster plugin:
Setting | Environment Variable | Setting Key | Type | Default |
---|---|---|---|---|
K8S Scheme | K8S_SCHEME | k8s_scheme | string | https |
K8S Host | K8S_HOST | k8s_host | string | kubernetes.default.svc.cluster.local |
K8S Port | K8S_PORT | k8s_port | integer | 443 |
K8S Token Path | K8S_TOKEN_PATH | k8s_token_path | string | /var/run/secrets/kubernetes.io/serviceaccount/token |
K8S Cert Path | K8S_CERT_PATH | k8s_cert_path | string | /var/run/secrets/kubernetes.io/serviceaccount/ca.crt |
K8S Namespace Path | K8S_NAMESPACE_PATH | k8s_namespace_path | string | /var/run/secrets/kubernetes.io/serviceaccount/namespace |
K8S Service Name | K8S_SERVICE_NAME | k8s_service_name | string | rabbitmq |
K8S Address Type | K8S_ADDRESS_TYPE | k8s_address_type | string | ip |
K8S Hostname Suffix | K8S_HOSTNAME_SUFFIX | k8s_hostname_suffix | string | |
In order for this plugin to work, your nodes need to use FQDNs, i.e. set `RABBITMQ_USE_LONGNAME=true` in your pod.
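A Kubernetes configuration can be sketched from the table in the same way. The example below assumes hostname based discovery. The suffix value is hypothetical and depends on how the service and namespace are named in your cluster, and `k8s_address_type` is written as a string because that is the type the table lists.

```erlang
[
  {autocluster, [
    {backend, k8s},
    {k8s_host, "kubernetes.default.svc.cluster.local"},
    {k8s_port, 443},
    {k8s_service_name, "rabbitmq"},
    %% Discover peers by hostname instead of IP address (default: ip).
    {k8s_address_type, "hostname"},
    %% Hypothetical suffix appended to each discovered hostname.
    {k8s_hostname_suffix, ".rabbitmq.default.svc.cluster.local"}
  ]}
].
```

The token, certificate and namespace paths are left at their defaults, which point at the Pod's mounted service account files.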
WIP Notes for dev environment
Startup docker-machine:
docker-machine create --driver virtualbox default
eval $(docker-machine env)
Start client containers:
docker-compose up -d
Work in Progress
Building the container:
docker build -t rabbitmq-autocluster .
Here's the base pattern for how I test against Consul when developing:
make dist
docker build -t rabbitmq-autocluster .
docker network create rabbitmq_network
docker run --rm -t -i --net=rabbitmq_network --name=consul -p 8500:8500 consul
docker run --rm -t -i --net=rabbitmq_network --name=node0 -e AUTOCLUSTER_TYPE=consul -e CONSUL_HOST=consul -e CONSUL_PORT=8500 -e CONSUL_SVC_TTL=60 -e AUTOCLUSTER_CLEANUP=true -e CLEANUP_WARN_ONLY=false -e CONSUL_SVC_ADDR_AUTO=true -p 15672:15672 rabbitmq-autocluster
docker run --rm -t -i --net=rabbitmq_network --name=node1 -e RABBITMQ_NODE_TYPE=ram -e AUTOCLUSTER_TYPE=consul -e CONSUL_HOST=consul -e CONSUL_PORT=8500 -e CONSUL_SVC_TTL=60 -e AUTOCLUSTER_CLEANUP=true -e CLEANUP_WARN_ONLY=false -e CONSUL_SVC_ADDR_AUTO=true rabbitmq-autocluster
docker run --rm -t -i --net=rabbitmq_network --name=node2 -e RABBITMQ_NODE_TYPE=ram -e AUTOCLUSTER_TYPE=consul -e CONSUL_HOST=consul -e CONSUL_PORT=8500 -e CONSUL_SVC_TTL=60 -e AUTOCLUSTER_CLEANUP=true -e CLEANUP_WARN_ONLY=false -e CONSUL_SVC_ADDR_AUTO=true rabbitmq-autocluster
- Consul management: http://localhost:8500/ui
- RabbitMQ cluster: http://localhost:15672/
BSD 3-Clause