Windows: one of three cluster nodes stops during boot and does not join the cluster during cluster formation #13126
Community Support Policy
RabbitMQ version used: 4.0.3
Erlang version used: 26.2.x
Operating system (distribution) used: Windows
How is RabbitMQ deployed? Windows installer
rabbitmq-diagnostics status output: See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 3 (if applicable, with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster: Add an environment variable to install it for all users. rabbitmqctl stop_app
Steps to reproduce the behavior in question: I don't know why it failed or how to reproduce it.
advanced.config: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code: I don't think this matters in this case, but I have several C# services working against the RabbitMQ cluster.
Kubernetes deployment file: None
What problem are you trying to solve?

First off, we run RabbitMQ 3.13.0 (with Erlang 26.2.2), which was first installed at the beginning of last year (2024). We missed that it needs to be updated more often, so we will schedule regular updates, and I will push for a Rabbit license.

We are running RabbitMQ on three Windows Server nodes with a couple of quorum queues. This had been working fine for nearly a year, but suddenly we noticed a large number of messages in a DLX queue as well as some errors from our services. RabbitMQ on node 2 had shut down and could not be restarted. We tried restarting the entire service, but that did not help.

By running RabbitMQ in CMD, I could see that it freezes right at startup. I enabled more logging and found that it gets stuck on what I believe is syncing feature flags with the cluster. It loops rapidly and indefinitely. I reinstalled RabbitMQ and Erlang on node 1 several times and double-checked all settings, including features and plugins, to ensure they match the rest of the cluster. Removing the node from the cluster and adding it back in works without any problem, but when the RabbitMQ service starts, it always freezes.

I'm beginning to think that the problem lies within the part of the cluster that is still running. Could it be a split-brain problem? We do regular restarts of the cluster, but we always leave one node operational. In this case, we probably want to shut everything down and then start it back up to hopefully re-sync the environment. However, it is very important not to lose any data.

Please help us get back on track so we can get the cluster running again and update the environment.
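Because the cluster uses quorum queues, which replicate data with the Raft consensus protocol, a queue stays available only while a majority of its replicas are online. The minimal Python sketch below spells out that arithmetic, assuming every quorum queue has one replica on each of the three nodes. The function names are hypothetical helpers, not part of any RabbitMQ tool.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    if replicas < 1:
        raise ValueError("a queue needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be offline while a majority remains."""
    return replicas - quorum_size(replicas)


def is_available(replicas: int, online: int) -> bool:
    """True if enough replicas are online to elect a leader and accept writes."""
    if not 0 <= online <= replicas:
        raise ValueError("online must be between 0 and the replica count")
    return online >= quorum_size(replicas)


if __name__ == "__main__":
    # Three replicas with node 2 down: 2 of 3 are still online.
    print(quorum_size(3), tolerated_failures(3), is_available(3, 2))  # 2 1 True
    # Restarting one more node while node 2 is down leaves 1 of 3 online.
    print(is_available(3, 1))  # False
```

Under this assumption, "always leave one node operational" is not enough for quorum queues: with node 2 already down, restarting either remaining node takes the queues below a majority until it comes back.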
Replies: 2 comments
For our team's own needs: the logs on node 1 stop right after the feature flag controller tries to use the global registry:
I could not find anything immediately related (apart from strictly logging-related changes such as #12444 for 4.1.0), but there were feature flag-related changes in 4.0.x, and there will be more in 4.1.x.
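Since the node hangs in the feature flag controller, one thing worth ruling out is a difference in feature flag states between nodes, which is what the original poster checked by hand. A minimal Python sketch, assuming the flag names and states have already been collected from each node (for example from `rabbitmqctl list_feature_flags` output) into plain dictionaries. The function name and the sample node and flag data are hypothetical.

```python
from typing import Dict, List, Tuple


def feature_flag_mismatches(
    nodes: Dict[str, Dict[str, str]],
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (flag, {node: state}) for every flag whose state differs across nodes.

    A flag that is absent on a node is reported as "missing" for that node.
    """
    all_flags = sorted({flag for flags in nodes.values() for flag in flags})
    mismatches = []
    for flag in all_flags:
        states = {node: flags.get(flag, "missing") for node, flags in nodes.items()}
        if len(set(states.values())) > 1:
            mismatches.append((flag, states))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the reported cluster.
    cluster = {
        "rabbit@node1": {"quorum_queue": "enabled", "stream_queue": "enabled"},
        "rabbit@node2": {"quorum_queue": "enabled", "stream_queue": "disabled"},
        "rabbit@node3": {"quorum_queue": "enabled"},
    }
    for flag, states in feature_flag_mismatches(cluster):
        print(flag, states)
```

An empty result only means the collected states agree; it does not rule out the controller hanging for other reasons, such as the global registry issue visible in the node 1 logs.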
RabbitMQ v3.13.x is out of community support. 3.13.0 is seven patch releases behind the latest 3.13.x release and 13 releases behind overall (behind 4.0.5). Time to upgrade to 4.0.5.
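The "releases behind" figures can be counted mechanically from a list of published versions. A minimal Python sketch, assuming the caller supplies that list (for example copied from the RabbitMQ release notes); the helper names and the sample list below are hypothetical and are not the actual release history.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain "major.minor.patch" string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def releases_behind(current: str, releases: List[str]) -> Tuple[int, int]:
    """Return (newer releases in the same minor series, newer releases overall)."""
    cur = parse(current)
    newer = [v for v in (parse(r) for r in releases) if v > cur]
    same_series = [v for v in newer if v[:2] == cur[:2]]
    return len(same_series), len(newer)


if __name__ == "__main__":
    # Hypothetical release list for illustration only.
    sample = ["3.13.0", "3.13.1", "3.13.2", "4.0.1", "4.0.2"]
    print(releases_behind("3.13.0", sample))  # (2, 4)
```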