Consul services drop periodically #22427

Closed

liuchenrang opened this issue May 31, 2024 · 4 comments


liuchenrang commented May 31, 2024

Nomad version

Output from nomad version
v1.7.7

Operating system and Environment details

CentOS Stream 8
SSH_CONNECTION=198.19.249.3 60379 198.19.249.13 22
LANG=zh_CN.UTF-8
HISTCONTROL=ignoredups
HOSTNAME=centos-8-3
which_declare=declare -f
XDG_SESSION_ID=c32
USER=root
PWD=/root
HOME=/root
SSH_CLIENT=198.19.249.3 60379 22
SSH_TTY=/dev/pts/2
MAIL=/var/spool/mail/root
TERM=xterm-256color
SHELL=/bin/bash
SHLVL=1
LOGNAME=root
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus
XDG_RUNTIME_DIR=/run/user/0
PATH=/opt/orbstack-guest/bin-hiprio:/opt/orbstack-guest/data/bin/cmdlinks:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/orbstack-guest/bin:/root/bin
DEBUGINFOD_URLS=https://debuginfod.centos.org/
HISTSIZE=1000
LESSOPEN=||/usr/bin/lesspipe.sh %s
BASH_FUNC_which%%=() {  ( alias;
 eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@
}
_=/usr/bin/env

Issue

In cluster mode, every node runs both the Nomad client and the server, and the job type is service.
The service gets deregistered from Consul and then automatically re-registered a short time later.

Reproduction steps

1. Run a 3-node cluster.
2. On each node, enable both the Nomad client and the server at the same time (see the configuration sketch after this list).
3. Deploy 3 instances of whoami.
4. Run a Consul cluster.
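
A minimal sketch of the Nomad agent configuration these steps imply, with client and server enabled on the same node. The datacenter matches the job file below; the file name and bootstrap_expect = 3 are assumptions for illustration, not taken from the report:

# agent.hcl (illustrative), one of the three combined client+server nodes
datacenter = "dc1"

server {
  enabled          = true
  bootstrap_expect = 3  # expect three servers in the cluster
}

client {
  enabled = true
}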

Expected Result

All three whoami instances stay stably registered in Consul.

Actual Result

The services periodically drop out of Consul and come back online again shortly afterwards.

Job file (if appropriate)

job "web" {
  datacenters = ["dc1"]
  type        = "service"
  # constraint {
  #   attribute = "${attr.unique.hostname}"
  #   value     = "centos-8-3"
  # }

  meta {
    stream = "3"
  }

  group "app" {
    count = 3
    network {
      port "http" {
        to = 8181
      }
    }

    service {
      tags = ["urlprefix-/"]
      port = "http"
      name = "hhhhlll"
      check {
        name     = "AppWebCheck"
        type     = "http"
        port     = "http"
        path     = "/"
        interval = "15s"
        timeout  = "20s"
      }
    }
    task "hiweb" {
      driver = "docker"
      config {
        image = "xinghuo/hi:20240530"
        ports = ["http"]
      }
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

(edited by @tgross to make syntax legible)

liuchenrang (Author) commented:

I analyzed the code in service_client.go, in func (c *ServiceClient) sync. Four checks there can be skipped, and when they all are, the service gets deleted.

The first check tests whether the service is recorded locally; the code comment there reads:

// Known service, skip

The second check is if !isNomadService(id) || !c.isClientAgent. Since client = true is enabled on the agent, this check is passed over as well.

Execution then falls through to the service deregistration logic.

The point: if an application does its own Consul service registration and discovery instead of going through Nomad, it will not hit this problem. But I want to use Nomad's built-in check feature, and that is where the problem shows up.
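
For readers following along, here is a simplified sketch of the deregistration pass described in this comment, reconstructed from the identifiers quoted above. The consulAgent interface, the agentAPI field name, the method signature, and the "_nomad-" prefix check are simplifications for illustration, not the verbatim Nomad source:

package sketch

import (
    "fmt"
    "strings"
)

// consulAgent stands in for the Consul agent API used by the client
// (hypothetical interface for this sketch).
type consulAgent interface {
    ServiceDeregister(id string) error
}

type ServiceClient struct {
    services      map[string]struct{} // services this Nomad agent registered itself
    isClientAgent bool                // true when the agent runs with client = true
    agentAPI      consulAgent
}

// isNomadService reports whether a Consul service ID was created by Nomad
// (Nomad-managed IDs carry a "_nomad-" prefix; simplified here).
func isNomadService(id string) bool {
    return strings.HasPrefix(id, "_nomad-")
}

// sync walks every service registered in the local Consul agent and removes
// the Nomad-managed ones this agent does not know about.
func (c *ServiceClient) sync(remoteServices map[string]struct{}) error {
    for id := range remoteServices {
        if _, ok := c.services[id]; ok {
            // Known service, skip
            continue
        }
        if !isNomadService(id) || !c.isClientAgent {
            // Not Nomad-managed, or this agent is not a client: skip
            continue
        }
        // Both checks passed over: the service is treated as stale and removed
        if err := c.agentAPI.ServiceDeregister(id); err != nil {
            return fmt.Errorf("deregister %s: %w", id, err)
        }
    }
    return nil
}

In this shape, a Nomad client agent that syncs against a Consul agent shared with other Nomad agents would see their _nomad- services as unknown and deregister them, which would produce exactly the flapping reported above.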

liuchenrang (Author) commented:

If this is a bug, I think the cause is the following.
c.services is the local list of services registered by this agent, but the final traversal uses the full list of services registered in Consul. As a result, services managed by other nodes get deleted, and in cluster mode the agents delete each other's registrations!


tgross (Member) commented Jun 21, 2024

Hi @liuchenrang! Does each one of your Nomad client agents have its own Consul agent? They should not share Consul agents, as described in the Consul configuration docs:

An important requirement is that each Nomad agent talks to a unique Consul agent. Nomad agents should be configured to talk to Consul agents and not Consul servers. If you are observing flapping services, you may have multiple Nomad agents talking to the same Consul agent. As such avoid configuring Nomad to talk to Consul via DNS such as consul.service.consul
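
Concretely, each Nomad agent's consul block should point at the Consul agent on its own host rather than at the Consul servers or a shared DNS name. A minimal sketch, assuming Consul's default local HTTP port:

consul {
  # Talk to the Consul agent running on this host, never to the Consul
  # servers or a shared address such as consul.service.consul.
  address = "127.0.0.1:8500"
}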

@tgross tgross self-assigned this Jun 21, 2024
@tgross tgross changed the title Using version 1.7.7, the service is dropped periodically. Consul services drop periodically Jun 21, 2024
@tgross tgross closed this as not planned Jul 26, 2024

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 20, 2024