Consul Connect enabled jobs fail if using health check #7709
Hey @spuder, thanks for reporting, and sorry you're having trouble with this. Rather than us trying to debug from bits of your configuration, do you mind starting from some examples and working backwards to figure out what's going wrong? This configuration is working with the following `example.nomad`:

```
# example.nomad
job "example" {
datacenters = ["dc1"]
group "api" {
network {
mode = "bridge"
port "healthcheck" {
to = -1
}
}
service {
name = "count-api"
port = "9001"
connect {
sidecar_service {}
}
check {
name = "api-health"
type = "http"
port = "healthcheck"
path = "/health"
interval = "10s"
timeout = "3s"
expose = true
}
}
task "web" {
driver = "docker"
config {
image = "hashicorpnomad/counter-api:v1"
}
}
}
group "dashboard" {
network {
mode = "bridge"
port "http" {
static = 9002
to = 9002
}
port "healthcheck" {
to = -1
}
}
service {
name = "count-dashboard"
port = "9002"
connect {
sidecar_service {
proxy {
upstreams {
destination_name = "count-api"
local_bind_port = 8080
}
}
}
}
check {
name = "dashboard-health"
type = "http"
port = "healthcheck"
path = "/health"
interval = "10s"
timeout = "3s"
expose = true
}
}
task "dashboard" {
driver = "docker"
env {
COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
}
config {
image = "hashicorpnomad/counter-dashboard:v1"
}
}
}
}
```

Running:

```
$ consul agent -dev
$ sudo nomad agent -dev-connect
$ nomad job run example.nomad
```

Check Nomad:

```
$ nomad job status example
ID = example
Name = example
Submit Date = 2020-04-13T16:41:44-06:00
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
api 0 0 1 0 0 0
dashboard 0 0 1 0 0 0
Latest Deployment
ID = 9c22115e
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
api 1 1 1 0 2020-04-13T16:51:58-06:00
dashboard 1 1 1 0 2020-04-13T16:52:04-06:00
Allocations
ID Node ID Task Group Version Desired Status Created Modified
e0f7ed62 c5839b0b api 0 run running 27s ago 14s ago
e7b3cd01 c5839b0b dashboard 0 run running 27s ago 7s ago
```

Checking Consul:

```
$ curl -s localhost:8500/v1/agent/checks | jq '.[] | select(.Name=="dashboard-health")'
{
"Node": "NUC10",
"CheckID": "_nomad-check-5794a0c4287f9d66c4a5450586f7410b33a6bd3f",
"Name": "dashboard-health",
"Status": "passing",
"Notes": "",
"Output": "HTTP GET http://192.168.1.53:25646/health: 200 OK Output: Hello, you've hit /health\n",
"ServiceID": "_nomad-task-e7b3cd01-3d24-a3f1-7841-ad897586fe0f-group-dashboard-count-dashboard-9002",
"ServiceName": "count-dashboard",
"ServiceTags": [],
"Type": "http",
"Definition": {},
"CreateIndex": 0,
"ModifyIndex": 0
}

$ curl -s localhost:8500/v1/agent/checks | jq '.[] | select(.Name=="api-health")'
{
"Node": "NUC10",
"CheckID": "_nomad-check-aab24708f3160bd44748d8b8f0a85b8c6e5ceb16",
"Name": "api-health",
"Status": "passing",
"Notes": "",
"Output": "HTTP GET http://192.168.1.53:21128/health: 200 OK Output: Hello, you've hit /health\n",
"ServiceID": "_nomad-task-e0f7ed62-a523-0544-75ca-2a41402a2c93-group-api-count-api-9001",
"ServiceName": "count-api",
"ServiceTags": [],
"Type": "http",
"Definition": {},
"CreateIndex": 0,
"ModifyIndex": 0
}
```

Checking Dashboard:

```
$ curl -s -w '%{response_code}\n' localhost:9002 -o /dev/null
200
```
Likewise, I get similar successful results using the underlying expose configuration directly:

```
job "example" {
datacenters = ["dc1"]
group "api" {
network {
mode = "bridge"
port "healthcheck" {
to = -1
}
}
service {
name = "count-api"
port = "9001"
connect {
sidecar_service {
proxy {
expose {
path {
path = "/health"
protocol = "http"
local_path_port = 9001
listener_port = "healthcheck"
}
}
}
}
}
check {
name = "api-health"
type = "http"
port = "healthcheck"
path = "/health"
interval = "10s"
timeout = "3s"
}
}
task "web" {
driver = "docker"
config {
image = "hashicorpnomad/counter-api:v1"
}
}
}
group "dashboard" {
network {
mode = "bridge"
port "http" {
static = 9002
to = 9002
}
port "healthcheck" {
to = -1
}
}
service {
name = "count-dashboard"
port = "9002"
connect {
sidecar_service {
proxy {
upstreams {
destination_name = "count-api"
local_bind_port = 8080
}
expose {
path {
path = "/health"
protocol = "http"
local_path_port = 9002
listener_port = "healthcheck"
}
}
}
}
}
check {
name = "dashboard-health"
type = "http"
port = "healthcheck"
path = "/health"
interval = "10s"
timeout = "3s"
}
}
task "dashboard" {
driver = "docker"
env {
COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
}
config {
image = "hashicorpnomad/counter-dashboard:v1"
}
}
}
}
```
I figured it out. The `name` attribute is required on both the service and the check. If you only put the name on one or the other, Consul Connect will fail without any errors.
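For illustration, a minimal sketch of what that looks like, following the examples above (the service name `count-api` and check name `api-health` are the ones used earlier in this thread; the other values are placeholders):

```
service {
  name = "count-api"          # name set on the service
  port = "9001"

  connect {
    sidecar_service {}
  }

  check {
    name     = "api-health"   # name set on the check as well
    type     = "http"
    path     = "/health"
    expose   = true
    interval = "10s"
    timeout  = "3s"
  }
}
```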
Possible remediations
Both of these sound like good suggestions, @spuder.
Possibly related to #7221: if the […]
Hi, I use this example. Differences: no need to register a dynamic port.

```
job "countdash" {
datacenters = ["dc1"]
group "api" {
network {
mode = "bridge"
}
service {
name = "count-api"
port = "9001"
connect {
sidecar_service {}
}
check {
expose = true
name = "api-alive"
type = "http"
path = "/health"
interval = "10s"
timeout = "2s"
}
}
task "web" {
driver = "docker"
config {
image = "hashicorpnomad/counter-api:v1"
}
}
}
group "dashboard" {
network {
mode ="bridge"
port "http" {
static = 9002
to = 9002
}
}
service {
name = "count-dashboard"
port = "9002"
connect {
sidecar_service {
proxy {
upstreams {
destination_name = "count-api"
local_bind_port = 8080
}
}
}
}
check {
expose = true
name = "dashboard-alive"
type = "http"
path = "/health"
interval = "10s"
timeout = "2s"
}
}
task "dashboard" {
driver = "docker"
env {
COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
}
config {
image = "hashicorpnomad/counter-dashboard:v1"
}
}
}
}
```
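To try it out, the same local dev workflow shown earlier in the thread should apply; the file name `countdash.nomad` is an assumed name for the job above:

```
# start a dev Consul agent and a dev Nomad agent with Connect enabled
$ consul agent -dev
$ sudo nomad agent -dev-connect

# run the job and hit the dashboard on its static port
$ nomad job run countdash.nomad
$ curl -s -w '%{response_code}\n' localhost:9002 -o /dev/null
```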
Another example with minio:

```
job "minio" {
type = "service"
datacenters = ["dc1"]
namespace = "default"
group "s3" {
network {
mode = "bridge"
}
service {
name = "minio"
port = 9000
# https://docs.min.io/docs/minio-monitoring-guide.html
check {
expose = true
name = "minio-live"
type = "http"
path = "/minio/health/live"
interval = "10s"
timeout = "2s"
}
check {
expose = true
name = "minio-ready"
type = "http"
path = "/minio/health/ready"
interval = "15s"
timeout = "4s"
}
connect {
sidecar_service {
}
}
}
task "server" {
driver = "docker"
config {
image = "minio/minio:latest"
memory_hard_limit = 2048
args = [
"server",
"/local/data",
"-address",
"127.0.0.1:9000"
]
}
resources {
cpu = 200
memory = 1024
}
}
}
}
```
These aren't required anymore, I think, in recent versions of Consul.
Using a network port label for a service port that will be fronted by a Connect sidecar is probably not what you intended […]. You could do something like […] and reference the […].
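A hedged sketch of the idea, based on the working example earlier in this thread: the Connect-fronted service uses its numeric in-namespace port, while a labeled dynamic port is used only for the exposed health check. The service name `bar`, port 3000, and check path are taken from the attempts below; the check name `bar-health` is an assumed placeholder:

```
network {
  mode = "bridge"

  # dynamic port used only for the exposed health-check listener
  port "healthcheck" {
    to = -1
  }
}

service {
  name = "bar"
  port = "3000"                # numeric in-namespace port, not a network port label

  connect {
    sidecar_service {}
  }

  check {
    name     = "bar-health"    # assumed placeholder
    type     = "http"
    port     = "healthcheck"   # the check references the labeled port
    path     = "/actuator/health"
    expose   = true
    interval = "5s"
    timeout  = "2s"
  }
}
```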
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad = 0.11.0
Consul = 1.7.2
ACLs = Enabled
Envoy = 1.13
Issue
Consul Connect-enabled jobs fail to connect through Envoy if a health check is defined (even if the health check passes). Connect-enabled jobs work as expected if no health check is defined on the service.
Consul Connect in Nomad is new, and others have had trouble, as reported here:
Setup
I have a Connect-enabled job in Nomad named `bar`. I have a legacy VM called `foo`, running Ubuntu 18.04 with Envoy installed. The VM running `foo` has the following in `/etc/consul/service_foobar.json`. Envoy has been started with the following command.

The following Nomad job works correctly (note that it does not have a health check). The VM `foo` is able to communicate with the Nomad job running on port 3000 through Envoy. I'm not sure if this is a bug or a documentation issue. Here are all the configurations that I have tried:
Attempt 1
This works and is able to communicate over the Envoy proxy service mesh, but there is no health check.

Result: ✅
Attempt 2
Result: ❌
Attempt 3
Result: ❌
Attempt 4
```
network {
  mode = "bridge"
  port "http" {
    to = "3000"
  }
}

service {
  name         = "bar"
  port         = "http"
  address_mode = "driver"

  check {
    port         = "http"
    type         = "http"
    path         = "/"
    interval     = "5s"
    timeout      = "2s"
    address_mode = "driver"
  }

  connect {
    sidecar_service {}
  }
}
```
Result: ❌
Attempt 5
Set port http `to = -1`:

```
group "group" {
  count = 1

  network {
    mode = "bridge"
    port "http" {
      to = -1
    }
  }

  service {
    name = "bar"
    port = "3000"

    check {
      port     = "http"
      type     = "http"
      path     = "/actuator/health"
      interval = "5s"
      timeout  = "2s"
    }

    connect {
      sidecar_service {}
    }
  }
}
```

Result: ❌
Attempt 6
Use `expose = true` as mentioned in issue #7556:

```
network {
  mode = "bridge"
  port "http" {
    to = -1
  }
}

service {
  name = "bar"
  port = "3000"

  check {
    port     = "http"
    type     = "http"
    path     = "/actuator/health"
    expose   = true
    interval = "5s"
    timeout  = "2s"
  }

  connect {
    sidecar_service {}
  }
}
```

Result: ❌
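For comparison, a hedged sketch of this last attempt with the resolution from earlier in the thread applied, namely a `name` on both the service and the check; the check name `bar-health` is an assumed placeholder:

```
network {
  mode = "bridge"
  port "http" {
    to = -1
  }
}

service {
  name = "bar"                 # name on the service
  port = "3000"

  check {
    name     = "bar-health"    # name on the check as well (assumed placeholder)
    port     = "http"
    type     = "http"
    path     = "/actuator/health"
    expose   = true
    interval = "5s"
    timeout  = "2s"
  }

  connect {
    sidecar_service {}
  }
}
```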