Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error filling agent cache: Internal cache failure #8657

Open
saez0pub opened this issue Sep 11, 2020 · 1 comment
Open

error filling agent cache: Internal cache failure #8657

saez0pub opened this issue Sep 11, 2020 · 1 comment
Labels
theme/consul-nomad Consul & Nomad shared usability theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics

Comments

@saez0pub
Copy link

saez0pub commented Sep 11, 2020

Overview of the Issue

Hello,

I have a nomad job with a connect enabled backend.
sometimes, It produces an error log in consul agent :

error filling agent cache: Internal cache failure: service '_nomad-task-XXX' not in agent state

Reproduction Steps

Here is an easy reproduction method

$ nomad -version
Nomad v0.12.4 (8efaee4ba5e9727ab323aaba2ac91c2d7b572d84)
$ consul -version  
Consul v1.8.3
Revision a9322b9c7
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Create the example nomad jobs

cat <<EOF > backend.nomad
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }
}
EOF

cat <<EOF > batch_with_connect.nomad
job "batch_with_connect" {
  datacenters = ["dc1"]
  type = "batch"
  periodic {
    cron = "*/15 * * * * * *"
    prohibit_overlap = true
  }
  group "batch_with_connect" {
    network {
      mode = "bridge"
    }
    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }
    task "echo" {
      driver = "docker"
      config {
        image = "debian:10"
        args = ["echo", "done"]
      }
    }
  }
}
EOF

Start consul

consul agent -dev -log-level=TRACE 2>&1 | tee consul.log

Start nomad

sudo nomad agent -dev-connect -log-level=TRACE 2>&1 | tee nomad.log

Start backend job on nomad (wait it is fully started)

$ nomad job run backend.nomad

$ [ -z "$(curl -s http://127.0.0.1:8500/v1/agent/checks | grep Status  | grep -v passing)" ] && echo OK
OK

Start batch job on nomad

$ nomad job run batch_with_connect.nomad

Wait an error in consul logs (job launches every 15 seconds)

grep "ERROR.*cache" consul.log

Consul info for both Client and Server

Consul info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 6
	services = 6
build:
	prerelease = 
	revision = a9322b9c
	version = 1.8.3
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 127.0.0.1:8300
	server = true
raft:
	applied_index = 44
	commit_index = 44
	fsm_pending = 0
	last_contact = 0
	last_log_index = 44
	last_log_term = 2
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:506cf337-31df-4815-01b0-90606fc638f6 Address:127.0.0.1:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 123
	max_procs = 8
	os = linux
	version = go1.14.7
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

Linux

Log Fragments

Logs extract

    2020-09-11T09:59:31.292+0200 [TRACE] agent.proxycfg: resetting watches for discovery chain: service_id=_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy id=count-api
    2020-09-11T09:59:31.292+0200 [TRACE] agent.proxycfg: initializing watch of target: service_id=_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy upstream=count-api chain=count-api target=count-api.default.dc1 mesh-gateway-mode=
    2020-09-11T09:59:31.341+0200 [DEBUG] agent.http: Request finished: method=GET url=/v1/agent/service/_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy from=127.0.0.1:36134 latency=203.509µs
    2020-09-11T09:59:32.223+0200 [DEBUG] agent: removed check: check=service:_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy:1
    2020-09-11T09:59:32.223+0200 [ERROR] agent.proxycfg: watch error: service_id=_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy id=service-http-checks:_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002 error="error filling agent cache: Internal cache failure: service '_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002' not in agent state"
    2020-09-11T09:59:32.223+0200 [DEBUG] agent: removed check: check=service:_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy:2
    2020-09-11T09:59:32.223+0200 [DEBUG] agent: removed service: service=_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy
    2020-09-11T09:59:32.223+0200 [ERROR] agent.proxycfg: watch error: service_id=_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy id=service-http-checks:_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002 error="error filling agent cache: Internal cache failure: service '_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002' not in agent state"
    2020-09-11T09:59:32.223+0200 [INFO]  agent: Deregistered service: service=_nomad-task-4f11511e-782f-27e0-1480-c64812a66525-group-batch_with_connect-count-dashboard-9002-sidecar-proxy

Full logs

@jsosulska jsosulska added theme/consul-nomad Consul & Nomad shared usability theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics labels Sep 17, 2020
@shoenig
Copy link
Member

shoenig commented Nov 10, 2020

Looks like another user is reporting similar symptoms over at hashicorp/nomad#9307

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/consul-nomad Consul & Nomad shared usability theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics
Projects
None yet
Development

No branches or pull requests

3 participants