-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Charm temporarily goes to Error
during deployment while handling config-changed
, but eventually proceeds to Active
#337
Comments
The copy of the log: https://pastebin.canonical.com/p/WxDhkGWwkp/ (runner will be cleaned during the holiday). |
I stumbled upon this error when running integration tests in my project After ssh'ing into the runners and running
|
Hi @RafalSiwek , what you are seeing is a symptom of the mysql container being OOMKilled, hence the pebble connection being lost. I was checking your tests and saw that your bundle does not set any options for mysql-k8s charm. The default behavior for memory allocation is somewhat grabby, which for testing on GH runners will most probably lead to unstable outcome. So, my advice is either to have a dedicated @@ -8,6 +8,8 @@
scale: 1
trust: true
_github_repo_name: mysql-k8s-operator
+ options:
+ - profile: testing
zenml-server:
charm: zenml-server
channel: latest/edge The testing profile is very light weight, tuned for this use case |
And by the way, it's encouraged to set limits using either k8s constraints or profiles on production deployments. |
Hi @paulomach, thanks for the suggestions! As the issue persisted, I decided to conduct more tests, including setting the profile to "testing." Unfortunately, the problem persisted (you can review my testing path), resulting in the Now, with evidence suggesting that the Pebble socket connection error is causing the charm to enter this hook error limbo state when calling the |
I believe this is due a |
Note that it keeps failing consistently for us (kubeflow team) as well. You can see all 3 integration tests that deploy mysql-k8s have failed with the same error while we have constraints set. |
Hi @orfeas-k , I believe we got to the root cause now and tests are running for the PR. |
Great, thanks for the update. Let us know when the fix is published! |
@orfeas-k @ca-scribner @RafalSiwek fix published on Please give it a try to confirm and close this issue |
Tested and works! Thank you for implementing this. It would be great if you could let us know when this fix is published to P.S. @paulomach I don't seem to have permission to close this issue. |
Thank you. We will let you know when promoted to stable. |
Since the the issue [1] was merged and those changes are already in 8.0/stable we can use that channel again. fixes: #401 --- [1]: canonical/mysql-k8s-operator#337
Since the the issue [1] was merged and those changes are already in 8.0/stable we can use that channel again. fixes: #401 --- [1]: canonical/mysql-k8s-operator#337
Since the the issue [1] was merged and those changes are already in 8.0/stable we can use that channel again. fixes: #401 --- [1]: canonical/mysql-k8s-operator#337
Since the the issue [1] was merged and those changes are already in 8.0/stable we can use that channel again. fixes: #401 --- [1]: canonical/mysql-k8s-operator#337
Steps to reproduce
This might be reproducable in a smaller test suite, but I've only encountered it during the kfp-api charm's integration tests (
tox -e integration -- --model kubeflow
). I tried reproducing it by deploying only mysql-k8s, but did not see the same issue. Maybe there's a cpu-load driven race condition?Expected behavior
The charm progresses to
Active
without encounteringError
Actual behavior
The charm hits
Error
during a config-changed hook while deploying. Over time, this error goes away and the charm proceeds to be ActiveWhile the charm does eventually get
Active
, going intoError
on the way feels like there's a foreseeable failure that's uncaught. Plus, in my use case, it resulted in flakey CI because we typically label any Error state as a test fail and end the test right away,Versions
Operating system:
Juju CLI: 3.1.6
Juju agent: 3.1.6
Charm revision: 8.0/edge
microk8s: 1.25-strict
Log output
kubectl pod logs from a local run
``` kubectl logs mysql-k8s-0 -c charm [3:41:27] 2023-11-17T20:40:57.676Z [pebble] HTTP API server listening on ":38812". 2023-11-17T20:40:57.676Z [pebble] Started daemon. 2023-11-17T20:40:57.682Z [pebble] POST /v1/services 4.454317ms 202 2023-11-17T20:40:57.682Z [pebble] Started default services with change 1. 2023-11-17T20:40:57.685Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0 2023-11-17T20:40:57.746Z [container-agent] 2023-11-17 20:40:57 INFO juju.cmd supercommand.go:56 running containerAgent [3.1.6 f6a66aa91eec620f5ac04a19d8c06bef03ae6228 gc go1.20.8] 2023-11-17T20:40:57.747Z [container-agent] starting containeragent unit command 2023-11-17T20:40:57.747Z [container-agent] containeragent unit "unit-mysql-k8s-0" start (3.1.6 [gc]) 2023-11-17T20:40:57.747Z [container-agent] 2023-11-17 20:40:57 INFO juju.cmd.containeragent.unit runner.go:578 start "unit" 2023-11-17T20:40:57.747Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.6 have already been run. 2023-11-17T20:40:57.749Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.probehttpserver server.go:157 starting http server on 127.0.0.1:65301 2023-11-17T20:40:57.775Z [container-agent] 2023-11-17 20:40:57 INFO juju.api apiclient.go:1073 cannot resolve "controller-service.controller-local-microk8s.svc.cluster.local": lookup controller-service.controller-local-microk8s.svc.cluster.local: operation was canceled 2023-11-17T20:40:57.775Z [container-agent] 2023-11-17 20:40:57 INFO juju.api apiclient.go:706 connection established to "wss://10.152.183.27:17070/model/ef974c79-cbe5-4b53-81ca-d316c305a28d/api" 2023-11-17T20:40:57.779Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.apicaller connect.go:163 [ef974c] "unit-mysql-k8s-0" successfully connected to "10.152.183.27:17070" 2023-11-17T20:40:57.801Z [container-agent] 2023-11-17 20:40:57 INFO juju.api apiclient.go:1073 cannot resolve "controller-service.controller-local-microk8s.svc.cluster.local": lookup controller-service.controller-local-microk8s.svc.cluster.local: operation was canceled 2023-11-17T20:40:57.802Z [container-agent] 2023-11-17 20:40:57 INFO juju.api apiclient.go:706 connection established to "wss://10.152.183.27:17070/model/ef974c79-cbe5-4b53-81ca-d316c305a28d/api" 2023-11-17T20:40:57.805Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.apicaller connect.go:163 [ef974c] "unit-mysql-k8s-0" successfully connected to "10.152.183.27:17070" 2023-11-17T20:40:57.829Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE 2023-11-17T20:40:57.829Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.logger logger.go:120 logger worker started 2023-11-17T20:40:57.834Z [container-agent] 2023-11-17 20:40:57 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: "" 2023-11-17T20:40:57.843Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.leadership tracker.go:194 mysql-k8s/0 promoted to leadership of mysql-k8s 2023-11-17T20:40:57.849Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received 2023-11-17T20:40:57.849Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check 2023-11-17T20:40:57.856Z [container-agent] 2023-11-17 20:40:57 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-k8s-0 2023-11-17T20:40:57.876Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.uniter uniter.go:363 unit "mysql-k8s/0" started 2023-11-17T20:40:57.879Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.uniter uniter.go:689 resuming charm install 2023-11-17T20:40:57.881Z [container-agent] 2023-11-17 20:40:57 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/mysql-k8s-110 from API server 2023-11-17T20:40:57.881Z [container-agent] 2023-11-17 20:40:57 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/mysql-k8s-110 2023-11-17T20:40:58.063Z [container-agent] 2023-11-17 20:40:58 INFO juju.downloader download.go:92 download complete ("ch:amd64/jammy/mysql-k8s-110") 2023-11-17T20:40:58.187Z [container-agent] 2023-11-17 20:40:58 INFO juju.downloader download.go:172 download verified ("ch:amd64/jammy/mysql-k8s-110") 2023-11-17T20:41:07.684Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418 2023-11-17T20:41:08.652Z [container-agent] 2023-11-17 20:41:08 INFO juju.worker.uniter uniter.go:389 hooks are retried true 2023-11-17T20:41:08.740Z [container-agent] 2023-11-17 20:41:08 INFO juju.worker.uniter resolver.go:165 found queued "install" hook 2023-11-17T20:41:10.564Z [container-agent] 2023-11-17 20:41:10 INFO juju-log Running legacy hooks/install. 2023-11-17T20:41:12.196Z [container-agent] 2023-11-17 20:41:12 INFO juju.worker.uniter.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:13.549Z [container-agent] 2023-11-17 20:41:13 INFO juju.worker.uniter.operation runhook.go:186 ran "restart-relation-created" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:14.699Z [container-agent] 2023-11-17 20:41:14 INFO juju.worker.uniter.operation runhook.go:186 ran "upgrade-relation-created" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:15.817Z [container-agent] 2023-11-17 20:41:15 INFO juju.worker.uniter.operation runhook.go:186 ran "database-peers-relation-created" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:15.840Z [container-agent] 2023-11-17 20:41:15 INFO juju.worker.uniter resolver.go:165 found queued "leader-elected" hook 2023-11-17T20:41:17.509Z [container-agent] 2023-11-17 20:41:17 INFO juju.worker.uniter.operation runhook.go:186 ran "leader-elected" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:17.679Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 418 2023-11-17T20:41:18.961Z [container-agent] 2023-11-17 20:41:18 INFO juju.worker.uniter.operation runhook.go:186 ran "database-storage-attached" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:19.746Z [container-agent] 2023-11-17 20:41:19 ERROR juju-log Uncaught exception while in charm code: 2023-11-17T20:41:19.746Z [container-agent] Traceback (most recent call last): 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open 2023-11-17T20:41:19.746Z [container-agent] h.request(req.get_method(), req.selector, req.data, headers, 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 1283, in request 2023-11-17T20:41:19.746Z [container-agent] self._send_request(method, url, body, headers, encode_chunked) 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request 2023-11-17T20:41:19.746Z [container-agent] self.endheaders(body, encode_chunked=encode_chunked) 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders 2023-11-17T20:41:19.746Z [container-agent] self._send_output(message_body, encode_chunked=encode_chunked) 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output 2023-11-17T20:41:19.746Z [container-agent] self.send(msg) 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 976, in send 2023-11-17T20:41:19.746Z [container-agent] self.connect() 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/pebble.py", line 272, in connect 2023-11-17T20:41:19.746Z [container-agent] self.sock.connect(self.socket_path) 2023-11-17T20:41:19.746Z [container-agent] FileNotFoundError: [Errno 2] No such file or directory 2023-11-17T20:41:19.746Z [container-agent] 2023-11-17T20:41:19.746Z [container-agent] During handling of the above exception, another exception occurred: 2023-11-17T20:41:19.746Z [container-agent] 2023-11-17T20:41:19.746Z [container-agent] Traceback (most recent call last): 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/pebble.py", line 1599, in _request_raw 2023-11-17T20:41:19.746Z [container-agent] response = self.opener.open(request, timeout=self.timeout) 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 519, in open 2023-11-17T20:41:19.746Z [container-agent] response = self._open(req, data) 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 536, in _open 2023-11-17T20:41:19.746Z [container-agent] result = self._call_chain(self.handle_open, protocol, protocol + 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain 2023-11-17T20:41:19.746Z [container-agent] result = func(*args) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/pebble.py", line 286, in http_open 2023-11-17T20:41:19.746Z [container-agent] return self.do_open(_UnixSocketConnection, req, # type:ignore 2023-11-17T20:41:19.746Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open 2023-11-17T20:41:19.746Z [container-agent] raise URLError(err) 2023-11-17T20:41:19.746Z [container-agent] urllib.error.URLError: 2023-11-17T20:41:19.746Z [container-agent] 2023-11-17T20:41:19.746Z [container-agent] During handling of the above exception, another exception occurred: 2023-11-17T20:41:19.746Z [container-agent] 2023-11-17T20:41:19.746Z [container-agent] Traceback (most recent call last): 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/./src/charm.py", line 786, in 2023-11-17T20:41:19.746Z [container-agent] main(MySQLOperatorCharm) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/main.py", line 441, in main 2023-11-17T20:41:19.746Z [container-agent] _emit_charm_event(charm, dispatcher.event_name) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/main.py", line 149, in _emit_charm_event 2023-11-17T20:41:19.746Z [container-agent] event_to_emit.emit(*args, **kwargs) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 342, in emit 2023-11-17T20:41:19.746Z [container-agent] framework._emit(event) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 839, in _emit 2023-11-17T20:41:19.746Z [container-agent] self._reemit(event_path) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/framework.py", line 928, in _reemit 2023-11-17T20:41:19.746Z [container-agent] custom_handler(event) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/./src/charm.py", line 412, in _on_config_changed 2023-11-17T20:41:19.746Z [container-agent] if not self._mysql.is_mysqld_running(): 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/src/mysql_k8s_helpers.py", line 564, in is_mysqld_running 2023-11-17T20:41:19.746Z [container-agent] return self.container.exists(MYSQLD_SOCK_FILE) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/model.py", line 2444, in exists 2023-11-17T20:41:19.746Z [container-agent] self._pebble.list_files(str(path), itself=True) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/pebble.py", line 2104, in list_files 2023-11-17T20:41:19.746Z [container-agent] resp = self._request('GET', '/v1/files', query) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/pebble.py", line 1564, in _request 2023-11-17T20:41:19.746Z [container-agent] response = self._request_raw(method, path, query, headers, data) 2023-11-17T20:41:19.746Z [container-agent] File "/var/lib/juju/agents/unit-mysql-k8s-0/charm/venv/ops/pebble.py", line 1612, in _request_raw 2023-11-17T20:41:19.746Z [container-agent] raise ConnectionError(e.reason) 2023-11-17T20:41:19.746Z [container-agent] ops.pebble.ConnectionError: [Errno 2] No such file or directory 2023-11-17T20:41:20.037Z [container-agent] 2023-11-17 20:41:20 ERROR juju.worker.uniter.operation runhook.go:180 hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1 2023-11-17T20:41:20.039Z [container-agent] 2023-11-17 20:41:20 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "config-changed" hook 2023-11-17T20:41:20.379Z [container-agent] 2023-11-17 20:41:20 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "config-changed" hook 2023-11-17T20:41:21.659Z [container-agent] 2023-11-17 20:41:21 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "config-changed" hook 2023-11-17T20:41:25.050Z [container-agent] 2023-11-17 20:41:25 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "config-changed" hook 2023-11-17T20:41:26.123Z [container-agent] 2023-11-17 20:41:26 INFO juju.worker.uniter.operation runhook.go:186 ran "config-changed" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:26.134Z [container-agent] 2023-11-17 20:41:26 INFO juju.worker.uniter resolver.go:165 found queued "start" hook 2023-11-17T20:41:26.859Z [container-agent] 2023-11-17 20:41:26 INFO juju-log Running legacy hooks/start. 2023-11-17T20:41:27.679Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 418 2023-11-17T20:41:27.679Z [pebble] Check "readiness" failure threshold 3 hit, triggering action 2023-11-17T20:41:27.996Z [container-agent] 2023-11-17 20:41:27 INFO juju.worker.uniter.operation runhook.go:186 ran "start" hook (via hook dispatching script: dispatch) 2023-11-17T20:41:28.768Z [container-agent] 2023-11-17 20:41:28 INFO juju-log Setting up the logrotate configurations 2023-11-17T20:41:36.436Z [container-agent] 2023-11-17 20:41:36 INFO juju-log Configuring instance ```github runner test failure for same issue
The text was updated successfully, but these errors were encountered: