Modifying the Prometheus config triggers the coordinator to rebuild Prometheus #61
Comments
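What do you mean by a complete rebuild? The current version adds a mechanism: while Prometheus is going through a rolling update, coordination is paused first. Were the pods rebuilt because you modified the StatefulSet?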
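The pause described in this comment can be pictured as a guard on the StatefulSet status, which matches the warning `Statefulset prometheus UpdatedReplicas != Replicas, skipped` in the coordinator log attached to this issue. The following is a minimal Python sketch under that assumption, not the coordinator's actual code: `StatefulSetStatus`, `rollout_in_progress`, and `coordinate` are hypothetical names, and only the two fields mentioned in the warning are modeled.

```python
from dataclasses import dataclass


@dataclass
class StatefulSetStatus:
    """Hypothetical stand-in for the two status fields named in the log warning."""
    replicas: int
    updated_replicas: int


def rollout_in_progress(status: StatefulSetStatus) -> bool:
    # Assumption: a rolling update counts as unfinished while some replicas
    # have not yet been moved to the updated revision.
    return status.updated_replicas != status.replicas


def coordinate(status: StatefulSetStatus) -> str:
    """Return the action one coordination pass would take in this sketch."""
    if rollout_in_progress(status):
        # Mirrors the "skipped" warning: leave shards untouched this round.
        return "skipped"
    return "coordinated"


if __name__ == "__main__":
    print(coordinate(StatefulSetStatus(replicas=4, updated_replicas=2)))
    print(coordinate(StatefulSetStatus(replicas=4, updated_replicas=4)))
```

Under this reading, a config-only change should never trip the guard, because it does not alter the StatefulSet; a skipped pass therefore suggests the StatefulSet itself was updated.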
No, I only modified the Prometheus config. I tested again and the problem still reproduces; the coordinator log also shows a second round of shard assignment.
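I also found that after deleting one job from the Prometheus config, the total number of series went down, yet the number of newly created Prometheus instances was one higher than before the deletion (4 before, 5 now). Some of these Prometheus instances have no targets at all. The coordinator's target assignment probably still has a problem.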
Mark. I ran into this as well: I only modified the Prometheus config, but the Prometheus containers were rebuilt, which caused monitoring data to be lost.
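I wanted to test whether, after the Prometheus config is modified, the coordinator detects the change and syncs the config to Prometheus. After modifying the config, I found that Prometheus was completely rebuilt (deleted first, then recreated). The coordinator log is as follows: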
time="2021-06-16T00:37:20Z" level=warning msg="Statefulset prometheus UpdatedReplicas != Replicas, skipped" component="shard manager"
level=info ts=2021-06-16T00:37:27.302Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
W0616 00:37:27.302653 1 reflector.go:424] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:156: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
level=info ts=2021-06-16T00:37:27.304Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-06-16T00:37:27.306Z caller=kubernetes.go:263 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
time="2021-06-16T00:37:30Z" level=info msg="need space 57074" component=coordinator
time="2021-06-16T00:37:30Z" level=info msg="change scale to 1" component="shard manager" sts=prometheus
time="2021-06-16T00:37:40Z" level=error msg="get targets status info from prometheus-0 failed, url = http://100.101.245.226:8080: http get: Get "http://100.101.245.226:8080/api/v1/shard/targets/\": dial tcp 100.101.245.226:8080: connect: connection refused" component=coordinator
time="2021-06-16T00:37:40Z" level=error msg="get runtime info from prometheus-0 failed : http get: Get "http://100.101.245.226:8080/api/v1/shard/runtimeinfo/\": dial tcp 100.101.245.226:8080: connect: connection refused" component=coordinator
time="2021-06-16T00:37:40Z" level=info msg="need space 57074" component=coordinator
time="2021-06-16T00:37:40Z" level=warning msg="shard group prometheus-0 is unHealth, skip apply change" component=coordinator