Abnormal gaps in monitoring data #2459
Comments
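What is the collection interval for dcgm? Also, go to the instant query page and query the metric that shows gaps, using a range query (for example, ...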
input.dcgm/exporter.toml
./categraf --test --debug --inputs dcgm
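To tell real collection gaps from a display issue, it can help to compare the timestamps of the raw samples against the collection interval. Below is a minimal sketch, assuming the range query returns Prometheus-style `[timestamp, value]` pairs and that the interval is 15 seconds. Both the interval and the sample data are made up for illustration, and `find_gaps` is a hypothetical helper, not part of categraf or n9e.

```python
from typing import List, Tuple


def find_gaps(
    samples: List[Tuple[float, str]],
    interval_s: float,
    tolerance: float = 1.5,
) -> List[Tuple[float, float, int]]:
    """Return (gap_start, gap_end, missed_points) for each gap.

    samples    -- [timestamp, value] pairs sorted by timestamp
    interval_s -- expected collection interval in seconds (assumed)
    tolerance  -- a delta larger than interval_s * tolerance counts as a gap
    """
    gaps = []
    for (prev_ts, _), (cur_ts, _) in zip(samples, samples[1:]):
        delta = cur_ts - prev_ts
        if delta > interval_s * tolerance:
            # Estimate how many samples should have arrived in between.
            missed = round(delta / interval_s) - 1
            gaps.append((prev_ts, cur_ts, missed))
    return gaps


if __name__ == "__main__":
    # Made-up samples: two points are missing between t=30 and t=75.
    samples = [(0, "35"), (15, "36"), (30, "36"), (75, "40"), (90, "41")]
    print(find_gaps(samples, interval_s=15))
```

If the gaps show up on many nodes at the same moments, that would point toward the write path or storage side. Gaps at unrelated times per node would point more toward the individual collectors or their network path.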
Question and Steps to reproduce
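Version: v7.7.2
Architecture: n9e (three nodes, high availability behind a VIP) + VictoriaMetrics (a single instance each of vminsert, vmselect, and vmstorage) + categraf. The database is a three-node Galera cluster, and Redis runs as three nodes with Sentinel.
Client nodes: 62 machines, to be expanded to about 300 later.
Monitoring requirements: GPU metrics and basic host information, plus syslog integration.
Currently, DCGM-based GPU monitoring on the 62 machines shows gaps or no data at all for many nodes. Is this caused by the network or by insufficient resources?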
Relevant logs and configurations
Version
Version: v7.7.2