Full-blown traffic monitoring of Cisco UCS servers using Grafana, InfluxDB and Telegraf.
- Top 10 ports, service profiles, etc.
- Load-balance verification and root cause
- Congestion monitoring and detection
- End-to-end mapping from vHBA/vNIC to FI uplink port
- Integrated documentation with conceptual drawings and detailed explanations
- And much more...
- Data source: Cisco UCS Manager (UCSM), read-only account is enough
- Data receiver: Telegraf
- Data storage: InfluxDB, a time-series database
- Visualization: Grafana
- Tested OS: CentOS 7.x. Other operating systems should also work but have not been tested.
- Python version: Python 3 only. Python 2 should also work with minor modifications.
Two options:
- DIY installation: Install the required packages yourself
- OVA: Required packages are pre-installed on a CentOS 7.6 OVA
- Install Telegraf
- Install InfluxDB
- Install Grafana, along with the following plugins:
- Flowchart
- Pie Chart
- ePict panel
- multistat
- Install the following Python modules:
- Cisco UCSM Python SDK
- netmiko library
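For a DIY installation, a quick check that the Python modules are importable can save debugging time later. The following is a minimal sketch, not part of UTM itself; it assumes the import names `ucsmsdk` (Cisco UCSM Python SDK) and `netmiko`, and the helper `find_missing` is hypothetical.

```python
import importlib.util

# Assumed import names for the two required modules.
REQUIRED_MODULES = ("ucsmsdk", "netmiko")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules are importable.")
```

Run it with the same `python3` interpreter that Telegraf will use, since modules installed for a different interpreter will not be found.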
Download the OVA from the releases page. This is a CentOS 7.6 based OVA. Deployment is the same as for any other OVA that you have deployed before. Click here for detailed installation instructions for the UTM OVA.
You are responsible for upgrading Grafana, InfluxDB, Telegraf, Python and other packages. Generally, an upgrade takes only one or two commands. Please refer to the respective packages for the upgrade process, and keep an eye on security vulnerabilities and fixes.
ucs_traffic_monitor.py fetches metrics from Cisco UCS and stitches them together. The Telegraf exec input plugin invokes this file every 60 seconds. The UCS login credentials must be available in ucs_domains_group*.txt.
If you are running this for the first time, try:
$ python3 /usr/local/telegraf/ucs_traffic_monitor.py -h
Change or add the following in your telegraf.conf file:
[[inputs.exec]]
interval = "60s"
commands = [
"python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains.txt influxdb-lp -vv",
]
timeout = "50s"
data_format = "influx"
Also update the global agent settings, such as:
logfile = "/var/log/telegraf/telegraf.log"
logfile_rotation_max_size = "10MB"
logfile_rotation_max_archives = 5
With this configuration in place, the setup should:
- Pull metrics from UCS every 60 seconds
- Stitch them end-to-end between FI uplink ports and vNIC/vHBA on blade servers
- Write the data to InfluxDB
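Because the exec input is configured with `data_format = "influx"`, whatever the script prints on stdout has to be InfluxDB line protocol. The sketch below illustrates how one such line can be built; it is not the code used in ucs_traffic_monitor.py, and the measurement, tag and field names in it are hypothetical.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=val field=val [timestamp]."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys are recommended by InfluxDB
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(val) for key, val in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(to_line_protocol(
        "uplink_port",
        {"domain": "ucs 1", "port": "1/17"},
        {"rx_bytes": 1024, "up": True},
    ))
```

When the timestamp is omitted, as in the usage example, Telegraf assigns the collection time to the metric.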
Finally, import the dashboards into Grafana. UTM should then be up and running.
For detailed step-by-step instructions, especially if you do not have prior experience with Grafana, InfluxDB and Telegraf, check out: Cisco UCS monitoring using Grafana, InfluxDB, Telegraf – UTM Installation
- My wife (Dimple) and kids (Manan and Kiara), from whom I took away precious weekend hours to invest in the development of UTM.
- Folks in the Cisco UCS business unit and TAC, who knowingly or unknowingly helped me build UTM, and also for the awesome content on ciscolive.com.
- Colleagues and friends in Cisco (Art, Craig, Eugene, Mark and a long list of people) for the inspiration.
- End-users/customers: Philipe, Jason, Shawn, Ryan and others for your great feedback.