Cisco UCS traffic monitoring using Grafana, InfluxDB and Telegraf

zimmerx/ucs_traffic_monitor

UCS Traffic Monitoring (UTM)

Full-blown traffic monitoring of Cisco UCS servers using Grafana, InfluxDB and Telegraf.

Locations Dashboard

UCS Domains Overview

Top 10 ports, service profiles, etc.

Load Balance verification and root cause

Congestion Monitoring and detection

End-to-end mapping from vHBA/vNIC to FI uplink Port

Integrated documentation with conceptual drawing and detailed explanations

and much more...

Installation

  • Tested OS: CentOS 7.x. Other operating systems should also work.
  • Python version: Python 3 only. Python 2 should also work with minor modifications.

Two options:

  • DIY Installation: Install the required packages yourself
  • OVA: Required packages are pre-installed on a CentOS 7.6 OVA

DIY Installation

  1. Install Telegraf
  2. Install InfluxDB
  3. Install Grafana, along with the following plugins:
    1. Flowchart
    2. Pie Chart
    3. ePict panel
    4. multistat
  4. Install the following Python modules:
    1. Cisco UCSM Python SDK
    2. netmiko library

OVA installation

Download the OVA from the releases page. It is based on CentOS 7.6 and is deployed the same way as any other OVA. Click here for detailed installation instructions of the UTM OVA.

Upgrades

You are responsible for upgrading Grafana, InfluxDB, Telegraf, Python and other packages. Generally, an upgrade takes only one or two commands. Please refer to the respective packages for their upgrade process, and keep an eye on security vulnerabilities and fixes.

Configuration

ucs_traffic_monitor.py fetches metrics from Cisco UCS and stitches them together. This script is invoked by the Telegraf exec input plugin every 60 seconds. The UCS login credentials must be available in ucs_domains_group*.txt.

Try

$ python3 /usr/local/telegraf/ucs_traffic_monitor.py -h

if you are running this for the first time.

Add or change the following in your telegraf.conf file:

[[inputs.exec]]
   interval = "60s"
   commands = [
       "python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains.txt influxdb-lp -vv",
   ]
   timeout = "50s"
   data_format = "influx"
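With data_format = "influx", Telegraf parses each line the command prints to stdout as InfluxDB line protocol. The sketch below is not taken from ucs_traffic_monitor.py; it only illustrates the shape of that output. The measurement, tag and field names are hypothetical.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point as InfluxDB line protocol (illustrative sketch)."""

    def esc(text, chars):
        # Backslash-escape the characters that are special in this position.
        for ch in chars:
            text = text.replace(ch, "\\" + ch)
        return text

    def fmt_field(value):
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integers carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        # Everything else is written as a double-quoted string field.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{text}"'

    head = esc(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys are recommended by InfluxDB
        head += f",{esc(key, ',= ')}={esc(str(tags[key]), ',= ')}"
    body = ",".join(f"{esc(k, ',= ')}={fmt_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    # Hypothetical names, chosen only for this example.
    print(to_line_protocol(
        "example_port_stats",
        {"domain": "ucs-a", "port": "1/1"},
        {"rx_bytes": 100, "up": True},
    ))
```

Running the real command from telegraf.conf by hand and checking that stdout looks like this is a quick way to confirm the exec input will accept it.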

Also update global values in the [agent] section, such as:

  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_max_size = "10MB"
  logfile_rotation_max_archives = 5

With this configuration, Telegraf should be able to:

  1. Pull metrics from UCS every 60 seconds
  2. Stitch them end-to-end between FI uplink ports and vNIC/vHBA on blade servers
  3. Write the data to InfluxDB
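The stitching step can be pictured as a join between per-vNIC samples and the uplink port each vNIC is pinned to. The following is a toy sketch of that idea only; it is not how ucs_traffic_monitor.py is implemented, and every name in it (VnicSample, stitch, the field names, the sample values) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VnicSample:
    service_profile: str
    vnic: str
    uplink: str    # FI uplink port this vNIC is pinned to (hypothetical field)
    tx_bps: float  # transmit rate in bits per second


def stitch(vnic_samples, uplink_tx_bps):
    """Group vNIC samples under the FI uplink port they are pinned to."""
    paths = {
        port: {"uplink_tx_bps": bps, "vnics": []}
        for port, bps in uplink_tx_bps.items()
    }
    for sample in vnic_samples:
        # Keep vNICs whose uplink has no stats yet instead of dropping them.
        entry = paths.setdefault(
            sample.uplink, {"uplink_tx_bps": None, "vnics": []}
        )
        entry["vnics"].append(sample)
    return paths


if __name__ == "__main__":
    samples = [
        VnicSample("sp-web-01", "eth0", "A/1/17", 2.0e9),
        VnicSample("sp-db-01", "eth0", "A/1/17", 5.0e9),
        VnicSample("sp-web-02", "eth0", "A/1/18", 1.0e9),
    ]
    for port, entry in stitch(samples, {"A/1/17": 7.5e9, "A/1/18": 1.2e9}).items():
        top = max(entry["vnics"], key=lambda s: s.tx_bps)
        print(port, entry["uplink_tx_bps"], "top talker:", top.service_profile)
```

Grouping by uplink is what makes questions such as "which service profile is loading this uplink port" answerable from a single structure.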

Import the dashboards into Grafana. UTM should now be up and running.

For detailed step-by-step instructions, especially if you do not have prior experience with Grafana, InfluxDB and Telegraf, check out: Cisco UCS monitoring using Grafana, InfluxDB, Telegraf – UTM Installation

Credits

  • My wife (Dimple) and kids (Manan and Kiara) while I took away precious weekend hours from you and invested in the development of UTM.
  • Folks in the Cisco UCS business unit and TAC, who knowingly or unknowingly helped me to build UTM and also for awesome content on ciscolive.com.
  • Colleagues and friends in Cisco (Art, Craig, Eugene, Mark and a long list of people) for the inspiration.
  • End-users/customers: Philipe, Jason, Shawn, Ryan and others for your great feedback.
