This application consists of two sub-applications, a REST API and a Web UI, both exposed on the same port.
- REST API: the HTTP API of the application backend. Accessed via the /api path.
- Web UI: a web application that displays the main metrics/measurements of Kubernetes. Accessed via the /web path.
This application aims to:
- Create a trusted and reliable system to accurately monitor jobs' resource utilization (CPU, memory, GPU) in each container across a Kubernetes cluster.
- Provide an interface between the metrics database and EPFL's SIFAC billing system.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To run the application for development, you need to install the minimum dependencies listed in the requirements.txt file. To install them, execute:
./install-dependencies
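If you prefer to install the dependencies by hand, the following is a minimal equivalent, assuming the install-dependencies script is essentially a wrapper around pip and that requirements.txt sits at the repository root:
# Manual alternative (assumption: install-dependencies wraps pip)
pip install -r requirements.txt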
Export several environment variables before running the application:
# Configuration for connecting to and reading from the InfluxDB instance
export INFLUXDB_HOST="<your_influxdb_instance_hostname>"
export INFLUXDB_PORT="<your_influxdb_instance_port>"
export INFLUXDB_USER="<your_influxdb_account_username>"
export INFLUXDB_PASS="<your_influxdb_account_password>"
export INFLUXDB_DB="<your_influxdb_db_name>"
# Configurations for API authentication
export API_USERNAME="<username>"
export API_PASSWORD="<password>"
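Before starting the application, you can quickly verify that all of the variables above are set in your shell:
# Sanity check that every required variable is exported
env | grep -E 'INFLUXDB_(HOST|PORT|USER|PASS|DB)|API_(USERNAME|PASSWORD)'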
Create the application's default logging directory and make it writable by the user running the program.
sudo mkdir -p /logs/kube-monitoring
sudo chown $(id -u):$(id -g) /logs/kube-monitoring
To run the application, execute the start script and pass the desired port number as an argument.
# Example
./start-server 8000
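Once the server is running, a quick way to confirm that both sub-applications respond (assuming the example port 8000 on the local machine) is to hit their base paths, as also shown in the deployment test commands at the end of this document:
# Both endpoints should respond; the exact response depends on the application
curl http://localhost:8000/api
curl http://localhost:8000/web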
Note that the application distinguishes between main metrics and available metrics.
- Main metrics: metrics/measurements that count towards accounting of utilization (usage) by pods. The utilization/usage rate of compute resources is measured based on these metrics.
- Available metrics: metrics/measurements that are available for query in both the API and the Web UI.
Metric Name | Description |
---|---|
cpu/usage_rate | CPU usage on all cores in millicores. |
gpu/usage | The usage of all GPUs in megabytes. |
memory/usage | Total memory usage in bytes. |
Metric Name | Description |
---|---|
cpu/limit | CPU hard limit in millicores. |
cpu/node_capacity | CPU capacity of a node. |
cpu/node_allocatable | CPU allocatable of a node. |
cpu/node_reservation | Share of CPU that is reserved on the node allocatable. |
cpu/node_utilization | CPU utilization as a share of node allocatable. |
cpu/request | CPU request (the guaranteed amount of resources) in millicores. |
cpu/usage | Cumulative CPU usage on all cores. |
cpu/usage_rate | CPU usage on all cores in millicores. |
filesystem/usage | Total number of bytes consumed on a filesystem. |
filesystem/limit | The total size of the filesystem in bytes. |
filesystem/available | The number of available bytes remaining in the filesystem. |
filesystem/inodes | The number of available inodes in the filesystem. |
filesystem/inodes_free | The number of free inodes remaining in the filesystem. |
gpu/usage | The usage of all GPUs in megabytes. |
memory/limit | Memory hard limit in bytes. |
memory/major_page_faults | Number of major page faults. |
memory/major_page_faults_rate | Number of major page faults per second. |
memory/node_capacity | Memory capacity of a node. |
memory/node_allocatable | Memory allocatable of a node. |
memory/node_reservation | Share of memory that is reserved on the node allocatable. |
memory/node_utilization | Memory utilization as a share of memory allocatable. |
memory/page_faults | Number of page faults. |
memory/page_faults_rate | Number of page faults per second. |
memory/request | Memory request (the guaranteed amount of resources) in bytes. |
memory/usage | Total memory usage. |
memory/cache | Cache memory usage. |
memory/rss | RSS memory usage. |
memory/working_set | Total working set usage. Working set is the memory being used and not easily dropped by the kernel. |
network/rx | Cumulative number of bytes received over the network. |
network/rx_errors | Cumulative number of errors while receiving over the network. |
network/rx_errors_rate | Number of errors while receiving over the network per second. |
network/rx_rate | Number of bytes received over the network per second. |
network/tx | Cumulative number of bytes sent over the network. |
network/tx_errors | Cumulative number of errors while sending over the network. |
network/tx_errors_rate | Number of errors while sending over the network per second. |
network/tx_rate | Number of bytes sent over the network per second. |
uptime | Number of milliseconds since the container was started. |
For both the API and the Web UI, any available metric can be queried by specifying the metric in the URL path of each request. However, in the web application you have to type the path manually, as non-main Kubernetes metrics are not the main concern of this project.
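For example, a non-main metric such as memory/working_set can be requested through the API exactly like a main metric (hostname, port, and credentials below are placeholders; see the HTTP API section below for the full list of request parameters):
curl -X POST \
  http://<app_hostname>:<app_port>/api/memory/working_set \
  -H 'x-username: <username>' \
  -H 'x-password: <password>' \
  -d '{"namespace": "default", "limit": 10}'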
├── app
│ ├── api : Directory containing all views and URL settings for the API.
│ ├── kube-monitoring : Directory containing the Django project's main settings, URL configuration, and custom logging spec.
│ ├── web : Directory containing all methods, views, and URL settings for the Web UI.
│ ├── manage.py : Script to manage administrative settings of the project.
For both the API and the Web UI, the application writes its logs to info.log and errors.log in /logs/kube-monitoring. To trace debug, info, or error messages in the application:
tail -f /logs/kube-monitoring/info.log
# or
tail -f /logs/kube-monitoring/errors.log
The API service is available at the /api/[metrics] path and works only with HTTP POST. Any other HTTP method will be rejected.
Alongside the HTTP request, X-USERNAME and X-PASSWORD must be present in the HTTP request headers. The contents of these headers must match the API_USERNAME and API_PASSWORD exported in the environment.
The details of the HTTP request:
- HTTP method: POST
- HTTP path: /api/[metrics], for example /api/cpu/usage_rate
- HTTP headers:
  - X-USERNAME: username for the HTTP request
  - X-PASSWORD: password for the HTTP request
- Request body:
  - namespace: the specific namespace (virtual cluster) in the Kubernetes cluster. Default is the "default" namespace.
  - limit: the desired number of data points to return to the client. Default is 100 data points.
  - agg: the type of aggregation needed. Default is None; options are pod-by-namespace, all-namespace, all-pod.
  - timeBeginInterval: the start (inclusive) of the query time range. Default is 60 minutes before the current time of the query.
  - timeEndInterval: the end (exclusive) of the query time range. Default is the current time of the query.
There are four ways of using the HTTP API, according to the provided aggregation schemes:
- Without aggregation: to perform a request without aggregation, simply set agg in the HTTP request body to any other value or leave it empty.
- Aggregation by pods per namespace: set agg in the HTTP request body to pod-by-namespace. limit will be ignored.
- Aggregation by namespace: set agg in the HTTP request body to all-namespace. limit and namespace will be ignored.
- Aggregation by pods: set agg in the HTTP request body to all-pod. limit and namespace will be ignored.
curl -X POST \
http://<app_hostname>:<app_port>/api/cpu/usage_rate \
-H 'x-password: <password>' \
-H 'x-username: <username>' \
-d '{
"agg":"pod-by-namespace",
"timeBeginInterval": "2017-12-01",
"timeEndInterval": "2017-12-31"
}'
This example will return the aggregated metrics from 1 December 2017 (00:00) until 30 December 2017 (23:59), since timeEndInterval is exclusive.
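For comparison, a request without aggregation that sets the remaining body fields explicitly could look like this (the namespace and limit values are only illustrative):
curl -X POST \
  http://<app_hostname>:<app_port>/api/cpu/usage_rate \
  -H 'x-password: <password>' \
  -H 'x-username: <username>' \
  -d '{
    "namespace": "cluster-3",
    "limit": 50,
    "timeBeginInterval": "2017-12-01",
    "timeEndInterval": "2017-12-31"
}'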
Using the Web UI is straightforward. Log in and navigate using the main left sidebar to view the main metrics of Kubernetes monitoring.
It is also possible to filter metrics by namespace. To do so, add /[namespace] at the end of /web/stats/[metrics]. For example, to highlight the cpu/usage_rate of namespace "cluster-3", open the following URL in your browser: <hostname>:<port>/web/stats/cpu/usage_rate/cluster-3
To add a user account for accessing the Web UI, execute the following in the terminal:
python app/manage.py createuser --username=<username> --email=<email>
Note that you will be prompted for the password afterwards.
It is encouraged to deploy the application in the Kubernetes cluster as well, since the application is stateless and acts only as an interface to the InfluxDB instance.
Build the image using the make command:
make build IMAGE_NAME="<image_name>" VERSION="<version>"
Push to dockerhub:
make push IMAGE_NAME="<image_name>" VERSION="<version>"
Or build and push at once. For example:
# Example build and push command with default image name and version
make all IMAGE_NAME=sanadhis/kube-monitoring VERSION=0.1
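If you prefer to call Docker directly instead of make, the targets presumably wrap the standard build and push steps; a rough equivalent of the example above, assuming the Dockerfile sits at the repository root:
# Assumption: the make targets wrap docker build and docker push
docker build -t sanadhis/kube-monitoring:0.1 .
docker push sanadhis/kube-monitoring:0.1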
First, fill in the environment variables in kube-monitoring.yaml: INFLUXDB_HOST, INFLUXDB_PORT, INFLUXDB_USER, INFLUXDB_PASS, INFLUXDB_DB, API_USERNAME, and API_PASSWORD.
Ensure that you use the corresponding image name and version in kube-monitoring.yaml. Then, simply execute:
kubectl create -f kube-monitoring.yaml
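After creating the resources, you can check that the pods come up before moving on (this assumes the pods are named after kube-monitoring, as suggested by the service lookup shown further below):
# List the kube-monitoring pods and check that they reach the Running state
kubectl get pods | grep kube-monitoring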
To avoid accidentally committing your account credentials, ignore kube-monitoring.yaml by executing:
git update-index --assume-unchanged kube-monitoring.yaml
Note that the kube-monitoring deployment also comes with Grafana v4.4.3, so you can access both Grafana and the application itself once the YAML file is successfully deployed.
To find the port of the application (exposed via NodePort by default):
kubectl get svc | grep kube-monitoring
Test the application or Grafana:
curl <hostname>:<application_port>/api
curl <hostname>:<application_port>/web
curl <hostname>:<grafana_port>
- Django - The web framework used
- Version 1.0: 1 January 2018
- Sanadhi Sutandi @sanadhis
This project is licensed under the Apache 2.0 License - see the LICENSE.md file for details