This software is pre-production and should not be deployed to production servers.
The WCA project contains a simple built-in dependency injection framework that allows you to extend existing functionality or add new functionality.
This document contains examples of:
- a simple Runner that outputs "Hello World!",
- an HTTP-based Storage component that saves metrics in an external HTTP-based service, using the requests library.
To provide new functionality using an external component, the operator of WCA has to:
- provide a new component defined as a Python class,
- register this Python class on startup with the extra command line parameter --register, given as package_name.module_name:class_name (the package name is optional),
- reference the component name in the configuration file (using the name of the class),
- make the Python module accessible to the Python interpreter for import (PYTHONPATH and PEX_INHERIT_PATH environment variables).
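The registration step can be pictured with a small sketch. This is not WCA's actual implementation; it only assumes that a package_name.module_name:class_name string is split at the colon, the module is imported, and the class is stored under its name so the configuration file can refer to it. The names resolve_component and registry are hypothetical, and collections:OrderedDict is just a stand-in for a real component.

```python
import importlib


def resolve_component(spec: str):
    """Resolve 'package_name.module_name:class_name' to a class object."""
    module_path, sep, class_name = spec.partition(':')
    if not sep or not module_path or not class_name:
        raise ValueError('expected module_name:class_name, got %r' % spec)
    # import_module searches sys.path, which is why PYTHONPATH matters.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Hypothetical registry: class name -> class, as referenced from the config file.
registry = {}
for spec in ['collections:OrderedDict']:  # stand-in for a real component spec
    cls = resolve_component(spec)
    registry[cls.__name__] = cls
```

With such a registry, a configuration entry that names OrderedDict could be mapped back to the class; a module that is not importable would fail at import_module, which is the situation the PYTHONPATH requirement guards against.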
In this document, a component means a simple Python class that was registered and is therefore allowed to be used in the configuration file.
All WCA features (detection/CMS integration) are based on internal components and use the same mechanism for initialization.
From a high-level standpoint, the main entry point of the application is only responsible for
instantiating the Python classes defined in the YAML configuration, then parsing and preparing the logging infrastructure, and then calling the generic run method on the already created Runner instance.
The Runner class is the main vehicle integrating all other dependent objects together.
For example, MeasurementRunner implements a simple loop that uses a Node subclass (e.g. MesosNode) instance to discover locally running tasks, then collects metrics for those tasks and then uses a Storage subclass to store those metrics somewhere (e.g. KafkaStorage or LogStorage).
To illustrate that, when someone uses WCA with a configuration file like this:

    runner: !MeasurementRunner
      node: !MesosNode                 # subclass of Node
      metric_storage: !LogStorage      # subclass of Storage
        output_filename: /tmp/logs.txt

it effectively means running the equivalent of this Python code:

    runner = MeasurementRunner(
        node=MesosNode(),
        metric_storage=LogStorage(
            output_filename='/tmp/logs.txt'),
    )
    runner.run()
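The loop that MeasurementRunner is described as implementing (discover tasks, collect metrics, store them) can be sketched roughly as follows. All classes here are local stand-ins written for this sketch, not WCA classes; in particular the get_tasks method name, the task_up metric and the iterations parameter are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metric:
    """Simplified stand-in for the WCA Metric class."""
    name: str
    value: float
    labels: Dict[str, str] = field(default_factory=dict)


class SketchNode:
    """Stand-in Node: returns a fixed list of task names."""
    def __init__(self, tasks: List[str]):
        self.tasks = tasks

    def get_tasks(self) -> List[str]:  # hypothetical method name
        return list(self.tasks)


class ListStorage:
    """Stand-in Storage: keeps stored metrics in memory."""
    def __init__(self):
        self.stored: List[Metric] = []

    def store(self, metrics: List[Metric]) -> None:
        self.stored.extend(metrics)


class SketchMeasurementRunner:
    def __init__(self, node, metrics_storage, iterations: int = 1):
        self.node = node
        self.metrics_storage = metrics_storage
        self.iterations = iterations

    def run(self) -> None:
        for _ in range(self.iterations):
            # 1. discover locally running tasks
            tasks = self.node.get_tasks()
            # 2. collect metrics for those tasks (one dummy metric per task)
            metrics = [Metric('task_up', 1.0, {'task_id': t}) for t in tasks]
            # 3. hand the metrics over to the storage component
            self.metrics_storage.store(metrics)


storage = ListStorage()
SketchMeasurementRunner(SketchNode(['task-a', 'task-b']), storage).run()
```

The runner only talks to its node and storage through their methods, which is what makes it possible to swap either of them for an external component.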
For example, to provide measure-only mode, anomaly detection mode or resource allocation mode, WCA contains the following components:
- MeasurementRunner that is only responsible for collecting metrics,
- DetectionRunner that extends MeasurementRunner to allow anomaly detection and generate additional metrics,
- AllocationRunner that allows configuring resources based on the provided Allocator component instance.
It is important to note that configuration-based objects (components) are static singletons, available throughout the whole application lifetime and only accessible by their parent objects.
Let's start with a very basic example and create a HelloWorldRunner that just outputs the 'Hello world!' string.
With the Python module hello_world_runner.py containing a HelloWorldRunner subclass of Runner:
    from wca.runners import Runner

    class HelloWorldRunner(Runner):
        def run(self):
            print('Hello world!')
you need to start WCA with the following example config file:

    runner: !HelloWorldRunner

and then WCA, started like this:
PYTHONPATH=$PWD/examples PEX_INHERIT_PATH=fallback ./dist/wca.pex -c $PWD/configs/extending/hello_world.yaml -r hello_world_runner:HelloWorldRunner
Tip: You can just copy-paste this command. All required example files are already in the project, but you have to build the pex file first with make.

should output:
Hello world!
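A component can also take parameters. The LogStorage example earlier suggests that keys nested under a YAML tag end up as constructor arguments, so a runner with configurable fields might look like the sketch below. The Runner base class here is a local stand-in so that the sketch runs on its own, and GreetingRunner with its greeting and name fields is hypothetical.

```python
from dataclasses import dataclass


class Runner:
    """Local stand-in for wca.runners.Runner (assumption for this sketch)."""
    def run(self):
        raise NotImplementedError


@dataclass
class GreetingRunner(Runner):
    # Hypothetical fields; in a config file they would be keys
    # nested under "runner: !GreetingRunner".
    greeting: str = 'Hello'
    name: str = 'world'

    def run(self):
        print('%s %s!' % (self.greeting, self.name))


GreetingRunner(name='WCA').run()
```

Using a dataclass keeps the list of accepted parameters and their defaults in one place, which is the same style the HTTPStorage example in this document uses.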
To integrate with a custom monitoring system, it is enough to provide a definition of a custom Storage class.
The Storage class is a simple interface that exposes just one method, store, as defined below:
    class Storage:
        def store(self, metrics: List[Metric]) -> None:
            """store metrics; may throw FailedDeliveryException"""
            ...
where Metric is a simple class with a structure influenced by the Prometheus metric model and the OpenMetrics initiative:
    @dataclass
    class Metric:
        name: str
        value: float
        labels: Dict[str, str]
        type: str  # gauge/counter
        help: str
The following is a simple Storage class that can be used to post metrics serialized as JSON to an
external HTTP web service using the POST method:
(full source code here)
    import requests
    from dataclasses import dataclass
    from wca.storage import Storage

    @dataclass
    class HTTPStorage(Storage):
        http_endpoint: str = 'http://127.0.0.1:8000'

        def store(self, metrics):
            # POST the metrics as a JSON object mapping metric name to value.
            requests.post(
                self.http_endpoint,
                json={metric.name: metric.value for metric in metrics}
            )
Then it can be used with MeasurementRunner with the following configuration file:

    runner: !MeasurementRunner
      config: !MeasurementRunnerConfig
        node: !StaticNode
          tasks: []  # this disables any tasks metrics
        metrics_storage: !HTTPStorage
To verify that the data was posted to the HTTP service correctly, start a naive service
using socat:
socat - tcp4-listen:8000,fork
and then run WCA like this:
sudo env PYTHONPATH=$PWD/examples PEX_INHERIT_PATH=fallback ./dist/wca.pex -c $PWD/configs/extending/measurement_http_storage.yaml -r http_storage:HTTPStorage --root --log http_storage:info
Expected output is:
# from WCA:
2019-06-14 21:51:17,862 INFO {MainThread} [http_storage] sending!
# from socat:
POST / HTTP/1.1
Host: 127.0.0.1:8000
User-Agent: python-requests/2.21.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 240
Content-Type: application/json
{"wca_up": 1560541957.1652732, "wca_tasks": 0, "wca_memory_usage_bytes": 50159616,
"memory_usage": 1399689216, "cpu_usage_per_cpu": 1205557,
"wca_duration_seconds": 1.0013580322265625e-05,
"wca_duration_seconds_avg": 1.0013580322265625e-05}
Note:
- sudo is required to enable perf and resctrl based metrics,
- the --log parameter allows you to specify the log level for custom components.
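The Storage interface above mentions that store may throw FailedDeliveryException, while the HTTPStorage example ignores delivery errors. The sketch below shows one way a storage could report them. It uses only the standard library (urllib instead of requests) so that it runs without extra packages; SafeHTTPStorage, its timeout field and the payload helper are hypothetical, and FailedDeliveryException is defined locally as a stand-in for the WCA exception.

```python
import json
import urllib.request
from dataclasses import dataclass


class FailedDeliveryException(Exception):
    """Local stand-in for the exception named in the Storage docstring."""


@dataclass
class SafeHTTPStorage:
    http_endpoint: str = 'http://127.0.0.1:8000'
    timeout: float = 5.0  # seconds; hypothetical parameter

    def payload(self, metrics) -> bytes:
        # Same JSON shape as HTTPStorage: {metric name: metric value}.
        return json.dumps({m.name: m.value for m in metrics}).encode('utf-8')

    def store(self, metrics) -> None:
        request = urllib.request.Request(
            self.http_endpoint,
            data=self.payload(metrics),
            method='POST',
            headers={'Content-Type': 'application/json'},
        )
        try:
            with urllib.request.urlopen(request, timeout=self.timeout) as response:
                response.read()
        except OSError as e:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            raise FailedDeliveryException(str(e)) from e
```

Raising a dedicated exception lets the caller decide whether a failed delivery should be logged, retried or treated as fatal, instead of leaking transport-specific errors.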
Depending on the Runner component, different kinds of metrics are produced and sent to different instances of Storage components:
MeasurementRunner uses the Storage instance under the metrics_storage property to store:
- platform level resource usage (CPU/memory usage) metrics,
- internal WCA metrics: number of monitored tasks, number of errors/warnings, health-checks, WCA memory usage,
- (per-task) perf system based metrics e.g. instructions, cycles,
- (per-task) Intel RDT based metrics e.g. cache usage, memory bandwidth,
- (per-task) cgroup based metrics e.g. CPU/memory usage.
Each of those metrics has additional metadata attached (in the form of labels) about:
- platform topology (sockets/cores/cpus),
- extra labels defined in the WCA configuration file (e.g. own_ip),
- labels to identify the WCA version (wca_version), the host name (host) and the host CPU model (cpu_model),
- (only for per-task metrics) the task id (task_id) and metadata acquired from the orchestration system (Mesos task or Kubernetes pod labels).
DetectionRunner uses Storage subclass instances:
- in the metrics_storage property:
  - the same metrics as sent to MeasurementRunner in metrics_storage above,
- in the anomalies_storage property:
  - number of anomalies detected by the Allocator class,
  - individual instances of detected anomalies encoded as metrics (more details here).
AllocationRunner uses Storage subclass instances:
- in the metrics_storage property:
  - the same metrics as sent to MeasurementRunner in metrics_storage above,
- in the anomalies_storage property:
  - the same metrics as sent to DetectionRunner in anomalies_storage above,
- in the allocations_storage property:
  - number of resource allocations performed during the last iteration,
  - details about performed allocations like: number of CPU shares or CPU quota or cache allocation,
  - more details here.
Note that, by using YAML anchors and aliases, it is possible to configure the same instance of Storage
to store all kinds of metrics:
    runner: !AllocationRunner
      config: !AllocationRunnerConfig
        metrics_storage: &kafka_storage_instance !KafkaStorage
          topic: all_metrics
          broker_ips:
          - 127.0.0.1:9092
          - 127.0.0.2:9092
          max_timeout_in_seconds: 5.
        anomalies_storage: *kafka_storage_instance
        allocations_storage: *kafka_storage_instance
This approach can help to save resources (like connections), share state or simplify configuration (no need to repeat the same arguments).
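In plain Python terms, the anchor and its aliases amount to passing one object to several parameters. The sketch below uses local stand-in classes (CountingStorage and SketchRunnerConfig are hypothetical) to show that all three properties then refer to the same instance and therefore share its state.

```python
class CountingStorage:
    """Stand-in Storage that counts how many batches it received."""
    def __init__(self):
        self.batches = 0

    def store(self, metrics):
        self.batches += 1


class SketchRunnerConfig:
    """Stand-in for a runner config holding three storage properties."""
    def __init__(self, metrics_storage, anomalies_storage, allocations_storage):
        self.metrics_storage = metrics_storage
        self.anomalies_storage = anomalies_storage
        self.allocations_storage = allocations_storage


shared = CountingStorage()                            # role of &kafka_storage_instance
config = SketchRunnerConfig(shared, shared, shared)   # roles of the aliases

config.metrics_storage.store([])
config.anomalies_storage.store([])
config.allocations_storage.store([])
print(shared.batches)  # all three calls reached the same object
```

Because there is only one object, anything it holds (a connection, a buffer, a counter) is created once and shared by every property that refers to it.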
If a component requires additional dependencies and you do not want to pollute the system interpreter's libraries, the best way to bundle the new component is to use a PEX file to package all source code including dependencies.
(The requests library from the previous example was available because it is already required by WCA itself.)
pex -D examples python-dateutil==2.8.0 -o hello_world.pex -v
where examples/hello_world_runner_with_dateutil.py contains:
    from wca.runners import Runner
    from dateutil.utils import today

    class HelloWorldRunner(Runner):
        def run(self):
            print('Hello world! Today is %s' % today())
Then it is possible to combine the two PEX files into a single environment by using the
PEX_PATH environment variable:
PEX_PATH=hello_world.pex ./dist/wca.pex -c $PWD/configs/extending/hello_world.yaml -r hello_world_runner_with_dateutil:HelloWorldRunner
outputs:
Hello world! Today is 2019-06-14 00:00:00
Note that this method works well only if there are no conflicting sub-dependencies (the diamond dependency problem), because only one version of each dependency will be available at runtime. If there are conflicts, you need to consolidate WCA and your component into a single project (with common requirements) so that the conflicts are resolved during the requirements gathering phase. You can check the Platform Resource Manager (prm) component as an example of this approach.
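Before combining two PEX files, it can help to compare the pinned requirements of both projects. The following is only a rough sketch of such a check: find_pin_conflicts is a hypothetical helper, it looks only at exact "name==version" pins, and the requirement lists are made-up examples rather than WCA's real requirements.

```python
def find_pin_conflicts(*requirement_sets):
    """Return {package: sorted versions} for packages pinned to more than one version."""
    pins = {}
    for requirements in requirement_sets:
        for line in requirements:
            line = line.split('#', 1)[0].strip()  # drop comments and whitespace
            if '==' not in line:
                continue  # only exact pins are compared in this sketch
            name, version = (part.strip() for part in line.split('==', 1))
            pins.setdefault(name.lower(), set()).add(version)
    return {name: sorted(versions)
            for name, versions in pins.items() if len(versions) > 1}


# Made-up pins, for illustration only.
wca_requirements = ['requests==2.21.0', 'python-dateutil==2.7.5']
component_requirements = ['python-dateutil==2.8.0  # needed by the new runner']

print(find_pin_conflicts(wca_requirements, component_requirements))
```

An empty result does not prove the combination is safe, since unpinned or transitive dependencies are not inspected, but a non-empty one is a clear sign that the two projects should be consolidated first.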
Any child object that is used by any runner can be replaced with an external component, but WCA was designed to be extended by providing the following components:
- Node class, used by all Runners to perform task discovery,
- Storage classes, used to enable persistence for internal metrics (*_storage properties),
- Detector class, to provide anomaly detection logic,
- Allocator class, to provide anomaly detection and anomaly mitigation logic (by resource allocation).