Add List of Metrics for Windows + Design (amazon-contributing#166)
* Added readme for awscontainerinsights for Windows

* Removed Linux metrics from Windows metrics

* Added back documentation removed by mistake
KlwntSingh authored Feb 21, 2024
1 parent f9a1ae3 commit 64c80bc
Showing 4 changed files with 266 additions and 1 deletion.
3 changes: 3 additions & 0 deletions receiver/awscontainerinsightreceiver/README.md
@@ -851,6 +851,9 @@ kubectl apply -f config.yaml

The attribute `container_status_reason` is present only when `container_status` is in "Waiting" or "Terminated" State. The attribute `container_last_termination_reason` is present only when `container_status` is in "Terminated" State.
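
The presence rule above can be illustrated with a minimal sketch. The helper `container_status_attributes` and its parameters are hypothetical names chosen for illustration; this is not the receiver's actual code, only one way to express the stated rule.

```python
from typing import Dict, Optional


def container_status_attributes(
    status: str,
    reason: Optional[str] = None,
    last_termination_reason: Optional[str] = None,
) -> Dict[str, str]:
    """Build the status-related attributes for one container (hypothetical helper)."""
    attrs = {"container_status": status}
    # container_status_reason appears only for Waiting or Terminated containers.
    if status in ("Waiting", "Terminated") and reason:
        attrs["container_status_reason"] = reason
    # container_last_termination_reason appears only for Terminated containers.
    if status == "Terminated" and last_termination_reason:
        attrs["container_last_termination_reason"] = last_termination_reason
    return attrs


if __name__ == "__main__":
    print(container_status_attributes("Running"))
    print(container_status_attributes("Waiting", reason="CrashLoopBackOff"))
    print(container_status_attributes("Terminated", "Error", "OOMKilled"))
```

A consumer of these metrics should therefore treat both reason attributes as optional rather than assume they are always set.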

## Available Metrics and Resource Attributes on Windows
Refer to [Metrics on Windows](./internal/k8swindows/README.md).

This is a sample configuration for AWS Container Insights using the `awscontainerinsightreceiver` and `awsemfexporter` for an ECS cluster to collect the instance level metrics:
```
receivers:
```
11 changes: 10 additions & 1 deletion receiver/awscontainerinsightreceiver/design.md
@@ -3,6 +3,7 @@

## Container Insights Architecture for EKS
![architecture](images/eks-design.png)
![architecture for Windows Nodes](images/eks-windows-design.png)

## Container Insights Architecture for ECS
![architecture](images/ecs-design.png)
@@ -16,7 +17,15 @@
  * Some pod/container related labels like podName, podId, namespace, containerName are extracted from the container spec provided by `cadvisor`. These labels are added as resource attributes for the metrics, and the AWS Container Insights processor needs those attributes for further processing of the metrics.
* `k8sapiserver`
  * Collects cluster-level metrics from the k8s API server.
  * The receiver is designed to run as a DaemonSet, which guarantees that exactly one receiver runs on each cluster node. To make sure cluster-level metrics are not duplicated, the receiver integrates with the K8s client, which supports the leader election API. It uses a k8s ConfigMap resource as a lock primitive: the deployment creates a dedicated ConfigMap as the lock resource, and a receiver that needs to become leader tries to lock (via Create/Update) the ConfigMap. The API ensures that only one of the receivers holds the lock and acts as leader. The leader continually “heartbeats” to renew its leadership, and the other candidates periodically retry to become the leader. This ensures that a new leader is elected quickly if the current leader fails for some reason.
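
The lock-and-heartbeat behavior described above can be sketched as a small in-memory simulation. Everything here is an assumption for illustration: `LockRecord` stands in for the ConfigMap, and `try_acquire_or_renew`, the candidate names, and the 15-second lease are hypothetical. The real receiver delegates this to the K8s client, and the API server, not local code, is what makes the update atomic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockRecord:
    """In-memory stand-in for the ConfigMap used as the lock resource."""
    holder: Optional[str] = None
    renewed_at: float = 0.0


def try_acquire_or_renew(lock: LockRecord, candidate: str, now: float,
                         lease_seconds: float) -> bool:
    """Return True if `candidate` is the leader after this attempt."""
    expired = lock.holder is None or now - lock.renewed_at > lease_seconds
    if lock.holder == candidate or expired:
        # The current leader heartbeats, or a candidate takes over a stale lock.
        lock.holder = candidate
        lock.renewed_at = now
        return True
    return False


if __name__ == "__main__":
    lock = LockRecord()
    lease = 15.0
    print(try_acquire_or_renew(lock, "node-a", 0.0, lease))   # node-a takes the free lock
    print(try_acquire_or_renew(lock, "node-b", 5.0, lease))   # lock is held and fresh
    print(try_acquire_or_renew(lock, "node-a", 10.0, lease))  # leader heartbeat
    # node-a stops heartbeating; node-b retries until the lease has expired.
    print(try_acquire_or_renew(lock, "node-b", 20.0, lease))
    print(try_acquire_or_renew(lock, "node-b", 26.0, lease))
```

Only the receiver that currently holds the lock would emit cluster-level metrics; the others keep retrying so that a failed leader is replaced once its lease lapses.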

On Windows worker nodes, `awscontainerinsightreceiver` collects data from two main sources:
* `kubelet` Summary API
  * The kubelet on a Windows node exposes the Summary API, which returns CPU, memory, network, and storage metrics for containers, pods, and the node.
  * The receiver generates Container Insights specific metrics from the raw metrics provided by `kubelet`. The metrics are grouped by infrastructure layer: node, node filesystem, node network, pod, pod network, container, and container filesystem.
* HCS Shim API
  * The HCS Shim API provides network metrics for containers.
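
To make the first source concrete, here is a minimal sketch of turning a Summary API response into Container Insights style values. The sample payload is made up and heavily trimmed, the function names are hypothetical, and the mapping from Summary fields to metric names is an assumption for illustration rather than the receiver's actual logic.

```python
import json

# Trimmed, made-up sample shaped like a kubelet Summary API response.
SAMPLE_SUMMARY = """
{
  "node": {
    "nodeName": "win-node-1",
    "cpu": {"usageNanoCores": 250000000},
    "memory": {"workingSetBytes": 2147483648}
  },
  "pods": [
    {
      "podRef": {"name": "web-0", "namespace": "default"},
      "cpu": {"usageNanoCores": 50000000},
      "memory": {"workingSetBytes": 104857600}
    }
  ]
}
"""


def nanocores_to_millicores(nanocores: int) -> float:
    """1 core = 1e9 nanocores = 1000 millicores."""
    return nanocores / 1_000_000


def extract_metrics(summary_json: str) -> dict:
    """Derive a few node and pod values from a Summary API payload."""
    summary = json.loads(summary_json)
    node = summary["node"]
    metrics = {
        "node_cpu_usage_total": nanocores_to_millicores(node["cpu"]["usageNanoCores"]),
        "node_memory_working_set": node["memory"]["workingSetBytes"],
        "pods": {},
    }
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        metrics["pods"][f'{ref["namespace"]}/{ref["name"]}'] = {
            "pod_cpu_usage_total": nanocores_to_millicores(pod["cpu"]["usageNanoCores"]),
            "pod_memory_working_set": pod["memory"]["workingSetBytes"],
        }
    return metrics


if __name__ == "__main__":
    print(json.dumps(extract_metrics(SAMPLE_SUMMARY), indent=2))
```

The unit conversion matters because the Summary API reports CPU in nanocores while the metric tables list CPU usage in Millicore.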

The following two packages are used to decorate metrics:

253 changes: 253 additions & 0 deletions receiver/awscontainerinsightreceiver/internal/k8swindows/README.md
@@ -0,0 +1,253 @@
## Available Metrics and Resource Attributes for Windows Worker Nodes

### Node
| Metric | Unit |
|-------------------------------------------|--------------|
| node_cpu_limit | Millicore |
| node_cpu_request | Millicore |
| node_cpu_reserved_capacity | Percent |
| node_cpu_usage_total | Millicore |
| node_cpu_utilization | Percent |
| node_memory_limit | Bytes |
| node_memory_pgfault | Count/Second |
| node_memory_pgmajfault | Count/Second |
| node_memory_request | Bytes |
| node_memory_reserved_capacity | Percent |
| node_memory_rss | Bytes |
| node_memory_usage | Bytes |
| node_memory_utilization | Percent |
| node_memory_working_set | Bytes |
| node_network_rx_bytes | Bytes/Second |
| node_network_rx_dropped | Count/Second |
| node_network_rx_errors | Count/Second |
| node_network_total_bytes | Bytes/Second |
| node_network_tx_bytes | Bytes/Second |
| node_network_tx_dropped | Count/Second |
| node_network_tx_errors | Count/Second |
| node_number_of_running_containers | Count |
| node_number_of_running_pods | Count |
| node_status_condition_ready | Count |
| node_status_condition_pid_pressure | Count |
| node_status_condition_memory_pressure | Count |
| node_status_condition_disk_pressure | Count |
| node_status_condition_network_unavailable | Count |
| node_status_condition_unknown | Count |
| node_status_capacity_pods | Count |
| node_status_allocatable_pods | Count |

<br/><br/>
| Resource Attribute |
|-----------------------|
| ClusterName |
| InstanceType |
| NodeName |
| Timestamp |
| Type |
| Version |
| Sources |
| kubernetes |
| OperatingSystem |

<br/><br/>
<br/><br/>
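
The Percent metrics in the Node table relate to the Millicore and Bytes metrics in the same table. The sketch below assumes, based on the metric names, that utilization is usage divided by limit and reserved capacity is request divided by limit. Treat these formulas and the helper `percent` as assumptions for illustration, not as the receiver's confirmed implementation.

```python
from typing import Optional


def percent(part: float, whole: float) -> Optional[float]:
    """Return part/whole as a percentage, or None when the denominator is unusable."""
    if whole <= 0:
        return None
    return part / whole * 100.0


if __name__ == "__main__":
    # Made-up values for a 4-core node (4000 millicores).
    node_cpu_limit = 4000.0
    node_cpu_usage_total = 1500.0
    node_cpu_request = 2000.0

    # Assumed: node_cpu_utilization = usage / limit.
    print(percent(node_cpu_usage_total, node_cpu_limit))
    # Assumed: node_cpu_reserved_capacity = request / limit.
    print(percent(node_cpu_request, node_cpu_limit))
```

Returning `None` for a zero limit avoids a division error when capacity information is not yet available.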

### Node Filesystem
| Metric | Unit |
|------------------------------|---------|
| node_filesystem_available | Bytes |
| node_filesystem_capacity | Bytes |
| node_filesystem_usage | Bytes |
| node_filesystem_utilization | Percent |

<br/><br/>
| Resource Attribute |
|---------------------- |
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| NodeName |
| Timestamp |
| Type |
| Version |
| Sources |
| kubernetes |
| OperatingSystem |

<br/><br/>
<br/><br/>

### Node Network
| Metric | Unit |
|------------------------------------|--------------|
| node_interface_network_rx_bytes | Bytes/Second |
| node_interface_network_rx_dropped | Count/Second |
| node_interface_network_rx_errors | Count/Second |
| node_interface_network_total_bytes | Bytes/Second |
| node_interface_network_tx_bytes | Bytes/Second |
| node_interface_network_tx_dropped | Count/Second |
| node_interface_network_tx_errors | Count/Second |

<br/><br/>
| Resource Attribute |
|-----------------------|
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| NodeName |
| Timestamp |
| Type |
| Version |
| interface |
| Sources |
| kubernetes |
| OperatingSystem |

<br/><br/>
<br/><br/>
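
The Bytes/Second and Count/Second units above imply rates rather than raw counters. A minimal sketch of deriving such a rate from two samples of a cumulative counter follows. It assumes the underlying source reports cumulative values and that the total is the sum of receive and transmit rates; both points, and the helper `per_second_rate`, are assumptions for illustration.

```python
from typing import Optional


def per_second_rate(prev_value: float, prev_ts: float,
                    cur_value: float, cur_ts: float) -> Optional[float]:
    """Rate between two samples of a cumulative counter, or None if it cannot be computed."""
    elapsed = cur_ts - prev_ts
    delta = cur_value - prev_value
    # Skip non-increasing timestamps and counter resets (for example after a restart).
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


if __name__ == "__main__":
    # Made-up samples taken 60 seconds apart for one interface.
    rx = per_second_rate(1_000, 0.0, 7_000, 60.0)
    tx = per_second_rate(2_000, 0.0, 5_000, 60.0)
    print(rx, tx)
    if rx is not None and tx is not None:
        # Assumed: total bytes rate = rx rate + tx rate.
        print(rx + tx)
```

Dropping the sample on a counter reset avoids reporting a large negative or misleading rate for that interval.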

### Pod
| Metric | Unit |
|-------------------------------------------------------------------|--------------|
| pod_cpu_limit | Millicore |
| pod_cpu_request | Millicore |
| pod_cpu_reserved_capacity | Percent |
| pod_cpu_usage_total | Millicore |
| pod_cpu_utilization | Percent |
| pod_cpu_utilization_over_pod_limit | Percent |
| pod_memory_limit | Bytes |
| pod_memory_max_usage | Bytes |
| pod_memory_pgfault | Count/Second |
| pod_memory_pgmajfault | Count/Second |
| pod_memory_request | Bytes |
| pod_memory_reserved_capacity | Percent |
| pod_memory_rss | Bytes |
| pod_memory_usage | Bytes |
| pod_memory_utilization | Percent |
| pod_memory_utilization_over_pod_limit | Percent |
| pod_memory_working_set | Bytes |
| pod_network_rx_bytes | Bytes/Second |
| pod_network_rx_dropped | Count/Second |
| pod_network_rx_errors | Count/Second |
| pod_network_total_bytes | Bytes/Second |
| pod_network_tx_bytes | Bytes/Second |
| pod_network_tx_dropped | Count/Second |
| pod_network_tx_errors | Count/Second |
| pod_number_of_container_restarts | Count |
| pod_number_of_containers | Count |
| pod_number_of_running_containers | Count |
| pod_status_ready | Count |
| pod_status_scheduled | Count |
| pod_status_unknown | Count |
| pod_status_failed | Count |
| pod_status_pending | Count |
| pod_status_running | Count |
| pod_status_succeeded | Count |
| pod_container_status_running | Count |
| pod_container_status_terminated | Count |
| pod_container_status_waiting | Count |
| pod_container_status_waiting_reason_crash_loop_back_off | Count |
| pod_container_status_waiting_reason_image_pull_error | Count |
| pod_container_status_waiting_reason_start_error | Count |
| pod_container_status_waiting_reason_create_container_error | Count |
| pod_container_status_waiting_reason_create_container_config_error | Count |
| pod_container_status_terminated_reason_oom_killed | Count |

| Resource Attribute |
|------------------------|
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| K8sPodName |
| Namespace |
| NodeName |
| PodId |
| Timestamp |
| Type |
| Version |
| Sources |
| kubernetes |
| pod_status |
| OperatingSystem |

<br/><br/>

### Pod Network
| Metric | Unit |
|-----------------------------------|--------------|
| pod_interface_network_rx_bytes | Bytes/Second |
| pod_interface_network_rx_dropped | Count/Second |
| pod_interface_network_rx_errors | Count/Second |
| pod_interface_network_total_bytes | Bytes/Second |
| pod_interface_network_tx_bytes | Bytes/Second |
| pod_interface_network_tx_dropped | Count/Second |
| pod_interface_network_tx_errors | Count/Second |


<br/><br/>
| Resource Attribute |
|----------------------|
| AutoScalingGroupName |
| ClusterName |
| InstanceId |
| InstanceType |
| K8sPodName |
| Namespace |
| NodeName |
| PodId |
| Timestamp |
| Type |
| Version |
| interface |
| Sources |
| kubernetes |
| pod_status |
| OperatingSystem |

<br/><br/>
<br/><br/>


### Container
| Metric | Unit |
|---------------------------------------------------|--------------|
| container_cpu_limit | Millicore |
| container_cpu_request | Millicore |
| container_cpu_usage_total | Millicore |
| container_cpu_utilization | Percent |
| container_cpu_utilization_over_container_limit | Percent |
| container_memory_limit | Bytes |
| container_memory_mapped_file | Bytes |
| container_memory_pgfault | Count/Second |
| container_memory_pgmajfault | Count/Second |
| container_memory_request | Bytes |
| container_memory_rss | Bytes |
| container_memory_usage | Bytes |
| container_memory_utilization | Percent |
| container_memory_utilization_over_container_limit | Percent |
| container_memory_working_set | Bytes |
| number_of_container_restarts | Count |

<br/><br/>

| Resource Attribute |
|-----------------------------------|
| AutoScalingGroupName |
| ClusterName |
| ContainerId |
| ContainerName |
| InstanceId |
| InstanceType |
| K8sPodName |
| Namespace |
| NodeName |
| PodId |
| Timestamp |
| Type |
| Version |
| Sources |
| kubernetes |
| container_status |
| container_status_reason |
| container_last_termination_reason |
| OperatingSystem |
