diff --git a/receiver/awscontainerinsightreceiver/README.md b/receiver/awscontainerinsightreceiver/README.md
index b7d26825d01f..fd063ee7cc15 100644
--- a/receiver/awscontainerinsightreceiver/README.md
+++ b/receiver/awscontainerinsightreceiver/README.md
@@ -851,6 +851,9 @@ kubectl apply -f config.yaml
 The attribute `container_status_reason` is present only when `container_status` is in "Waiting" or "Terminated" State.
 The attribute `container_last_termination_reason` is present only when `container_status` is in "Terminated" State.
 
+## Available Metrics and Resource Attributes on Windows
+Refer to [Metrics on Windows](./internal/k8swindows/README.md).
+
 This is a sample configuration for AWS Container Insights using the `awscontainerinsightreceiver` and `awsemfexporter` for an ECS cluster to collect the instance level metrics:
 ```
 receivers:
diff --git a/receiver/awscontainerinsightreceiver/design.md b/receiver/awscontainerinsightreceiver/design.md
index 1669740c36ba..a333fd1b4a46 100644
--- a/receiver/awscontainerinsightreceiver/design.md
+++ b/receiver/awscontainerinsightreceiver/design.md
@@ -3,6 +3,7 @@
 
 ## Container Insights Architecture for EKS
 ![architecture](images/eks-design.png)
+![architecture for Windows Nodes](images/eks-windows-design.png)
 
 ## Container Insights Architecture for ECS
 ![architecture](images/ecs-design.png)
@@ -16,7 +17,15 @@
   * Some pod/container related labels like podName, podId, namespace, containerName are extracted from the container spec provided by `cadvisor`. This labels will be added as resource attributes for the metrics and the AWS Container Insights processor needs those attributes to do further processing of the metrics.
 * `k8sapiserver`
   * Collects cluster-level metrics from k8s api server
-  * The receiver is designed to run as daemonset. This guarantees that only one receiver is running per cluster node. To make sure cluster-level metrics are not duplicated, the receiver integrate with K8s client which support leader election API. It leverages k8s configmap resource as some sort of LOCK primitive. The deployment will create a dedicate configmap as the lock resource. If one receiver is required to elect a leader, it will try to lock (via Create/Update) the configmap. The API will ensure one of the receivers hold the lock to be the leader. The leader continually “heartbeats” to claim its leaderships, and the other candidates periodically make new attempts to become the leader. This ensures that a new leader will be elected quickly, if the current leader fails for some reason.
+  * The receiver is designed to run as a daemonset. This guarantees that only one receiver is running per cluster node. To make sure cluster-level metrics are not duplicated, the receiver integrates with the K8s client, which supports the leader election API. It leverages a k8s configmap resource as a kind of LOCK primitive. The deployment will create a dedicated configmap as the lock resource. If a receiver is required to elect a leader, it will try to lock (via Create/Update) the configmap. The API will ensure that one of the receivers holds the lock and becomes the leader. The leader continually “heartbeats” to claim its leadership, and the other candidates periodically make new attempts to become the leader. This ensures that a new leader will be elected quickly if the current leader fails for some reason.
+
+For Windows Worker Nodes,
+`awscontainerinsightreceiver` collects data from 2 main sources:
+* `kubelet` Summary API
+  * Kubelet on Windows nodes exposes a Summary API that returns CPU, memory, network, and storage metrics for containers, pods, and the node.
+  * The receiver generates Container Insights specific metrics from the raw metrics provided by `kubelet`. The metrics are categorized by infrastructure layer: node, node filesystem, node network, pod, pod network, container, and container filesystem.
+* HCS Shim API
+  * The HCS Shim API provides network metrics for containers.
 
 The following two packages are used to decorate metrics:
diff --git a/receiver/awscontainerinsightreceiver/images/eks-windows-design.png b/receiver/awscontainerinsightreceiver/images/eks-windows-design.png
new file mode 100644
index 000000000000..950ed3636072
Binary files /dev/null and b/receiver/awscontainerinsightreceiver/images/eks-windows-design.png differ
diff --git a/receiver/awscontainerinsightreceiver/internal/k8swindows/README.md b/receiver/awscontainerinsightreceiver/internal/k8swindows/README.md
new file mode 100644
index 000000000000..9d6169c96952
--- /dev/null
+++ b/receiver/awscontainerinsightreceiver/internal/k8swindows/README.md
@@ -0,0 +1,253 @@
+## Available Metrics and Resource Attributes for Windows worker Nodes
+
+### Node
+| Metric | Unit |
+|-------------------------------------------|--------------|
+| node_cpu_limit | Millicore |
+| node_cpu_request | Millicore |
+| node_cpu_reserved_capacity | Percent |
+| node_cpu_usage_total | Millicore |
+| node_cpu_utilization | Percent |
+| node_memory_limit | Bytes |
+| node_memory_pgfault | Count/Second |
+| node_memory_pgmajfault | Count/Second |
+| node_memory_request | Bytes |
+| node_memory_reserved_capacity | Percent |
+| node_memory_rss | Bytes |
+| node_memory_usage | Bytes |
+| node_memory_utilization | Percent |
+| node_memory_working_set | Bytes |
+| node_network_rx_bytes | Bytes/Second |
+| node_network_rx_dropped | Count/Second |
+| node_network_rx_errors | Count/Second |
+| node_network_total_bytes | Bytes/Second |
+| node_network_tx_bytes | Bytes/Second |
+| node_network_tx_dropped | Count/Second |
+| node_network_tx_errors | Count/Second |
+| node_number_of_running_containers | Count |
+| node_number_of_running_pods | Count |
+| node_status_condition_ready | Count |
+| node_status_condition_pid_pressure | Count |
+| node_status_condition_memory_pressure | Count |
+| node_status_condition_disk_pressure | Count |
+| node_status_condition_network_unavailable | Count |
+| node_status_condition_unknown | Count |
+| node_status_capacity_pods | Count |
+| node_status_allocatable_pods | Count |
+
+

+| Resource Attribute |
+|-----------------------|
+| ClusterName |
+| InstanceType |
+| NodeName |
+| Timestamp |
+| Type |
+| Version |
+| Sources |
+| kubernetes |
+| OperatingSystem |
+
+

+

+
+### Node Filesystem
+| Metric | Unit |
+|------------------------------|---------|
+| node_filesystem_available | Bytes |
+| node_filesystem_capacity | Bytes |
+| node_filesystem_usage | Bytes |
+| node_filesystem_utilization | Percent |
+
+

+| Resource Attribute |
+|---------------------- |
+| AutoScalingGroupName |
+| ClusterName |
+| InstanceId |
+| InstanceType |
+| NodeName |
+| Timestamp |
+| Type |
+| Version |
+| Sources |
+| kubernetes |
+| OperatingSystem |
+

+

+
+### Node Network
+| Metric | Unit |
+|------------------------------------|--------------|
+| node_interface_network_rx_bytes | Bytes/Second |
+| node_interface_network_rx_dropped | Count/Second |
+| node_interface_network_rx_errors | Count/Second |
+| node_interface_network_total_bytes | Bytes/Second |
+| node_interface_network_tx_bytes | Bytes/Second |
+| node_interface_network_tx_dropped | Count/Second |
+| node_interface_network_tx_errors | Count/Second |
+
+

+| Resource Attribute |
+|-----------------------|
+| AutoScalingGroupName |
+| ClusterName |
+| InstanceId |
+| InstanceType |
+| NodeName |
+| Timestamp |
+| Type |
+| Version |
+| interface |
+| Sources |
+| kubernetes |
+| OperatingSystem |
+

+

+
+### Pod
+| Metric | Unit |
+|-------------------------------------------------------------------|--------------|
+| pod_cpu_limit | Millicore |
+| pod_cpu_request | Millicore |
+| pod_cpu_reserved_capacity | Percent |
+| pod_cpu_usage_total | Millicore |
+| pod_cpu_utilization | Percent |
+| pod_cpu_utilization_over_pod_limit | Percent |
+| pod_memory_limit | Bytes |
+| pod_memory_max_usage | Bytes |
+| pod_memory_pgfault | Count/Second |
+| pod_memory_pgmajfault | Count/Second |
+| pod_memory_request | Bytes |
+| pod_memory_reserved_capacity | Percent |
+| pod_memory_rss | Bytes |
+| pod_memory_usage | Bytes |
+| pod_memory_utilization | Percent |
+| pod_memory_utilization_over_pod_limit | Percent |
+| pod_memory_working_set | Bytes |
+| pod_network_rx_bytes | Bytes/Second |
+| pod_network_rx_dropped | Count/Second |
+| pod_network_rx_errors | Count/Second |
+| pod_network_total_bytes | Bytes/Second |
+| pod_network_tx_bytes | Bytes/Second |
+| pod_network_tx_dropped | Count/Second |
+| pod_network_tx_errors | Count/Second |
+| pod_number_of_container_restarts | Count |
+| pod_number_of_containers | Count |
+| pod_number_of_running_containers | Count |
+| pod_status_ready | Count |
+| pod_status_scheduled | Count |
+| pod_status_unknown | Count |
+| pod_status_failed | Count |
+| pod_status_pending | Count |
+| pod_status_running | Count |
+| pod_status_succeeded | Count |
+| pod_container_status_running | Count |
+| pod_container_status_terminated | Count |
+| pod_container_status_waiting | Count |
+| pod_container_status_waiting_reason_crash_loop_back_off | Count |
+| pod_container_status_waiting_reason_image_pull_error | Count |
+| pod_container_status_waiting_reason_start_error | Count |
+| pod_container_status_waiting_reason_create_container_error | Count |
+| pod_container_status_waiting_reason_create_container_config_error | Count |
+| pod_container_status_terminated_reason_oom_killed | Count |
+
+| Resource Attribute |
+|------------------------|
+| AutoScalingGroupName |
+| ClusterName |
+| InstanceId |
+| InstanceType |
+| K8sPodName |
+| Namespace |
+| NodeName |
+| PodId |
+| Timestamp |
+| Type |
+| Version |
+| Sources |
+| kubernetes |
+| pod_status |
+| OperatingSystem |
+
+

+
+### Pod Network
+| Metric | Unit |
+|-----------------------------------|--------------|
+| pod_interface_network_rx_bytes | Bytes/Second |
+| pod_interface_network_rx_dropped | Count/Second |
+| pod_interface_network_rx_errors | Count/Second |
+| pod_interface_network_total_bytes | Bytes/Second |
+| pod_interface_network_tx_bytes | Bytes/Second |
+| pod_interface_network_tx_dropped | Count/Second |
+| pod_interface_network_tx_errors | Count/Second |
+
+
+

+| Resource Attribute |
+|----------------------|
+| AutoScalingGroupName |
+| ClusterName |
+| InstanceId |
+| InstanceType |
+| K8sPodName |
+| Namespace |
+| NodeName |
+| PodId |
+| Timestamp |
+| Type |
+| Version |
+| interface |
+| Sources |
+| kubernetes |
+| pod_status |
+| OperatingSystem |
+
+

+

+
+
+### Container
+| Metric | Unit |
+|---------------------------------------------------|--------------|
+| container_cpu_limit | Millicore |
+| container_cpu_request | Millicore |
+| container_cpu_usage_total | Millicore |
+| container_cpu_utilization | Percent |
+| container_cpu_utilization_over_container_limit | Percent |
+| container_memory_limit | Bytes |
+| container_memory_mapped_file | Bytes |
+| container_memory_pgfault | Count/Second |
+| container_memory_pgmajfault | Count/Second |
+| container_memory_request | Bytes |
+| container_memory_rss | Bytes |
+| container_memory_usage | Bytes |
+| container_memory_utilization | Percent |
+| container_memory_utilization_over_container_limit | Percent |
+| container_memory_working_set | Bytes |
+| number_of_container_restarts | Count |
+
+

+
+| Resource Attribute |
+|-----------------------------------|
+| AutoScalingGroupName |
+| ClusterName |
+| ContainerId |
+| ContainerName |
+| InstanceId |
+| InstanceType |
+| K8sPodName |
+| Namespace |
+| NodeName |
+| PodId |
+| Timestamp |
+| Type |
+| Version |
+| Sources |
+| kubernetes |
+| container_status |
+| container_status_reason |
+| container_last_termination_reason |
+| OperatingSystem |