Overview of the Observability for Kubernetes Operator

The Observability for Kubernetes Operator deploys the necessary agents to monitor your clusters and workloads in Kubernetes. The Operator is built with the Kubebuilder SDK.

Important: Logs (Beta) is enabled only for selected customers. If you’d like to participate, contact your Observability account representative.

Why Use the Observability for Kubernetes Operator?

The Operator simplifies operational aspects of managing the Kubernetes Integration for VMware Aria Operations for Applications (formerly known as Tanzu Observability by Wavefront). Here are some examples, with more to come!

  • Enhanced status reporting of the Kubernetes Integration so that users can ensure their cluster and Kubernetes resources are reporting data.
  • Kubernetes Operator features provide a declarative mechanism for deploying the necessary agents in a Kubernetes environment.
  • Centralized configuration.
  • Enhanced configuration validation to surface what needs to be corrected in order to deploy successfully.
  • Efficient Kubernetes resource usage supports scaling out the cluster (leader) node and worker nodes independently.

Note: The Kubernetes Metrics Collector that is deployed by this Operator still supports configuration via configmap. For example, Istio and MySQL metrics, Telegraf configuration, etc. are still supported. For details on the Collector, see collector.md.

Architecture

[Figure: Observability for Kubernetes Operator architecture diagram]

Installation

Note: The Observability for Kubernetes Operator Helm chart is deprecated and no longer supported. Use the deploy, upgrade, and removal instructions below instead.

Prerequisites

To install the integration, you must use the kubectl tool.

Deploy the Monitoring Agents with the Observability for Kubernetes Operator

  1. Install the Observability for Kubernetes Operator into the observability-system namespace.

    Note: If you already have the deprecated Kubernetes Integration installed by using Helm or manual deployment, uninstall it before you install the Operator.

    kubectl apply -f https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/deploy/wavefront-operator.yaml
    
  2. Create a Kubernetes secret with your Wavefront API token. See the Managing API Tokens page.

    kubectl create -n observability-system secret generic wavefront-secret --from-literal token=YOUR_WAVEFRONT_TOKEN
    
  3. Create a wavefront.yaml file with your Wavefront Custom Resource configuration. The simplest configuration is:

    # Replace YOUR_CLUSTER_NAME and YOUR_WAVEFRONT_URL with your own values.
    apiVersion: wavefront.com/v1alpha1
    kind: Wavefront
    metadata:
      name: wavefront
      namespace: observability-system
    spec:
      clusterName: YOUR_CLUSTER_NAME
      wavefrontUrl: YOUR_WAVEFRONT_URL
      dataCollection:
        metrics:
          enable: true
      dataExport:
        wavefrontProxy:
          enable: true

    See the Configuration section below for details.

  4. (Logging Beta) Optionally add the configuration for logging to the wavefront.yaml file. For example:

    # Replace YOUR_CLUSTER_NAME and YOUR_WAVEFRONT_URL with your own values.
    apiVersion: wavefront.com/v1alpha1
    kind: Wavefront
    metadata:
      name: wavefront
      namespace: observability-system
    spec:
      clusterName: YOUR_CLUSTER_NAME
      wavefrontUrl: YOUR_WAVEFRONT_URL
      dataCollection:
        metrics:
          enable: true
        logging:
          enable: true
      dataExport:
        wavefrontProxy:
          enable: true

    See Logs Overview (Beta) for an overview and links to further documentation about the logging beta.

    See Bring Your Own Logs Shipper for an overview of how to use the Operator with your own logs shipper.

  5. Deploy the agents with your configuration:

    kubectl apply -f <path_to_your_wavefront.yaml>
    
  6. Run the following command to get the status of the Kubernetes integration:

    kubectl get wavefront -n observability-system
    

    The command should return a table like the following, displaying Operator instance health:

    NAME        STATUS    PROXY           CLUSTER-COLLECTOR   NODE-COLLECTOR   LOGGING        AGE    MESSAGE
    wavefront   Healthy   Running (1/1)   Running (1/1)       Running (3/3)    Running (3/3)  2m4s   All components are healthy
    

    If STATUS is Unhealthy, check troubleshooting.
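
For scripted checks, the status command in step 6 can be wrapped in a small poll loop. The following is only a sketch under stated assumptions: the helper names `wavefront_status` and `wait_for_healthy` are hypothetical and are not part of the Operator, and the parsing assumes that STATUS is the second column of the table shown above.

```shell
# Hypothetical helpers; not shipped with the Operator.

# Read `kubectl get wavefront` table output on stdin and print the STATUS
# value, assumed to be the second column of the first data row.
wavefront_status() {
  awk 'NR == 2 { print $2 }'
}

# Poll the integration until STATUS is Healthy or the attempts run out.
# Arguments: number of attempts (default 30), delay in seconds (default 10).
wait_for_healthy() {
  local attempts="${1:-30}"
  local delay="${2:-10}"
  local status=""
  local i
  for i in $(seq 1 "$attempts"); do
    status="$(kubectl get wavefront -n observability-system | wavefront_status)"
    if [ "$status" = "Healthy" ]; then
      echo "Integration is healthy"
      return 0
    fi
    sleep "$delay"
  done
  echo "Integration status is still '${status:-unknown}'" >&2
  return 1
}
```

For example, `wait_for_healthy 12 5` would check up to 12 times, 5 seconds apart, and return a non-zero exit code if the integration never reports Healthy.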

Note: For details on migrating from an existing Helm chart or manual deployment, see Migration.

Configuration

You configure the Observability for Kubernetes Operator with a custom resource file.

When you update the resource file, the Operator picks up the changes and updates the integration deployment accordingly.

To update the custom resource file:

  • Open the custom resource file for editing.
  • Change one or more options and save the file.
  • Run kubectl apply -f <path_to_your_config_file.yaml>.
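
Before applying an edited file, it can help to preview what would change. The sketch below combines `kubectl diff` with the `kubectl apply` step above; the function name `apply_with_preview` is a hypothetical helper for illustration, not something the Operator provides.

```shell
# Hypothetical helper: show the pending changes, then apply the file
# unless the preview itself failed.
apply_with_preview() {
  config="$1"
  kubectl diff -f "$config"
  rc=$?
  # kubectl diff exits 0 when nothing differs, 1 when differences exist,
  # and with a value greater than 1 when kubectl or diff hit an error.
  if [ "$rc" -gt 1 ]; then
    echo "kubectl diff failed for $config" >&2
    return "$rc"
  fi
  kubectl apply -f "$config"
}

# Usage:
#   apply_with_preview <path_to_your_config_file.yaml>
```

The preview is informational only: the file is applied whether or not differences are found, and the function stops early only when the diff itself cannot run.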

See below for configuration options.

We have templates for common scenarios. See the comments in each file for usage instructions.

You can see all configuration options in the wavefront-full-config.yaml.

Creating Alerts

We have alerts on common Kubernetes issues. For details on creating alerts, see alerts.md.

Observability Failures

Alert name                          Description
Observability Status is Unhealthy   The status of the Observability for Kubernetes is unhealthy.

Pod Failures

Alert name                         Description
Pod Stuck in Pending               Workload has pod stuck in pending.
Pod Stuck in Terminating           Workload has pod stuck in terminating.
Pod Backoff Event                  Workload has pod with container status ImagePullBackOff or CrashLoopBackOff.
Workload Not Ready                 Workload has pods that are not ready.
Pod Out-of-memory Kills            Workload has pod with container status OOMKilled.
Container CPU Throttling           Workload has a container with high CPU throttling.
Container CPU Overutilization      Workload has a container with high CPU utilization.
Container Memory Overutilization   Workload has a container with high memory utilization.
Missing etcd leader                etcd cannot elect a leader.

Persistent Volume Failures

Alert name                                Description
Persistent Volumes No Claim               Persistent Volume has no claim.
Persistent Volumes Error                  Persistent Volume has issues with provisioning.
Persistent Volume Claim Overutilization   Workload has low available disk space for a claimed Persistent Volume.

Node Failures

Alert name                        Description
Node Memory Overutilization       Node has high memory utilization.
Node CPU Overutilization          Node has high CPU utilization.
Node Filesystem Overutilization   Node storage is almost full.
Node CPU-request Saturation       Node has overcommitted CPU resource requests.
Node Memory-request Saturation    Node has overcommitted memory resource requests.
Node Disk Pressure                Node has problematic DiskPressure condition.
Node Memory Pressure              Node has problematic MemoryPressure condition.
Node Condition Not Ready          Node condition is not in the Ready state.

Bring Your Own Logs Shipper

The Operator deploys a data export component (wavefront-proxy) that can receive log data and forward it to the Operations for Applications service. Configure your logs shipper to send logs to this component.

Here is a Wavefront Custom Resource example config for this scenario.

To make the best use of your logging solution on Kubernetes, we recommend including the following Kubernetes log attributes:

Log attribute key   Description
cluster             The Kubernetes cluster name
pod_name            The pod name
container_name      The container name
namespace_name      The namespace name
pod_id              The pod ID
container_id        The container ID

In addition to these, here are some general log attributes to configure your logs shipper based on your use case.

Upgrade

Upgrade the Observability for Kubernetes Operator and underlying agents to a new version by running the following command:

kubectl apply -f https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/deploy/wavefront-operator.yaml

Note: This command will not upgrade any existing deprecated Helm or manual installations. See migration.md for migration instructions.

Downgrade

Go to Releases, and find the previous release version number, for example v2.0.3. Use this value to replace PREVIOUS_VERSION in the following command:


kubectl apply -f https://github.com/wavefrontHQ/observability-for-kubernetes/releases/download/PREVIOUS_VERSION/wavefront-operator.yaml
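
To avoid editing the URL by hand, the release manifest URL can be assembled from the version tag. This is a minimal sketch: `operator_manifest_url` is a hypothetical helper name, and v2.0.3 is only the example version mentioned above.

```shell
# Hypothetical helper: print the release manifest URL for a version tag.
operator_manifest_url() {
  version="$1"
  echo "https://github.com/wavefrontHQ/observability-for-kubernetes/releases/download/${version}/wavefront-operator.yaml"
}

# Usage (replace v2.0.3 with the release you want to install):
#   kubectl apply -f "$(operator_manifest_url v2.0.3)"
```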

Removal

To remove the Observability for Kubernetes Operator from your environment, run the following command:

kubectl delete -f https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/deploy/wavefront-operator.yaml

Contribution

See the Contribution page.