---
keywords: [Kubernetes deployment, cluster, monitoring]
description: Guide to deploying monitoring for GreptimeDB clusters on Kubernetes, including steps for self-monitoring and Prometheus monitoring.
---

# Cluster Monitoring Deployment

After you deploy a GreptimeDB cluster with GreptimeDB Operator, each of its components (Meta, Datanode, and Frontend) exposes a `/metrics` endpoint on its HTTP port (`4000` by default) that serves Prometheus metrics.
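
For a quick sanity check, you can port-forward one of the cluster Services and fetch the endpoint directly (the Service name below assumes the default naming `${cluster}-frontend`):

```
kubectl port-forward svc/${cluster}-frontend 4000:4000 -n ${namespace}
curl http://localhost:4000/metrics
```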

We provide two approaches to monitor the GreptimeDB cluster:

1. **Enable GreptimeDB Self Monitoring**: GreptimeDB Operator will launch an additional GreptimeDB Standalone instance to collect metrics and logs from the GreptimeDB cluster.
2. **Use Prometheus Operator to Configure Prometheus Metrics Monitoring**: You first need to deploy the Prometheus Operator and create a Prometheus instance, then use the Prometheus Operator's `PodMonitor` to have Prometheus scrape the GreptimeDB cluster metrics.

Users can choose the appropriate monitoring approach based on their needs.

## Enable GreptimeDB Self Monitoring

In self-monitoring mode, GreptimeDB Operator will launch an additional GreptimeDB Standalone instance to collect metrics and logs from the GreptimeDB cluster, including cluster logs and slow query logs. To collect log data, GreptimeDB Operator will start a [Vector](https://vector.dev/) sidecar container in each Pod. When this mode is enabled, JSON format logging will be automatically enabled for the cluster.
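
Once this mode is enabled, you can confirm that the sidecar was injected by listing the containers of any cluster Pod (the Pod name below is a placeholder); a Vector container should appear alongside the main GreptimeDB container:

```
kubectl get pod ${pod} -n ${namespace} -o jsonpath='{.spec.containers[*].name}'
```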

If you deploy the GreptimeDB cluster with the Helm Chart (refer to [Getting Started](../getting-started.md)), you can configure the `values.yaml` file as follows:

```yaml
monitoring:
  enabled: true
```

This will deploy a GreptimeDB Standalone instance named `${cluster}-monitoring` to collect metrics and logs. You can check it with:

```
kubectl get greptimedbstandalones.greptime.io ${cluster}-monitoring -n ${namespace}
```

By default, this GreptimeDB Standalone instance stores the monitoring data on local storage using the Kubernetes default StorageClass. You can adjust this based on your needs.

The GreptimeDB Standalone instance can be configured via the `monitoring.standalone` field in `values.yaml`, for example:
```yaml
monitoring:
  enabled: true
  standalone:
    base:
      main:
        # Configure GreptimeDB Standalone instance image
        image: "greptime/greptimedb:latest"
        # Configure GreptimeDB Standalone instance resources
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
    # Configure object storage for GreptimeDB Standalone instance
    objectStorage:
      s3:
        # Configure bucket
        bucket: "monitoring"
        # Configure region
        region: "ap-southeast-1"
        # Configure secret name
        secretName: "s3-credentials"
        # Configure root path
        root: "standalone-with-s3-data"
```
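
Note that when object storage is enabled as above, the referenced credentials Secret must already exist in the cluster's namespace. A minimal sketch of creating it is shown below; the key names are an assumption, so check the GreptimeDB Operator API references for the exact keys it expects:

```
kubectl create secret generic s3-credentials \
  --from-literal=access-key-id=<your-access-key-id> \
  --from-literal=secret-access-key=<your-secret-access-key> \
  -n ${namespace}
```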

The GreptimeDB Standalone instance exposes its services under the Kubernetes Service name `${cluster}-monitoring-standalone`. You can use the following addresses to read the monitoring data:

- **Prometheus metrics**: `http://${cluster}-monitoring-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus`
- **SQL logs**: `${cluster}-monitoring-standalone.${namespace}.svc.cluster.local:4002`. By default, cluster logs are stored in the `public._gt_logs` table and slow query logs are stored in the `public._gt_slow_queries` table.
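
As a quick check, assuming you can reach the Service (for example via `kubectl port-forward`), the log tables can be queried over the MySQL protocol on port `4002` with a standard MySQL client:

```
mysql -h ${cluster}-monitoring-standalone.${namespace}.svc.cluster.local -P 4002 \
  -e "SELECT * FROM public._gt_logs LIMIT 10"
```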

The Vector sidecar configuration for log collection can be customized via the `monitoring.vector` field:

```yaml
monitoring:
  enabled: true
  vector:
    # Configure Vector image registry
    registry: docker.io
    # Configure Vector image repository
    repository: timberio/vector
    # Configure Vector image tag
    tag: nightly-alpine

    # Configure Vector resources
    resources:
      requests:
        cpu: "50m"
        memory: "64Mi"
      limits:
        cpu: "50m"
        memory: "64Mi"
```

:::note
If you're not using the Helm Chart, you can manually configure self-monitoring mode in the `GreptimeDBCluster` YAML:

```yaml
apiVersion: greptime.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: basic
spec:
  base:
    main:
      image: greptime/greptimedb:latest
  frontend:
    replicas: 1
  meta:
    replicas: 1
    etcdEndpoints:
      - "etcd.etcd-cluster.svc.cluster.local:2379"
  datanode:
    replicas: 1
  monitoring:
    enabled: true
```

The `monitoring` field configures self-monitoring mode. See [`GreptimeDBCluster` API docs](https://github.com/GreptimeTeam/greptimedb-operator/blob/main/docs/api-references/docs.md#monitoringspec) for details.
:::

## Use Prometheus Operator to Configure Prometheus Metrics Monitoring

You first need to deploy the Prometheus Operator and create a Prometheus instance. For example, you can use [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) to deploy the whole Prometheus stack; refer to its documentation for more details.

After deploying Prometheus Operator and instances, you can configure Prometheus monitoring via the `prometheusMonitor` field in `values.yaml`:

```yaml
prometheusMonitor:
  # Enable Prometheus monitoring - this will create PodMonitor resources
  enabled: true
  # Configure scrape interval
  interval: "30s"
  # Configure labels
  labels:
    release: prometheus
```

:::note
The `labels` field must match the `matchLabels` field used to create the Prometheus instance, otherwise metrics collection won't work properly.
:::
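
For example, if your Prometheus instance selects PodMonitors with a label selector like the following (a sketch of a typical kube-prometheus-stack setup), the `labels` value above must include `release: prometheus`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  podMonitorSelector:
    matchLabels:
      release: prometheus
```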

After configuring `prometheusMonitor`, GreptimeDB Operator will automatically create `PodMonitor` resources so that Prometheus scrapes the cluster metrics at the configured `interval`. You can check the `PodMonitor` resources with:

```
kubectl get podmonitors.monitoring.coreos.com -n ${namespace}
```
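
To confirm that the metrics are actually being scraped, you can port-forward your Prometheus instance and open its targets page at `http://localhost:9090/targets` (the Service name below assumes a Prometheus Operator-managed instance; adjust it and the namespace to your setup):

```
kubectl port-forward svc/prometheus-operated 9090:9090 -n ${namespace}
```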

:::note
If you're not using the Helm Chart, you can manually configure Prometheus monitoring in the `GreptimeDBCluster` YAML:

```yaml
apiVersion: greptime.io/v1alpha1
kind: GreptimeDBCluster
metadata:
  name: basic
spec:
  base:
    main:
      image: greptime/greptimedb:latest
  frontend:
    replicas: 1
  meta:
    replicas: 1
    etcdEndpoints:
      - "etcd.etcd-cluster.svc.cluster.local:2379"
  datanode:
    replicas: 1
  prometheusMonitor:
    enabled: true
    interval: "30s"
    labels:
      release: prometheus
```

The `prometheusMonitor` field configures Prometheus monitoring.
:::

## Import Grafana Dashboards

GreptimeDB currently provides three Grafana dashboards for clusters:

- [Cluster Metrics Dashboard](https://github.com/GreptimeTeam/greptimedb/blob/main/grafana/greptimedb-cluster.json)
- [Cluster Logs Dashboard](https://github.com/GreptimeTeam/helm-charts/blob/main/charts/greptimedb-cluster/dashboards/greptimedb-cluster-logs.json)
- [Slow Query Logs Dashboard](https://github.com/GreptimeTeam/helm-charts/blob/main/charts/greptimedb-cluster/dashboards/greptimedb-cluster-slow-queries.json)

**Note**: The Cluster Logs Dashboard and Slow Query Logs Dashboard are only for self-monitoring mode, while the Cluster Metrics Dashboard works for both self-monitoring and Prometheus monitoring modes.

If you're using the Helm Chart, you can set `grafana.enabled` to `true` to deploy Grafana and import the dashboards automatically (see [Getting Started](../getting-started.md)):

```yaml
grafana:
  enabled: true
```

If you already have Grafana deployed, follow these steps to import the dashboards:

1. **Add Data Sources**

You can refer to Grafana's [datasources](https://grafana.com/docs/grafana/latest/datasources/) docs to add the following three data sources (a declarative provisioning sketch covering all three is shown at the end of this section):

- **`metrics` data source**

Used to import Prometheus metrics; it works with both monitoring modes. For self-monitoring mode, use `http://${cluster}-monitoring-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus` as the URL. If you run your own Prometheus instance, use that instance's URL instead.

- **`information-schema` data source**

Used to import cluster metadata via SQL; it works with both monitoring modes. Use `${cluster}-frontend.${namespace}.svc.cluster.local:4002` as the SQL address, with `information_schema` as the database.

- **`logs` data source**

Used to import cluster and slow query logs via SQL; it **only works with self-monitoring mode**. Use `${cluster}-monitoring-standalone.${namespace}.svc.cluster.local:4002` as the SQL address, with `public` as the database.

2. **Import Dashboards**

You can refer to Grafana's [Import dashboards](https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/import-dashboards/) docs.
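
If you prefer to manage Grafana data sources declaratively, the provisioning file below is a minimal sketch of the three data sources above. It assumes the SQL endpoints are added via Grafana's MySQL data source type (GreptimeDB serves the MySQL protocol on port `4002`); replace the `${cluster}` and `${namespace}` placeholders with your actual values:

```yaml
apiVersion: 1
datasources:
  # Prometheus metrics exposed by the self-monitoring Standalone instance
  - name: metrics
    type: prometheus
    access: proxy
    url: http://${cluster}-monitoring-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus
  # Cluster metadata queried via SQL from the frontend
  - name: information-schema
    type: mysql
    url: ${cluster}-frontend.${namespace}.svc.cluster.local:4002
    database: information_schema
  # Cluster and slow query logs (self-monitoring mode only)
  - name: logs
    type: mysql
    url: ${cluster}-monitoring-standalone.${namespace}.svc.cluster.local:4002
    database: public
```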