Add tuning documentation
This commit is a follow-up to PR #9149, where we discussed the idea of
centralizing the known tunings along with their associated use cases.

Signed-off-by: Wilfried Roset <[email protected]>
wilfriedroset committed Nov 24, 2024
1 parent 96c92c9 commit 5a1294d
Showing 2 changed files with 49 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -121,6 +121,8 @@

### Documentation

* [FEATURE] Add tuning documentation. #9978

### Tools

* [FEATURE] `splitblocks`: add new tool to split blocks larger than a specified duration into multiple blocks. #9517, #9779
47 changes: 47 additions & 0 deletions docs/sources/mimir/configure/tuning.md
@@ -0,0 +1,47 @@
---
description: Tune Grafana Mimir according to your use cases.
menuTitle: Tuning
title: Tune Grafana Mimir according to your use cases
weight: 110
---

# Tune Grafana Mimir according to your use cases

For most use cases, you can use the default settings that come with Mimir.
However, sometimes you need to tune Mimir to reach optimal performance. Use the following guidance when tuning settings in Mimir.

## Heavy multi-tenancy

For each tenant, Mimir opens and maintains a TSDB in memory. If you have a significant number of tenants, the memory overhead might become prohibitive.
To reduce the associated overhead, consider the following; the first two adjustments are sketched after this list:

- Reduce `-blocks-storage.tsdb.head-chunks-write-buffer-size-bytes`, default `4MB`. For example, try `1MB` or `128KB`.
- Reduce `-blocks-storage.tsdb.stripe-size`, default `16384`. For example, try `256`, or even `64`.
- Configure [shuffle sharding](https://grafana.com/docs/mimir/latest/configure/configure-shuffle-sharding/).
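As a minimal sketch, assuming Mimir is configured through command-line flags and the ingester runs as its own target, the first two reductions might look like the following. The values are illustrative starting points, not recommendations:

```bash
# Sketch: shrink per-tenant TSDB memory overhead on the ingester.
# Uses a 1MB head-chunks write buffer (default 4MB) and a stripe size
# of 256 (default 16384); tune both against tenant count and available memory.
mimir -target=ingester \
  -blocks-storage.tsdb.head-chunks-write-buffer-size-bytes=1048576 \
  -blocks-storage.tsdb.stripe-size=256
```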

## Compression

Depending on the CPU model used in the underlying infrastructure, the compression for both WALs and gRPC communication might consume a significant portion of the available CPU resources.
To identify whether this is the case, profile Mimir with a tool such as [Grafana Pyroscope](https://grafana.com/docs/pyroscope/latest/).

To reduce resource consumption, consider the following; a combined sketch follows this list:

- Make sure `wal_compression_enabled` is disabled.
- Make sure `grpc_compression` is either off (the default) or set to `snappy`; `gzip` consumes more CPU than `snappy`. However, disabling `grpc_compression` increases network traffic, which in turn might increase the total cost of ownership (TCO) of running Mimir.
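As a sketch of both settings, assuming a monolithic deployment and assuming `-blocks-storage.tsdb.wal-compression-enabled` and `-ingester.client.grpc-compression` are the command-line forms of the options above (verify them against your Mimir version):

```bash
# Sketch: keep CPU-heavy compression off. WAL compression stays disabled
# (the default), and gRPC compression uses snappy rather than gzip; set it
# to an empty value to disable gRPC compression entirely.
mimir -target=all \
  -blocks-storage.tsdb.wal-compression-enabled=false \
  -ingester.client.grpc-compression=snappy
```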

If you must use compression, for example to fit within the available network bandwidth, consider using nodes with more powerful CPUs. This also increases TCO.

## Cache size

Grafana Mimir relies on Memcached for its caches. By default, Memcached stores data only in memory.
The Memcached [extstore](https://docs.memcached.org/features/flashstorage/) feature lets you extend Memcached's memory space onto flash (or similar) storage, as sketched below.
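As an illustrative sketch, with an arbitrary path and arbitrary sizes, starting Memcached with 4GB of RAM extended by 100GB of flash might look like this:

```bash
# Sketch: extend 4GB of Memcached RAM with a 100GB extstore file on flash.
memcached -m 4096 -o ext_path=/var/lib/memcached/extstore:100G
```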

Refer to [how we scaled Grafana Cloud Logs' Memcached cluster to 50TB and improved reliability](https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/).

## Periodic latency spikes when cutting blocks

Depending on the workload, you might observe latency spikes when Mimir cuts blocks.
To reduce the impact of this behavior, consider the following:

- Upgrade to Mimir `2.15` or later. Refer to <https://github.com/grafana/mimir/commit/03f2f06e1247e997a0246d72f5c2c1fd9bd386df>.
- Reduce `-blocks-storage.tsdb.block-ranges-period`, default `2h`. For example, try `1h`, as shown in the sketch after this list.
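As a sketch, assuming command-line configuration of the ingester:

```bash
# Sketch: cut smaller, more frequent blocks (1h instead of the default 2h)
# to spread out the work that causes the latency spikes.
mimir -target=ingester \
  -blocks-storage.tsdb.block-ranges-period=1h
```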
