Huge amount of upload errors on Grafana Cloud: resource_exhausted push rate limit #63

f0o · 2023-10-29T08:03:45Z

Every few seconds, the Pyroscope client (using the reference config from the README) fails with:

upload profile: failed to upload. server responded with statusCode: '422' and body: '{"code":"unknown","message":"pushing IngestInput-pprof failed resource_exhausted: push rate limit (0 B) exceeded while adding 70 KiB"}'

We're not talking about hundreds of apps here; it's only 19-20.

What limit am I hitting and what config should I use to prevent/mitigate it?

//Edit:

It turns out that 20 apps running for 4 days produced 50G of data. How can I limit the sampling/reporting rate? This is insane.


kolesnikovae · 2023-10-30T04:16:16Z

Thank you for reporting the issue, @f0o. Indeed, Go profiles can be quite large depending on the workload.

The upload rate can be changed via the UploadRate configuration option. By default, profiles are collected and uploaded every 15 seconds. If the application's behaviour and load are stable (profiles do not change significantly), you could try increasing it to, e.g., 30 seconds.

I'm wondering which profile types are enabled. Napkin math shows that each of the apps generates ~100KB of profiling data (uncompressed) every 15 seconds – this is an unexpectedly high data rate. Could you please tell us more about the workload? I'd also like to clarify how many individual processes you're profiling, and what you mean by apps – do you mean 20 instances (processes/hosts/pods) of the same service, or 20 logical services, represented by some fleet?

f0o · 2023-10-30T07:06:17Z

Hi @kolesnikovae

I'll look into the UploadRate parameter and tweak it once retention has expired those old profiles.

I'm using:

			ProfileTypes: []pyroscope.ProfileType{
				// these profile types are enabled by default:
				pyroscope.ProfileCPU,
				pyroscope.ProfileAllocObjects,
				pyroscope.ProfileAllocSpace,
				pyroscope.ProfileInuseObjects,
				pyroscope.ProfileInuseSpace,

				// these profile types are optional:
				pyroscope.ProfileGoroutines,
				pyroscope.ProfileMutexCount,
				pyroscope.ProfileMutexDuration,
				pyroscope.ProfileBlockCount,
				pyroscope.ProfileBlockDuration,
			},

With:

		runtime.SetMutexProfileFraction(5)
		runtime.SetBlockProfileRate(5)

And for clarification: it's 3 services amounting to 19-20 pods, each very small in resource consumption (we're talking 0.05 CPU and 32-64 MB of memory). The workload is best described as signal/data forwarding without processing. I was about to write the processing service when I noticed these errors, and started disabling profiling everywhere instead.

kolesnikovae · 2023-10-30T07:27:18Z

Hi @f0o, thank you for the feedback. I'll double-check everything and report back soon. In the meantime, please consider disabling goroutine, mutex, and block profiles.
