Reduce distributor memory usage when error volume is high #6095

Merged
merged 2 commits into from
Jul 18, 2024

@damnever damnever commented Jul 17, 2024

What this PR does:

Avoid nesting httpgrpcutil.WrapHTTPGrpcError.

image

This cluster handles ~600 QPS, of which ~400 return 2xx.
image

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@damnever damnever changed the title Reduce distributor memory usage during periods of high error volume Reduce distributor memory usage when error volume is high Jul 17, 2024
@yeya24 yeya24 requested a review from danielblando July 17, 2024 17:31
@harry671003
Contributor

If possible, could you add a heap profile after this fix for comparison?

@danielblando
Contributor

danielblando commented Jul 17, 2024

Nice optimization.

One question on your graph.
After the "upgrade with fix", I do see an increase in CPU. Was that a rollback or a scale-down?

@damnever
Contributor Author

@harry671003 Before the fix, our workload was hitting OOM, so a profile was easy to capture. Today our error rate is decreasing and httpgrpcutil.WrapHTTPGrpcError is nearly invisible, so there is nothing interesting to show.

@danielblando Yes, the distributor scaled down after the rollback, and the upgrade with the fix has the same replica count as before.

Contributor

@yeya24 yeya24 left a comment

Thanks!

@danielblando danielblando enabled auto-merge July 18, 2024 22:37
@danielblando danielblando disabled auto-merge July 18, 2024 22:40
@danielblando danielblando merged commit a3fedc8 into cortexproject:master Jul 18, 2024
15 of 16 checks passed
@damnever damnever deleted the perf/higherr branch July 19, 2024 09:07
4 participants