[exporter/prometheusremotewrite] Fix data race in batch series state if called concurrently #36524
Conversation
Signed-off-by: Arthur Silva Sens <[email protected]>
I chose `Int64` at random; I'm not sure which size to pick 😅
Force-pushed from 910ad54 to 54b9015
Force-pushed from 54b9015 to c79b58c
Signed-off-by: Arthur Silva Sens <[email protected]>
Force-pushed from c79b58c to e41c6a2
Are you up for benchmarking that? I think we have benchmarks.
```diff
@@ -229,7 +228,7 @@ func (prwe *prwExporter) handleExport(ctx context.Context, tsMap map[string]*pro
 	}

 	// Calls the helper function to convert and batch the TsMap to the desired format
-	requests, err := batchTimeSeries(tsMap, prwe.maxBatchSizeBytes, m, &prwe.batchTimeSeriesState)
+	requests, err := batchTimeSeries(tsMap, prwe.maxBatchSizeBytes, m, prwe.batchTimeSeriesState)
```
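For context, a minimal sketch of the approach this PR explores: the plain integer size hints become atomics so concurrent `PushMetrics` calls no longer trip the race detector. The type and field names below are assumptions for illustration, not the exporter's exact code.

```go
package batchstate

import "sync/atomic"

// batchTimeSeriesState sketches a shared state whose size hints are atomics,
// so concurrent readers and writers no longer race. Field names are assumed.
type batchTimeSeriesState struct {
	nextTimeSeriesBufferSize     atomic.Int64
	nextMetricMetadataBufferSize atomic.Int64
	nextRequestBufferSize        atomic.Int64
}

// update records the sizes observed by one batching pass so the next pass can
// pre-allocate its buffers. The stores are race-free, but concurrent callers
// still overwrite each other's hints, which is the concern raised below.
func (s *batchTimeSeriesState) update(tsBuf, metadataBuf, requestBuf int64) {
	s.nextTimeSeriesBufferSize.Store(tsBuf)
	s.nextMetricMetadataBufferSize.Store(metadataBuf)
	s.nextRequestBufferSize.Store(requestBuf)
}
```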
Does this actually fix the problem? If the single `*batchTimeSeriesState` value is used by multiple goroutines calling `prwe.PushMetrics()` concurrently, moving to atomic integer access will certainly avoid the data races detected by the runtime, but it wouldn't necessarily make the changes to that state valid. What happens if there are multiple batches processed concurrently that have significantly different sizes?
> What happens if there are multiple batches processed concurrently that have significantly different sizes?
Good point; they would still share the same state, and their results would conflict. I think I need to go back to the drawing board and think a bit more about how to solve this.
I'm rereading the code, and my understanding is that concurrent requests with very distinct batch sizes would constantly fight for the size of the subsequent request.
Can we even do something useful with the `batchStateSize` if we allow multiple workers? It sounds like this optimization only works for a single-worker scenario 🤔
> I'm rereading the code, and my understanding is that concurrent requests with very distinct batch sizes would constantly fight for the size of the subsequent request.
That's how I understand it, as well. I also think it likely that each of the three sizes tracked by this state would be decorrelated, though I'm not sure that's any more problematic.
> Can we even do something useful with the `batchStateSize` if we allow multiple workers? It sounds like this optimization only works for a single-worker scenario 🤔
I'm not sure this optimization is safe with multiple workers. Would it make more sense to use a `sync.Pool` of backing stores that can eventually grow to the needed size and get periodically reaped, to avoid one-off large batches causing leaks? Similar to what is done in #35184?
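A rough sketch of that `sync.Pool` idea, under the assumption that each `PushMetrics` call checks a state out of the pool and returns it when done; `batchState` and `pushOneBatch` are hypothetical stand-ins, not the exporter's real API.

```go
package prwepool

import "sync"

// batchState is a hypothetical stand-in for the exporter's per-batch size hints.
type batchState struct {
	nextTimeSeriesBufferSize int
	nextRequestBufferSize    int
}

// statePool hands each PushMetrics call its own state, so concurrent workers
// never write to the same memory and never fight over each other's hints.
var statePool = sync.Pool{
	New: func() any { return &batchState{} },
}

// pushOneBatch sketches how a PushMetrics-like function would borrow and
// return a state around the batching work.
func pushOneBatch(batch func(*batchState)) {
	state := statePool.Get().(*batchState)
	defer statePool.Put(state) // return the state for reuse by later batches
	batch(state)
}
```

Periodic reaping of oversized states would still need a policy on top of this; the pool itself only drops idle entries across garbage-collection cycles.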
Using a `sync.Pool` sounds worth exploring! We could also remove the state altogether and see how bad the benchmarks look.
I'm trying things out and running benchmarks; I'll open new PRs once I have something to show :)
Signed-off-by: Arthur Silva Sens <[email protected]>
I did some benchmarks comparing Atomic vs Mutex, but they're probably not relevant anymore.
Yep, so I'm comparing two branches in this benchmark. The first column is this branch, and the second column uses mutexes.

```console
arthursens$ benchstat base=atomic.txt new=mutex.txt
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
                            │     base     │               new                │
                            │    sec/op    │    sec/op     vs base            │
Execute/numSample=100-2       47.47µ ±  3%   47.00µ ± 6%   ~ (p=0.093 n=6)
Execute/numSample=1000-2      169.3µ ± 15%   163.7µ ± 8%   ~ (p=0.310 n=6)
Execute/numSample=10000-2     1.326m ± 10%   1.408m ± 5%   ~ (p=0.240 n=6)
geomean                       220.1µ         221.3µ        +0.54%

                            │     base     │               new                │
                            │     B/op     │     B/op      vs base            │
Execute/numSample=100-2      6.081Ki ± 0%   6.081Ki ± 0%   ~ (p=1.000 n=6)
Execute/numSample=1000-2     22.12Ki ± 0%   22.12Ki ± 0%   ~ (p=0.675 n=6)
Execute/numSample=10000-2    38.52Ki ± 0%   38.51Ki ± 0%   ~ (p=0.121 n=6)
geomean                      17.30Ki        17.30Ki        -0.01%

                            │    base     │               new                 │
                            │  allocs/op  │  allocs/op    vs base             │
Execute/numSample=100-2        79.00 ± 0%    79.00 ± 0%   ~ (p=1.000 n=6) ¹
Execute/numSample=1000-2       83.00 ± 0%    83.00 ± 0%   ~ (p=1.000 n=6) ¹
Execute/numSample=10000-2      83.00 ± 0%    83.00 ± 0%   ~ (p=1.000 n=6) ¹
geomean                        81.64         81.64        +0.00%
¹ all samples are equal
```

The main difference between mutex and atomic operations is that the mutex blocks the whole batching operation for a few nanoseconds, so we never try to write the same memory address simultaneously. Atomic uses CAS, ensuring we only overwrite a memory address if another thread has not overwritten the value. The results don't show much difference 🤷, so it's just a matter of preference about how the code looks.

If we compare Atomic with main, there's not a remarkable difference either:

```console
arthursens$ benchstat base=main.txt new=atomic.txt
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
                            │     base     │               new                 │
                            │    sec/op    │    sec/op      vs base            │
Execute/numSample=100-2       47.28µ ±  1%   47.47µ ±  3%   ~ (p=0.485 n=6)
Execute/numSample=1000-2      168.3µ ±  6%   169.3µ ± 15%   ~ (p=0.937 n=6)
Execute/numSample=10000-2     1.447m ± 10%   1.326m ± 10%   ~ (p=0.240 n=6)
geomean                       225.8µ         220.1µ         -2.52%

                            │     base     │               new                │
                            │     B/op     │     B/op      vs base            │
Execute/numSample=100-2      6.081Ki ± 0%   6.081Ki ± 0%   ~ (p=1.000 n=6)
Execute/numSample=1000-2     22.12Ki ± 0%   22.12Ki ± 0%   ~ (p=0.513 n=6)
Execute/numSample=10000-2    38.50Ki ± 0%   38.52Ki ± 0%   ~ (p=0.394 n=6)
geomean                      17.30Ki        17.30Ki        +0.02%

                            │    base     │               new                 │
                            │  allocs/op  │  allocs/op    vs base             │
Execute/numSample=100-2        79.00 ± 0%    79.00 ± 0%   ~ (p=1.000 n=6) ¹
Execute/numSample=1000-2       83.00 ± 0%    83.00 ± 0%   ~ (p=1.000 n=6) ¹
Execute/numSample=10000-2      83.00 ± 0%    83.00 ± 0%   ~ (p=1.000 n=6) ¹
geomean                        81.64         81.64        +0.00%
¹ all samples are equal
```
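To make the comparison concrete, here is a simplified illustration of the two variants being benchmarked (a sketch with made-up names, not the branch code): the mutex version serializes the whole update, while the atomic version uses a compare-and-swap that only writes if no other goroutine wrote in between.

```go
package statecompare

import (
	"sync"
	"sync/atomic"
)

// mutexState serializes updates: the lock is held for the whole
// read-modify-write, so two goroutines never touch nextSize at the same time.
type mutexState struct {
	mu       sync.Mutex
	nextSize int
}

func (s *mutexState) update(observed int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextSize = observed
}

// atomicState uses CAS: the new hint is written only if the value hasn't
// changed since it was read; otherwise the other goroutine's write wins.
type atomicState struct {
	nextSize atomic.Int64
}

func (s *atomicState) update(observed int64) {
	old := s.nextSize.Load()
	s.nextSize.CompareAndSwap(old, observed)
}
```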
Ok, I've created two alternative PRs to this one:
I couldn't see any relevant differences in the benchmarks, so I'd love to understand if I did anything wrong.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Superseded by #36601
… batch state (#36601)

#### Description

This is an alternative for #36524 and #36600

This PR does a couple of things:

* Add a test written by @edma2 that shows a data race on the batch state when running multiple consumers.
* Add a benchmark for PushMetrics, with options to run with a stable number of metrics or varying metrics.
* Fix the data race by introducing a `sync.Pool` of batch states.

#### Benchmark results

Results comparing `main`, #36600 and this PR:

```console
arthursens$ benchstat main.txt withoutState.txt syncpool.txt
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
                              │  main.txt   │        withoutState.txt         │          syncpool.txt          │
                              │   sec/op    │    sec/op      vs base          │   sec/op     vs base           │
PushMetricsVaryingMetrics-2     8.066m ± 5%   13.821m ± 9%  +71.36% (p=0.002 n=6)   8.316m ± 6%  ~ (p=0.065 n=6)

                              │  main.txt   │        withoutState.txt          │          syncpool.txt           │
                              │    B/op     │     B/op       vs base           │    B/op      vs base            │
PushMetricsVaryingMetrics-2     5.216Mi ± 0%  34.436Mi ± 0%  +560.17% (p=0.002 n=6)  5.548Mi ± 0%  +6.36% (p=0.002 n=6)

                              │  main.txt   │      withoutState.txt      │        syncpool.txt        │
                              │  allocs/op  │  allocs/op   vs base       │  allocs/op   vs base       │
PushMetricsVaryingMetrics-2     56.02k ± 0%   56.05k ± 0%  ~ (p=0.721 n=6)  56.04k ± 0%  ~ (p=0.665 n=6)
```

---------

Signed-off-by: Arthur Silva Sens <[email protected]>
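As a hedged sketch of what a "varying metrics" benchmark like the one described above could look like: the payload generation and `push` helper below are hypothetical stand-ins, not the exporter's actual benchmark code.

```go
package prwebench

import (
	"strings"
	"testing"
)

// push is a hypothetical stand-in for PushMetrics: it does some per-payload
// work so the benchmark measures something proportional to the batch size.
func push(payload []string) int {
	return len(strings.Join(payload, ","))
}

// BenchmarkPushVaryingMetrics cycles through different payload sizes, so any
// size hint carried over from the previous batch is always stale.
func BenchmarkPushVaryingMetrics(b *testing.B) {
	sizes := []int{10, 100, 1000}
	for i := 0; i < b.N; i++ {
		payload := make([]string, sizes[i%len(sizes)])
		for j := range payload {
			payload[j] = "metric"
		}
		push(payload)
	}
}
```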
Description
Fixes a data race when batching time series during export. The problem was that multiple goroutines could call `PushMetrics` simultaneously. Each time it was called, we would reuse the same state address, and that address could be written without any locks or atomic operations [1][2][3].

Link to tracking issue
The problem was discovered during the review period at #35184
Testing
The first commit adds a test that calls `PushMetrics` from several goroutines. The second commit fixes the test.
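A minimal sketch of that kind of test, meant to be run with `go test -race`; the shared counter and `push` closure are stand-ins for the exporter and `PushMetrics`, not the actual test code.

```go
package prwetest

import (
	"sync"
	"testing"
)

// TestConcurrentPush calls a PushMetrics-like function from several goroutines.
// The unsynchronized write to shared state is intentional: under `go test -race`
// the race detector reports it, which is how the bug in this PR surfaces.
func TestConcurrentPush(t *testing.T) {
	shared := struct{ nextBufferSize int }{} // simplified shared batch state
	push := func(size int) { shared.nextBufferSize = size }

	var wg sync.WaitGroup
	for worker := 0; worker < 5; worker++ {
		wg.Add(1)
		go func(size int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				push(size)
			}
		}(worker * 10)
	}
	wg.Wait()
}
```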