Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receiver crash report: concurrency bug #204

Closed
jmacd opened this issue May 28, 2024 · 0 comments · Fixed by #205
Closed

Receiver crash report: concurrency bug #204

jmacd opened this issue May 28, 2024 · 0 comments · Fixed by #205

Comments

@jmacd
Copy link
Contributor

jmacd commented May 28, 2024

The v0.23.0 receiver includes a potential crash due to a concurrency violation since #181.

The Arrow consumer needs to be used sequentially from a single goroutine. I believe this was the intention, and the code needs to be refactored to improve readability, since this bug was not uncovered during review.

jmacd added a commit that referenced this issue May 31, 2024
…ments & restructuring (#205)

Restructure receiver code to improve readability. There are a number of
metrics that are incremented when a batch starts being processed and are
decremented when the batch is finished, but the control flow that
maintained the balance of these updates was convoluted.

The root-cause of #204 is that Arrow batches meant for a consumer to be
processed in order were processed out-of-order. There was a large
function body which served two purposes: consume Arrow data of the
appropriate kind, enter data for the pipeline to consume next. This had
to be split into two parts and should have been done as part of #181.
(I, as reviewer, missed this and find, in hindsight, that the code is
not easy to follow.)

This improves the code structure by moving all stateful aspects of
starting/finishing a request into a new `inFlightData` object which has
a deferrable method to finish the request. Here, we keep:

1. The `inFlightWG` done count
2. The active requests metric
3. The active items metric
4. The active bytes metric
5. The bytes-acquired from the semaphore
6. A per-request span covering Arrow decode
7. Netstat-related instrumentation

Authorization now happens before acquiring from the semaphore. 

A number of `fmt.Errorf()` calls are replaced with `status.Errorf(...)`
and a specific error code. The tests are updated to be more specific.
Several Arrow tests were accidentally canceling the test before an
expected error condition was actually tested, they have been audited and
improved.

One new concurrent-receiver test was added.

Fixes #204.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant