-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporter/loadbalancing] feat(lb): Introduce the ability to load balance on composite keys in lb #36567
base: main
Are you sure you want to change the base?
Conversation
Right now, there's a problem at high throughput using the load balancer and the `service.name` resource attribute: The load balancers themself get slow. While it's possible to vertically scale them to a point (e.g. about 100k req/sec), as they get slow they star tot back up traffic and block on requests. Applications then can't write as many spans out, and start dropping spans. This commit seeks to address that by extending the load balancing collector to allow create a composite from attributes that can still keep the load balancing decision "consistent enough" to reduce cardinallity, but still spread the load across ${N} collectors. It doesn't make too many assumptions about how the operators will use this, except that the underlying data (the spans) are unlikely to be complete in all cases, and the key generation is "best effort". This is a deviation from the existing design, in which hard-requires "span.name". == Design Notes === Contributor Skill As a contributor, I'm very much new to the opentelemetry collector, and do not anticipate I will be contributing much except for as needs require to tune the collectors that I am responsible for. Given this, the code may violate certain assumptions that are otherwise "well known". === Required Knowledge The biggest surprise in this code was that despite accepting a slice, the routingIdentifierFromTraces function assumes spans have been processed with the batchpersignal.SplitTraces() function, which appears to ensure taht each "trace" contains only a single span (thus allowing them to be multiplexed effectively) This allows the function to be simplified quite substantially. === Use case The primary use case I am thinking about when writing this work is calculating metrics in the spanmetricsconnector component. Essentially, services drive far too much traffic for a single collector instance to handle, so we need to multiplex it in a way that still allows them to be calculated in a single place (limiting cardinality) but also, spreads the load across ${N} collectors. === Traces only implementation This commit addreses this only for traces, as I only care about traces. The logic can likely be extended easily, however.
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
spans := ils.At(0).Spans() | ||
|
||
if a == pseudoAttrSpanKind { | ||
rKey.WriteString(spans.At(0).Kind().String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only first span is considered - is that on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs on the function say it's on purpose.
// 2. https://stackoverflow.com/a/70388629 | ||
var rKey strings.Builder | ||
|
||
for _, a := range attrs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the order of attributes important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, they need to be stable, but as they're supplied by the configuration (and there's no transformation in there) they are stable. i could explicitly sort them, but it feels a bit unnecessary here.
// only svcRouting and attrRouting are supported. For attrRouting, any attribute, as well the "pseudo" attributes span.name | ||
// and span.kind are supported. | ||
// | ||
// In practice, makes the assumption that ptrace.Traces includes only one trace of each kind, in the "trace tree". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be in the README
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which part do you mean? i edited the readme to describe the routing attributes:
In addition to resource / span attributes,
span.kind
,span.name
(the top level properties of a span) are also supported.
(How do we move this forward?) |
I think @atoulme was AFK for a few days. If he's OK, I'd prefer to merge this as is, and work on the follow-up items on new PRs. |
Right now, there's a problem at high throughput using the load balancer
and the
service.name
resource attribute: The load balancers themselfget slow. While it's possible to vertically scale them to a point (e.g.
about 100k req/sec), as they get slow they star tot back up traffic and
block on requests. Applications then can't write as many spans out, and
start dropping spans.
This commit seeks to address that by extending the load balancing
collector to allow create a composite from attributes that can still
keep the load balancing decision "consistent enough" to reduce
cardinallity, but still spread the load across ${N} collectors.
It doesn't make too many assumptions about how the operators will use
this, except that the underlying data (the spans) are unlikely to be
complete in all cases, and the key generation is "best effort". This is
a deviation from the existing design, in which hard-requires
"span.name".
== Design Notes
=== Contributor Skill
As a contributor, I'm very much new to the opentelemetry collector, and
do not anticipate I will be contributing much except for as needs
require to tune the collectors that I am responsible for. Given this,
the code may violate certain assumptions that are otherwise "well
known".
=== Required Knowledge
The biggest surprise in this code was that despite accepting a slice,
the routingIdentifierFromTraces function assumes spans have been
processed with the batchpersignal.SplitTraces() function, which appears
to ensure taht each "trace" contains only a single span (thus allowing
them to be multiplexed effectively)
This allows the function to be simplified quite substantially.
=== Use case
The primary use case I am thinking about when writing this work is
calculating metrics in the spanmetricsconnector component. Essentially,
services drive far too much traffic for a single collector instance to
handle, so we need to multiplex it in a way that still allows them to be
calculated in a single place (limiting cardinality) but also, spreads
the load across ${N} collectors.
=== Traces only implementation
This commit addreses this only for traces, as I only care about traces.
The logic can likely be extended easily, however.
Fixes #35320
Fixes #33660