-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[processor/transform] Add support for expressing a statement's context via path names #36888
base: main
Are you sure you want to change the base?
[processor/transform] Add support for expressing a statement's context via path names #36888
Conversation
…ext via path names
@TylerHelmuth @evan-bradley, this's still a draft and won't compile while we don't get #36869, #36820 merged.
Edit: marked as "ready for review", but we still need to discuss the documentation plan. |
One goal of this change is to avoid requiring users to have to even think about contexts: ideally users should only need to think about the data they want to modify (e.g. "the attributes on my spans") and we reserve talking about contexts only to power users or component authors. We should update our docs accordingly to remove the burden on users to understand OTTL-specific concepts unless understanding those concepts is a requirement.
I'm okay doing that in a follow-up PR given how massive this change is and since it's backwards compatible, but we should update the docs to refer to the new style as well.
Yeah, I think we should basically just switch our docs over to the new style. We should probably call out that we've made this change recently to make it clear things have changed, and link to a section that preserves the old doc style for users who want to wait to transition. |
- set(span.name, "bear") where span.attributes["http.path"] == "/animal" | ||
- context: span | ||
statements: | ||
- set(attributes["name"], "bear") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- set(attributes["name"], "bear") | |
- set(span.attributes["name"], "bear") |
What if I use contexts in the paths here? We should probably have a test that confirms the behavior in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing, @evan-bradley!
For this specific case, it would still work, as the statements rewrite utility wouldn't prepend the context name to it. span
is a valid path context name for the ottlspan
parser and it would work as expected. The same applies to other higher context accesses, which are already present on the statements, for example: metric.name.
This behaviour should be guaranteed by the prependContextToStatementPaths
tests and (indirectly) by all the transformprocessor/internal/**/processor_test.go
files. Every Test_Process*_Inferred*Context
test, under the hood, is validating this same condition.
I'd be fine adding extra tests if you think it's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed answer! What I'm mostly concerned about here is a test that confirms that this works from the user's perspective, even if we've tested this extensively internally. It's minor, but I wouldn't mind one or two of these statements including the context in the path just so we know it validates correctly.
@evan-bradley @TylerHelmuth, I've just found out an issue in my implementation and it will require some extra work. This approach works fine for ~99% of the cases, but there's an edge-case that I didn't take into consideration:
I'm testing an alternative solution that
|
I've been trying to find an "easy" solution for the The idea is having a cache per context type & execution, so statements belonging to the same parser could share/access the same cached data (per context). To make that possible, we would need to change the func NewTransformContextWithCache(logRecord plog.LogRecord, instrumentationScope pcommon.InstrumentationScope, resource pcommon.Resource, scopeLogs plog.ScopeLogs, resourceLogs plog.ResourceLogs, cache pcommon.Map) TransformContext {
return TransformContext{
logRecord: logRecord,
instrumentationScope: instrumentationScope,
resource: resource,
cache: cache,
scopeLogs: scopeLogs,
resourceLogs: resourceLogs,
}
} Finally, the transform processor would need to be changed to group, and to propagate the context's cache value. It could be done using a func (p *Processor) ProcessLogs(ctx context.Context, ld plog.Logs) (plog.Logs, error) {
if p.flatMode {
pdatautil.FlattenLogs(ld.ResourceLogs())
defer pdatautil.GroupByResourceLogs(ld.ResourceLogs())
}
contextCache := make(map[string]*pcommon.Map, len(p.contexts))
for _, c := range p.contexts {
cacheKey := reflect.TypeOf(c).String()
cache, ok := contextCache[cacheKey]
if !ok {
m := pcommon.NewMap()
cache = &m
contextCache[cacheKey] = cache
}
err := c.ConsumeLogs(common.WithCache(ctx, cache), ld)
if err != nil {
p.logger.Error("failed processing logs", zap.Error(err))
return ld, err
}
}
return ld, nil
} func (l logStatements) ConsumeLogs(ctx context.Context, ld plog.Logs) error {
for i := 0; i < ld.ResourceLogs().Len(); i++ {
rlogs := ld.ResourceLogs().At(i)
for j := 0; j < rlogs.ScopeLogs().Len(); j++ {
slogs := rlogs.ScopeLogs().At(j)
logs := slogs.LogRecords()
for k := 0; k < logs.Len(); k++ {
tCtx := ottllog.NewTransformContextWithCache(logs.At(k), slogs.Scope(), rlogs.Resource(), slogs, rlogs, newCacheFrom(ctx))
...
}
}
}
return nil
} I've created a new branch with this approach code, you can see the diff here: transformprocessor-add-statements-with-context-support...transformprocessor-add-statements-with-context-support-cache-per-exec. Although the described solution seems to be working, I'm still unsure if that's the proper/best way of solving it. |
Really appreciate all the information. I think your solution that uses If possible, it would be nice to make this solution generalizable to storage solutions that aren't the OTTL cache since future contexts may want to work with other types of storage, e.g. things like spans for tail sampling traces or metadata from the k8s API for enrichment. I think using |
For this initial implementation I prefer not changing OTTL functionality, which means we should not allow sharing cache between statements that use the same OTTL context but are not executed in the same For example, the existing behavior for this config: transform:
trace_statements:
- context: span
statements:
- set(name, "bear")
- set(name, "tiger")
- context: resource
statements:
- set(attributes["name"], "bear")
- context: span
statements:
- set(name, "lion") Is that each In the new syntax the statements would look like transform:
trace_statements:
- set(span.name, "bear")
- set(span.name, "tiger")
- set(resource.attributes["name"], "bear")
- set(span.name, "lion") I believe the transformprocessor needs to group the first 2 statements (because they can be executed in the same ottl context), then recognize that the 3rd statement is a new ottl context and make a new grouping, and then recognize that the 4th statement is a new context and make a new grouping. These groupings would then behave like the old configuration's Maybe a different solution is to make the cache shared between all statements listed? In this idea the cache is shared between all items in the I agree with @evan-bradley that we don't need to stick with using |
Thanks @evan-bradley , that sounds good! Although the Regarding sharing or not the cache between different context ( transform:
trace_statements:
- context: span
statements:
- set(cache["animal"], "bear")
- set(name, cache["animal"])
- context: span
statements:
- set(name, cache["animal"]) # I don't expect it to have a value, as there's no previous statement setting it But on the other hand, the flat structure give me the impression that everything is part of the same group, and as a user, I'd expect the caching access to work, independently of the order I configure my statements. The context prefix prepended to the transform:
trace_statements:
- set(span.cache["animal"], "bear")
- set(resource.attributes["name"], "bear")
- set(span.name, span.cache["animal"]) # it should work
That'd be possible to implement, and relatively simpler. But relying on the statements order to use the cache, IMO, would require users to think about contexts, and the order logic is kind of hidden.
The draft implementation is sharing the cache for structured and flat statements, which indeed is a breaking change. As an alternative, we could share the cache only for the new configuration style, per context & execution. log_statements:
- set(log.cache["key"], "foo") # log context
- set(resource.cache["key"], "bar") # resource context
- set(resource.attributes["resource_key"], resource.cache["key"]) # "bar"
- set(log.attributes["log_key"], log.cache["key"]) # "foo" Mixing the caches among statements would impact and be limited by the context inference, for example: - set(resource.attributes["log_key"], log.cache["key"]) # Ok as the inferred context would be "log"
- set(log.attributes["log_key"], resource.cache["key"]) # Error as the inferred context "log" does not have a "resource.cache" path Thoughts? |
@edmocosta I like your idea to keep our original configuration behavior untouched and only apply this updated cache logic for the new configuration style. That will help us transition users. I agree that since the cache is now prefixed with the OTTL context that users will assume that this will work: transform:
trace_statements:
- set(span.cache["animal"], "bear")
- set(resource.attributes["name"], "bear")
- set(span.name, span.cache["animal"]) # it should work Lets move forward with leaving the original configuration behavior untouched and the new configuration behavior share a cache between all statements in the same OTTL context. |
I think we should consider allowing users to group statements for things like resetting the cache or setting the error mode. I could see us ending up with something like this:
Realistically for both of these features you could just use separate transform processor instances (or clear the cache manually with As a side note, I think this example may be unintuitive to some users:
As a user who is unaware of contexts and is just trying to address different parts of a payload, it wouldn't be clear to me why the |
@evan-bradley that second example is a really good point, that will be confusing, but on the bright side it'll be a startup parsing error. Maybe we can make the parser smart enough to recognize when something is trying to reference a cache that doesn't exist and return a really helpful error message? Ultimately we are running up against a failure of OTTL to try to not be a language that defines variables/scope while at the same time allowing users to persist data between statements. I could go either way about setting error mode per statement. I think a different transformprocessor is a sufficient solution, but allowing grouping statements by error mode will probably mean less config. I know I don't really like looking for multiple transform processors in my own config. |
The upside here is that I think the scope is pretty explicit: a user's "variables" are namespaced to a particular section of OTLP. Would we be violating anything if we made it possible to reach up into another context's cache? It's possible we could just implement this later since it wouldn't be breaking for these statements to not work now and work later, but I think I'd like to see us allow this usage if there are no serious downsides. |
Hi @evan-bradley! Yes, I think having the capability of setting the error mode and other future options would be nice! For the error mode, we could keep it as it's, on the top of the transform configuration (backward compatible), and add overrides per transform:
error_mode: ignore
trace_statements:
- set(span.cache["animal"], "bear")
- set(span.name, span.cache["animal"])
- error_mode: propagate
statements:
- merge_maps(span.cache, ParseJSON(span.attributes["tx_data"]))
- set(span.attributes["no_animal"], true) where span.cache["animal"] == nil The same applies to the global Examples: Structured (current): error_mode: ignore
log_statements:
- context: log
statements:
- set(attributes["key"], "foo"))
- context: resource
statements:
- set(attributes["key"], "foo")) Flat: error_mode: ignore
log_statements:
- set(log.attributes["key"], "foo"))
- set(resource.attributes["key"], "foo")) Structured (contextless): error_mode: ignore
log_statements:
- error_mode: propagate
conditions: "...." # other configurations
statements:
- set(log.attributes["key"], "foo"))
- set(resource.attributes["key"], "foo")) Finally, it would also support mixed configuration styles, like the one from the first example.
Yes, we probably could make that work, but that would require some refactoring on the
Thoughts? |
Great point that conditions can be used in a group of statements as well, I think that further validates that it would be useful. I'm okay mixing the flat and nested styles like this so long as everything is executed in order. As for the cache, I agree that it's at least a parsing error, but I think the error is highly misleading:
Reviewing the ottlresource context link shows that cache is in fact a path, and as a user I'd be left pretty bewildered as to why it's not accepted. If we plan to make these paths work, then maybe we just have to deal with this for now. |
We definitely don't want to lose the conditional block feature, we need to keep that functionality around. |
…tatements, conditions, etc)
Hi @evan-bradley & @TylerHelmuth! I've addressed all PR suggestions and everything else we've discussed on this PR, which includes:
Please see the processor/transformprocessor/testdata/config.yaml and unit tests for more configuration/usage examples. Thanks to all nice changes/suggestions, this PR became huge!!! So I'm wondering if you folks would prefer me to split it into multiple PRs (although we would still have a big one for the Thanks! |
@edmocosta lets break up this PR now that we've got a direction. Lets start with a PR for the OTTL changes and a PR for the transformprocessor changes and if we need to break it down further from there we can. |
…lector-contrib into transformprocessor-add-statements-with-context-support # Conflicts: # processor/transformprocessor/internal/common/metrics.go
OTTL changes: #37166 |
…set context's cache value (#37166) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description - Improved error message for higher context `cache` access. Now when users try to configure statements like `set(log.attributes["log_key"], resource.cache["log_key"])`, they will see an error message similar to: ```diff + access to cache must be be performed using the same statement's context, please replace "resource.cache[log_key]" with "log.cache[log_key]"` - invalid argument at position 1: segment "cache" from path "resource.cache[log_key]" is not a valid path nor a valid OTTL keyword for the Resource context - review https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl/contexts/ottlresource to see all valid paths ``` - Added the `WithCache` option to all OTTL transformation contexts. This change allows API consumer to set the initial cache value or share it among different contexts/executions. <!-- Issue number (e.g. #1234) or full URL to issue, if applicable. --> #### Link to tracking issue Split of #36888 <!--Describe what testing was performed and which tests were added.--> #### Testing Unit tests <!--Please delete paragraphs that you did not use before submitting.-->
…lector-contrib into transformprocessor-add-statements-with-context-support
@edmocosta somehow this is still massive. Can we split it up further? Maybe by making some changes is |
|
Hey folks, I've opened a new PR (#37272) removing the following changes:
I'm not sure If I can split it up further, it's basically replacing the parser collection, and introducing the minimum changes required for implementing the removed items. The PR is not so small, but I think it got a bit better compared to this one. |
Description
This PR is part of #29017, and changes the
transformprocessor
to support configuring contexts via statement path names. 🎉For that, it changes the processor to use the new OTTL/Parser utilities and to support flat configuration styles, where the
context
value is configured via path names, for example, the following configuration:can now also be written as:
Mixed mode is also supported:
Implementation details
context
value on the statements paths, making it backward compatible and not requiring users to change their existing configurations.Needs discussion
What's the plan for the processor documentation?
Should we change the existing doc's examples?
If the plan is to deprecate the structured configuration style, should we make it focused on the new style?
Link to tracking issue
Is blocked by #36869, #36820, #37271
Related to #29017,
Testing
Unit tests
Documentation
TBD