Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/transform] Add support for expressing a statement's context via path names #36888

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

edmocosta
Copy link
Contributor

@edmocosta edmocosta commented Dec 18, 2024

Description

This PR is part of #29017, and changes the transformprocessor to support configuring contexts via statement path names. 🎉

For that, it changes the processor to use the new OTTL/Parser utilities and to support flat configuration styles, where the context value is configured via path names, for example, the following configuration:

transform:
  trace_statements:
    - context: span
      statements:
        - set(name, "bear") where attributes["http.path"] == "/animal"
        - keep_keys(attributes, ["http.method", "http.path"])
    - context: resource
      statements:
        - set(attributes["name"], "bear")

can now also be written as:

transform:
  trace_statements:
        - set(span.name, "bear") where span.attributes["http.path"] == "/animal"
        - keep_keys(span.attributes, ["http.method", "http.path"])
        - set(resource.attributes["name"], "bear")

Mixed mode is also supported:

transform:
  trace_statements:
        - set(span.name, "bear") where span.attributes["http.path"] == "/animal"
        - keep_keys(span.attributes, ["http.method", "http.path"])
  log_statements:
    - context: log
      statements:
        - set(body, "bear") where attributes["http.path"] == "/animal"

Implementation details

  • When the flat configuration mode is used, it requires statements's paths to have a valid context prefix, and relies on the context inferrer utility to choose the right parser.
  • For the structured configuration mode (existing style), it relies on the parser collection and statements rewrite utility to prepend the configured context value on the statements paths, making it backward compatible and not requiring users to change their existing configurations.

Needs discussion

What's the plan for the processor documentation?
Should we change the existing doc's examples?
If the plan is to deprecate the structured configuration style, should we make it focused on the new style?

Link to tracking issue

Is blocked by #36869, #36820, #37271
Related to #29017,

Testing

Unit tests

Documentation

TBD

@edmocosta
Copy link
Contributor Author

edmocosta commented Dec 18, 2024

@TylerHelmuth @evan-bradley, this's still a draft and won't compile while we don't get #36869, #36820 merged.
Regarding documentation, I think we need to discuss the plans and how/when we're gonna change it, so I added a few questions on this PR's description. Could you please take a look an share your thoughts? Thanks!

Needs discussion
What's the plan for the processor documentation?
Should we change the existing doc's examples?
If the plan is to deprecate the structured configuration style, should we make it focused on the new style?


Edit: marked as "ready for review", but we still need to discuss the documentation plan.

@edmocosta edmocosta changed the title [processor/transformprocessor] Add support to statements to reflect their context via path names [processor/transform] Add support for expressing a statement's context via path names Dec 23, 2024
@edmocosta edmocosta marked this pull request as ready for review December 23, 2024 16:10
@edmocosta edmocosta requested a review from a team as a code owner December 23, 2024 16:10
@evan-bradley
Copy link
Contributor

What's the plan for the processor documentation?

One goal of this change is to avoid requiring users to have to even think about contexts: ideally users should only need to think about the data they want to modify (e.g. "the attributes on my spans") and we reserve talking about contexts only to power users or component authors. We should update our docs accordingly to remove the burden on users to understand OTTL-specific concepts unless understanding those concepts is a requirement.

Should we change the existing doc's examples?

I'm okay doing that in a follow-up PR given how massive this change is and since it's backwards compatible, but we should update the docs to refer to the new style as well.

If the plan is to deprecate the structured configuration style, should we make it focused on the new style?

Yeah, I think we should basically just switch our docs over to the new style. We should probably call out that we've made this change recently to make it clear things have changed, and link to a section that preserves the old doc style for users who want to wait to transition.

- set(span.name, "bear") where span.attributes["http.path"] == "/animal"
- context: span
statements:
- set(attributes["name"], "bear")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- set(attributes["name"], "bear")
- set(span.attributes["name"], "bear")

What if I use contexts in the paths here? We should probably have a test that confirms the behavior in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing, @evan-bradley!

For this specific case, it would still work, as the statements rewrite utility wouldn't prepend the context name to it. span is a valid path context name for the ottlspan parser and it would work as expected. The same applies to other higher context accesses, which are already present on the statements, for example: metric.name.

This behaviour should be guaranteed by the prependContextToStatementPaths tests and (indirectly) by all the transformprocessor/internal/**/processor_test.go files. Every Test_Process*_Inferred*Context test, under the hood, is validating this same condition.

I'd be fine adding extra tests if you think it's necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed answer! What I'm mostly concerned about here is a test that confirms that this works from the user's perspective, even if we've tested this extensively internally. It's minor, but I wouldn't mind one or two of these statements including the context in the path just so we know it validates correctly.

@edmocosta
Copy link
Contributor Author

edmocosta commented Dec 24, 2024

@evan-bradley @TylerHelmuth, I've just found out an issue in my implementation and it will require some extra work.
The current configuration's unmarshaller parses flat statements into separate common.ContextStatements (one per statement), as it does not know which context those statements belong to, being unable to group them at unmarshalling time.

This approach works fine for ~99% of the cases, but there's an edge-case that I didn't take into consideration: cache.
The cache field is transient, and its value is not propagated to the next statement's transform context. For example, the following statements wouldn't work as expected, as every statement has it's own transform context (and its own cache):

- set(log.cache["key"], "value")
- set(log.attributes["key"], log.cache["key"])

I'm testing an alternative solution that seems to be working:

  • Parse the transformprocessor configuration grouping all contextless statements into a single common.ContextStatements, instead of one per flat statement.
  • On the transformprocessor parser collections (e.g LogParserCollection), split and group all contextless statements by their inferred context, parsing them together and sharing the same TransformContext.

Unless I'm missing something, it should solve the cache issue, and make the parsing more efficient as well.
Edit: It solves the cache issue, but grouping the statements by context make they run out of order, which is a problem for statements like:

- set(scope.attributes["test"], "pass")
- set(log.attributes["test"], "pass") where instrumentation_scope.attributes["test"] == "pass"`
- set(scope.attributes["test"], "fail")

@edmocosta
Copy link
Contributor Author

I've been trying to find an "easy" solution for the cache issue, and there's another approach that came to my mind that might be worth exploring.

The idea is having a cache per context type & execution, so statements belonging to the same parser could share/access the same cached data (per context).

To make that possible, we would need to change the ottl/contexts adding a new function NewTransformContextWithCache that would accept an extra cache argument, giving API consumers the ability of initializing the cache value. E.g.:

func NewTransformContextWithCache(logRecord plog.LogRecord, instrumentationScope pcommon.InstrumentationScope, resource pcommon.Resource, scopeLogs plog.ScopeLogs, resourceLogs plog.ResourceLogs, cache pcommon.Map) TransformContext {
	return TransformContext{
		logRecord:            logRecord,
		instrumentationScope: instrumentationScope,
		resource:             resource,
		cache:                cache,
		scopeLogs:            scopeLogs,
		resourceLogs:         resourceLogs,
	}
}

Finally, the transform processor would need to be changed to group, and to propagate the context's cache value. It could be done using a ctx context.Context value (which IMO, isn't great, but considering we can't break the consumer interfaces, it might be okay):

func (p *Processor) ProcessLogs(ctx context.Context, ld plog.Logs) (plog.Logs, error) {
	if p.flatMode {
		pdatautil.FlattenLogs(ld.ResourceLogs())
		defer pdatautil.GroupByResourceLogs(ld.ResourceLogs())
	}

	contextCache := make(map[string]*pcommon.Map, len(p.contexts))
	for _, c := range p.contexts {
		cacheKey := reflect.TypeOf(c).String()
		cache, ok := contextCache[cacheKey]
		if !ok {
			m := pcommon.NewMap()
			cache = &m
			contextCache[cacheKey] = cache
		}
		err := c.ConsumeLogs(common.WithCache(ctx, cache), ld)
		if err != nil {
			p.logger.Error("failed processing logs", zap.Error(err))
			return ld, err
		}
	}
	return ld, nil
}
func (l logStatements) ConsumeLogs(ctx context.Context, ld plog.Logs) error {
	for i := 0; i < ld.ResourceLogs().Len(); i++ {
		rlogs := ld.ResourceLogs().At(i)
		for j := 0; j < rlogs.ScopeLogs().Len(); j++ {
			slogs := rlogs.ScopeLogs().At(j)
			logs := slogs.LogRecords()
			for k := 0; k < logs.Len(); k++ {
				tCtx := ottllog.NewTransformContextWithCache(logs.At(k), slogs.Scope(), rlogs.Resource(), slogs, rlogs, newCacheFrom(ctx))
				...
			}
		}
	}
	return nil
}

I've created a new branch with this approach code, you can see the diff here: transformprocessor-add-statements-with-context-support...transformprocessor-add-statements-with-context-support-cache-per-exec.

Although the described solution seems to be working, I'm still unsure if that's the proper/best way of solving it.
Do you have any other ideas?

@evan-bradley
Copy link
Contributor

Really appreciate all the information. I think your solution that uses context.Context to pass down the cache map is pretty good given the constraints, and it's not a big deal if we change later anyway since it's all internal. However, I dug into it a bit and I can't see any strict reason why ParseContextStatements needs to return a consumer interface considering we bundle all the *Statements structs into a slice and run them ourselves anyway. Could we instead modify the ParseContextStatements functions to return an interface we define that accepts the cache as a direct argument?

If possible, it would be nice to make this solution generalizable to storage solutions that aren't the OTTL cache since future contexts may want to work with other types of storage, e.g. things like spans for tail sampling traces or metadata from the k8s API for enrichment. I think using context.Context would provide this, but changing to a custom interface may allow us more leeway.

@TylerHelmuth
Copy link
Member

For this initial implementation I prefer not changing OTTL functionality, which means we should not allow sharing cache between statements that use the same OTTL context but are not executed in the same ContextStatements.

For example, the existing behavior for this config:

transform:
  trace_statements:
    - context: span
      statements:
        - set(name, "bear")
        - set(name, "tiger")
    - context: resource
      statements:
        - set(attributes["name"], "bear")
    - context: span
      statements:
        - set(name, "lion")

Is that each ContextStatements listed under trace_statements gets its own cache. If we can, I believe we should try to preserve this concept until we have a need to support ContextStatements sharing a context.

In the new syntax the statements would look like

transform:
  trace_statements:
    - set(span.name, "bear")
    - set(span.name, "tiger")
    - set(resource.attributes["name"], "bear")
    - set(span.name, "lion")

I believe the transformprocessor needs to group the first 2 statements (because they can be executed in the same ottl context), then recognize that the 3rd statement is a new ottl context and make a new grouping, and then recognize that the 4th statement is a new context and make a new grouping. These groupings would then behave like the old configuration's ContextStatements.

Maybe a different solution is to make the cache shared between all statements listed? In this idea the cache is shared between all items in the ptrace.Traces/plogs.Logs/pmetric.Metrics. I think that would be a breaking change so I would prefer to stick with implementing our existing behavior tho.

I agree with @evan-bradley that we don't need to stick with using consumer.Consume* interface/functions unless it benefits us. We use that today because it was an existing pattern that the transformprocessor fell into, but it is all internal stuff so we can change as needed.

@edmocosta
Copy link
Contributor Author

Could we instead modify the ParseContextStatements functions to return an interface we define that accepts the cache as a direct argument?

Thanks @evan-bradley , that sounds good! Although the context.Context key works, having it as a direct argument would make things much clear IMO.


Regarding sharing or not the cache between different context (ContextStatements), I'm still not 100% sure what kind of solution we should provide, for example, when I see a structured configuration like the one @TylerHelmuth shared, it's somehow implicit by the structure that we've different groups.

transform:
  trace_statements:
    - context: span
      statements:
        - set(cache["animal"], "bear")
        - set(name, cache["animal"])
    - context: span
      statements:
        - set(name, cache["animal"]) # I don't expect it to have a value, as there's no previous statement setting it

But on the other hand, the flat structure give me the impression that everything is part of the same group, and as a user, I'd expect the caching access to work, independently of the order I configure my statements. The context prefix prepended to the cache path also contributes to that:

transform:
  trace_statements:
    - set(span.cache["animal"], "bear")
    - set(resource.attributes["name"], "bear")
    - set(span.name, span.cache["animal"]) # it should work

I believe the transformprocessor needs to group the first 2 statements (because they can be executed in the same ottl context), then recognize that the 3rd statement is a new ottl context and make a new grouping, and then recognize that the 4th statement is a new context and make a new grouping. These groupings would then behave like the old configuration's ContextStatements.

That'd be possible to implement, and relatively simpler. But relying on the statements order to use the cache, IMO, would require users to think about contexts, and the order logic is kind of hidden.

Maybe a different solution is to make the cache shared between all statements listed? In this idea the cache is shared between all items in the ptrace.Traces/plogs.Logs/pmetric.Metrics. I think that would be a breaking change so I would prefer to stick with implementing our existing behavior tho.

The draft implementation is sharing the cache for structured and flat statements, which indeed is a breaking change. As an alternative, we could share the cache only for the new configuration style, per context & execution.
That way, the existing functionality would continue to work as it does today, and new configuration styles would share the same cache per context, for example, the following statements would work fine, and each <context>.cache would hold their own values:

log_statements:
    - set(log.cache["key"], "foo") # log context
    - set(resource.cache["key"], "bar") # resource context
    - set(resource.attributes["resource_key"], resource.cache["key"]) # "bar"
    - set(log.attributes["log_key"], log.cache["key"])  # "foo"

Mixing the caches among statements would impact and be limited by the context inference, for example:

- set(resource.attributes["log_key"], log.cache["key"]) # Ok as the inferred context would be "log"
- set(log.attributes["log_key"], resource.cache["key"]) # Error as the inferred context "log" does not have a "resource.cache" path

Thoughts?

@TylerHelmuth
Copy link
Member

@edmocosta I like your idea to keep our original configuration behavior untouched and only apply this updated cache logic for the new configuration style. That will help us transition users.

I agree that since the cache is now prefixed with the OTTL context that users will assume that this will work:

transform:
  trace_statements:
    - set(span.cache["animal"], "bear")
    - set(resource.attributes["name"], "bear")
    - set(span.name, span.cache["animal"]) # it should work

Lets move forward with leaving the original configuration behavior untouched and the new configuration behavior share a cache between all statements in the same OTTL context.

@evan-bradley
Copy link
Contributor

I think we should consider allowing users to group statements for things like resetting the cache or setting the error mode. I could see us ending up with something like this:

transform:
  trace_statements:
    - error_mode: ignore
      statements:
        - set(span.cache["animal"], "bear")
        - set(span.name, span.cache["animal"])
    # Super important statements that should fail if there's an error
    - error_mode: propagate
      statements:
        - merge_maps(span.cache, ParseJSON(span.attributes["tx_data"]))
        - set(span.attributes["no_animal"], true) where span.cache["animal"] == nil # Won't be set to "bear" as above, separate cache 

Realistically for both of these features you could just use separate transform processor instances (or clear the cache manually with delete_matching_keys), but I think it would be a nice capability to set the error mode or potentially other future options for a group of statements. What do you think?

As a side note, I think this example may be unintuitive to some users:

- set(resource.cache["log_key"], resource.attributes["key"]) # Ok as the inferred context would be "resource"
- set(log.attributes["log_key"], resource.cache["log_key"]) # Error as the inferred context "log" does not have a "resource.cache" path

As a user who is unaware of contexts and is just trying to address different parts of a payload, it wouldn't be clear to me why the resource.cache path works in one statement and not the other. Would it cause problems if this worked?

@TylerHelmuth
Copy link
Member

@evan-bradley that second example is a really good point, that will be confusing, but on the bright side it'll be a startup parsing error. Maybe we can make the parser smart enough to recognize when something is trying to reference a cache that doesn't exist and return a really helpful error message? Ultimately we are running up against a failure of OTTL to try to not be a language that defines variables/scope while at the same time allowing users to persist data between statements.

I could go either way about setting error mode per statement. I think a different transformprocessor is a sufficient solution, but allowing grouping statements by error mode will probably mean less config. I know I don't really like looking for multiple transform processors in my own config.

@evan-bradley
Copy link
Contributor

Ultimately we are running up against a failure of OTTL to try to not be a language that defines variables/scope while at the same time allowing users to persist data between statements.

The upside here is that I think the scope is pretty explicit: a user's "variables" are namespaced to a particular section of OTLP. Would we be violating anything if we made it possible to reach up into another context's cache?

It's possible we could just implement this later since it wouldn't be breaking for these statements to not work now and work later, but I think I'd like to see us allow this usage if there are no serious downsides.

@edmocosta
Copy link
Contributor Author

Hi @evan-bradley!

Yes, I think having the capability of setting the error mode and other future options would be nice!
I don't think we would have issues implementing that, since we're still using the same object to parse the transformprocessor configuration (common.ContextStatements).

For the error mode, we could keep it as it's, on the top of the transform configuration (backward compatible), and add overrides per common.ContextStatements. In that case, the top level error mode would act as a default, which could be overridden by the statement group's configuration, for example:

transform:
  error_mode: ignore
  trace_statements:
     - set(span.cache["animal"], "bear")
     - set(span.name, span.cache["animal"])
     - error_mode: propagate 
       statements:
         - merge_maps(span.cache, ParseJSON(span.attributes["tx_data"]))
         - set(span.attributes["no_animal"], true) where span.cache["animal"] == nil 

The same applies to the global conditions, which currently are not supported by the flat configuration style.
That said, I think we can officially support a third configuration style, which basically would be the structured form we currently have, but without the context key value and enforcing paths to have a context prefix.

Examples:

Structured (current):

error_mode: ignore
log_statements:
 - context: log
   statements:
        - set(attributes["key"], "foo"))
 - context: resource
   statements:
        - set(attributes["key"], "foo"))

Flat:

error_mode: ignore
log_statements:
    - set(log.attributes["key"], "foo"))
    - set(resource.attributes["key"], "foo"))

Structured (contextless):

error_mode: ignore
log_statements:
     - error_mode: propagate   
       conditions: "...." #  other configurations
       statements:
        - set(log.attributes["key"], "foo"))
        - set(resource.attributes["key"], "foo"))

Finally, it would also support mixed configuration styles, like the one from the first example.

As a user who is unaware of contexts and is just trying to address different parts of a payload, it wouldn't be clear to me why the resource.cache path works in one statement and not the other. Would it cause problems if this worked?

Yes, we probably could make that work, but that would require some refactoring on the ottl/contexts that I agree we could address later. Although UX is not optimal, as @TylerHelmuth mentioned, at least it will get a parsing error at the startup:

Error: invalid config for "transform" processor unable to parse OTTL statement "set(log.attributes[\"log_key\"], resource.cache[\"log_key\"])": error while parsing arguments for call to "set": invalid argument at position 1: segment "cache" from path "resource.cache[log_key]" is not a valid path nor a valid OTTL keyword for the Resource context - review https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl/contexts/ottlresource to see all valid paths

Thoughts?

@evan-bradley
Copy link
Contributor

evan-bradley commented Jan 8, 2025

Great point that conditions can be used in a group of statements as well, I think that further validates that it would be useful. I'm okay mixing the flat and nested styles like this so long as everything is executed in order.

As for the cache, I agree that it's at least a parsing error, but I think the error is highly misleading:

segment "cache" from path "resource.cache[log_key]" is not a valid path nor a valid OTTL keyword for the Resource context - review [...] to see all valid paths

Reviewing the ottlresource context link shows that cache is in fact a path, and as a user I'd be left pretty bewildered as to why it's not accepted. If we plan to make these paths work, then maybe we just have to deal with this for now.

@TylerHelmuth
Copy link
Member

We definitely don't want to lose the conditional block feature, we need to keep that functionality around.

@edmocosta
Copy link
Contributor Author

Hi @evan-bradley & @TylerHelmuth!

I've addressed all PR suggestions and everything else we've discussed on this PR, which includes:

  • Improve error message for higher context "cache" access. Now when users try to configure statements like set(log.attributes["log_key"], resource.cache["log_key"]), they will see an error message similar to access to cache must be be performed using the same statement's context, please replace "resource.cache[log_key]" by "log.cache[log_key]".
  • It's now possible to configure/override the top level error_mode per statement's groups (common.ContextStatements).
      error_mode: ignore
      trace_statements:
        - error_mode: propagate
          statements:
            - set(resource.attributes["name"], "propagate")
        - statements:
            - set(resource.attributes["name"], "ignore")
  • Context inferred configurations styles now supports the global conditions feature. It will require path's contexts only for context inferred configurations. Existing settings should continue to work as it's.
      log_statements:
          - statements:
              - set(log.attributes["name"], "replaced")
             conditions:
              - log.attributes["name"] == "" #path without context results into a "required context" error
        
         - context: log
           statements:
              - set(attributes["name"], "replaced")
           conditions:
              - attributes["name"] == ""
  • Increase test coverage.

Please see the processor/transformprocessor/testdata/config.yaml and unit tests for more configuration/usage examples.


Thanks to all nice changes/suggestions, this PR became huge!!! So I'm wondering if you folks would prefer me to split it into multiple PRs (although we would still have a big one for the transformprocessor changes), or if you're willing to review all those changes at once.

Thanks!

@TylerHelmuth
Copy link
Member

@edmocosta lets break up this PR now that we've got a direction. Lets start with a PR for the OTTL changes and a PR for the transformprocessor changes and if we need to break it down further from there we can.

…lector-contrib into transformprocessor-add-statements-with-context-support

# Conflicts:
#	processor/transformprocessor/internal/common/metrics.go
@edmocosta
Copy link
Contributor Author

@edmocosta lets break up this PR now that we've got a direction. Lets start with a PR for the OTTL changes and a PR for the transformprocessor changes and if we need to break it down further from there we can.

OTTL changes: #37166

TylerHelmuth pushed a commit that referenced this pull request Jan 13, 2025
…set context's cache value (#37166)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
- Improved error message for higher context `cache` access. Now when
users try to configure statements like `set(log.attributes["log_key"],
resource.cache["log_key"])`, they will see an error message similar to:

  ```diff
+ access to cache must be be performed using the same statement's
context, please replace "resource.cache[log_key]" with
"log.cache[log_key]"`
- invalid argument at position 1: segment "cache" from path
"resource.cache[log_key]" is not a valid path nor a valid OTTL keyword
for the Resource context - review
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl/contexts/ottlresource
to see all valid paths
  ```
- Added the `WithCache` option to all OTTL transformation contexts. This
change allows API consumer to set the initial cache value or share it
among different contexts/executions.

<!-- Issue number (e.g. #1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Split of
#36888

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Unit tests

<!--Please delete paragraphs that you did not use before submitting.-->
@TylerHelmuth
Copy link
Member

@edmocosta somehow this is still massive. Can we split it up further? Maybe by making some changes is transformprocessor/internal and then using them in a second PR?

@edmocosta
Copy link
Contributor Author

internal/filter changes: #37271
(I'm still working on splitting the transformprocessor changes).

@edmocosta
Copy link
Contributor Author

Hey folks, I've opened a new PR (#37272) removing the following changes:

  • Support for flat configuration (unmarshaller)
  • Shared cache per context (as it does not support flat configuration yet)
  • Error mode per statement's group
  • Global conditions support (path w/ context)

I'm not sure If I can split it up further, it's basically replacing the parser collection, and introducing the minimum changes required for implementing the removed items. The PR is not so small, but I think it got a bit better compared to this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants