
fix(datadog_logs sink): Apply agent-json header on events from agent #22701

Open: wants to merge 2 commits into master from datadog-logs-http-header

Conversation

@graphcareful (Contributor) commented Mar 20, 2025

Summary

When routing datadog_agent logs through Vector before sending them to the Datadog logs intake, users have reported losing contextual log-level information.

The cause was determined to be the HTTP header `DD-PROTOCOL: agent-json`: the Agent prepares requests to the logs backend with this header, while Vector does not.

This header tells the logs intake to give precedence to the attributes nested within the 'message' attribute, so fields like log_level reflect the values arriving from the user's application. Without the header, the logs backend falls back to the value applied at the root of the message, which is usually info or error, set by the Datadog Agent depending on whether the log was emitted via stdout or stderr.

The remediation is to have Vector apply the HTTP header, but only conditionally: when the event originated from the datadog_agent source and the user has not applied any transforms that remove or modify reserved attributes in a non-standard way.

Vector will partition events on the two aforementioned conditions; events that do not satisfy them will still be sent to the Datadog logs backend, just without the `DD-PROTOCOL: agent-json` header applied.
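The conditional header logic described above can be sketched as a small decision function. This is a hypothetical sketch: the names `Protocol` and `choose_protocol` are illustrative, not the PR's actual API.

```rust
/// Which protocol to use when flushing a batch to the Datadog logs intake
/// (illustrative names, not the PR's real types).
#[derive(Debug, PartialEq)]
enum Protocol {
    /// Send with the `DD-PROTOCOL: agent-json` header.
    AgentJson,
    /// Send without the header; the backend falls back to root-level attributes.
    Plain,
}

/// Apply the header only when the event came in through the datadog_agent
/// source AND still looks like an untouched Agent payload.
fn choose_protocol(from_agent_source: bool, conforms_to_agent_payload: bool) -> Protocol {
    if from_agent_source && conforms_to_agent_payload {
        Protocol::AgentJson
    } else {
        Protocol::Plain
    }
}

fn main() {
    println!("{:?}", choose_protocol(true, true));   // AgentJson
    println!("{:?}", choose_protocol(true, false));  // Plain: transforms reshaped the event
    println!("{:?}", choose_protocol(false, true));  // Plain: not from the datadog_agent source
}
```

Events falling into each branch are batched separately, so a single flush never mixes header and no-header payloads.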

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Tested by sending data to Datadog from a service I created that wrote messages like the following to a local file:

{"level": "info", "message": "This is sample message: 50", "time": 1742513705.3499498}
{"level": "error", "message": "This is sample message: 72", "time": 1742513706.355413}
{"level": "warn", "message": "This is sample message: 10", "time": 1742513707.3605719}

Before the change, all of the logs in Datadog had the status info; after the change, the UI shows logs with statuses info, error, and warn, respecting the level field.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
    • make check-all is a good command to run locally. This check is
      defined here. Some of these
      checks might not be relevant to your PR. For Rust changes, at the very least you should run:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

#13291

@graphcareful graphcareful requested a review from a team as a code owner March 20, 2025 23:42
@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Mar 20, 2025
@graphcareful graphcareful force-pushed the datadog-logs-http-header branch from a659211 to bdd8f95 Compare March 20, 2025 23:44
@graphcareful graphcareful changed the title fix(datadog): Apply agent-json header on events from agent fix(datadog_logs sink): Apply agent-json header on events from agent Mar 20, 2025
@bruceg (Member) left a comment

Makes sense to me.

Comment on lines 1 to 2
Fix bug in datadog sink where data sent originating from the datadog agent is not
interpreted by the backend as an agent message.

Suggested change
Fix bug in datadog sink where data sent originating from the datadog agent is not
interpreted by the backend as an agent message.
Fix bug in the `datadog_logs` sink where data sent originating from the Datadog Agent is not
interpreted by the Datadog backends as an Agent message.

Member:

Agree with all of this except not sure "backend" should be pluralized to "backends."

@graphcareful (Contributor Author):

I think this change is also missing one component: we must also verify that the message originated from the datadog_agent source.

Comment on lines +23 to +53
/// Evaluates the payload as conforming to the agent standard if the following 7 attributes
/// exist and also adhere to the expected types. All types are expected to be strings with the
/// exception of timestamp which should be an integer.
Member:

This doesn't feel like a robust predicate. Did you discuss any alternatives? Without thinking too much about it, could you set a `%"ddprotocol": "agent-json"` property at the source level and then just inspect it at the sink level?

Member:

Would agree that it's not terribly robust but not sure we have a better alternative. My understanding is this should catch cases where the log came in from the Agent (with the header) but the payload has been manipulated in some way that it no longer conforms to the protocol.

@pront (Member), Mar 21, 2025:

I see, so this is more complicated:

  1. Event should originate from the DD agent source
  2. Event hasn't been manipulated - keys/structure only or values?

For (1), we can probably use source_id from the EventMetadata cc @graphcareful
For (2), I suppose we don't care about original values. Only the event structure.

What happens if the event structure was modified but also the agent-json header was set in the request? Will the backend ingest it properly?

I think it should be best effort, e.g. if a log originated in DD agent we apply the protocol and let the backend ingest it. If the event doesn't conform because the user changed the format, then it's a user error. But if we change the format implicitly, then that's a bug.

@graphcareful (Contributor Author), Mar 21, 2025:

What happens if the event structure was modified but also the agent-json header was set in the request? Will the backend ingest it properly?

@brent-hronik had replied to this in other threads, his reply was:

Parsing of the log would behave unexpectedly, some fields would default, others would just be missed.

I do think there's merit to just unconditionally applying the header, but @brent-hronik has reservations that this would introduce complex, unintuitive behavior across these services, which I also agree with. I partitioned events into ones that syntactically match agent events and ones that don't, to address some of his concerns and reach a best-of-both-worlds solution.

@graphcareful graphcareful force-pushed the datadog-logs-http-header branch from bdd8f95 to 6d33b4c Compare March 21, 2025 15:30
/// Evaluates the payload as conforming to the agent standard if the following 7 attributes
/// exist and also adhere to the expected types. All types are expected to be strings with the
/// exception of timestamp which should be an integer.
fn event_conforms_to_agent_payload(item: &Event) -> bool {
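A standalone approximation of that predicate might look like the following. This is a sketch, not the PR's code: the `Attr` enum is a stand-in for Vector's internal `LogEvent`/`Value` types, and the exact attribute names are an assumption based on the Agent's reserved log attributes.

```rust
use std::collections::HashMap;

/// Stand-in for Vector's `Value` type, used only to keep this sketch
/// self-contained.
#[derive(Debug)]
enum Attr {
    Str(String),
    Int(i64),
}

/// The reserved string attributes (assumed names); `timestamp` is the lone
/// integer, making seven checked attributes in total.
const STRING_ATTRS: [&str; 6] = ["message", "status", "hostname", "service", "ddsource", "ddtags"];

/// True if all seven reserved attributes exist with the expected types.
fn conforms_to_agent_payload(event: &HashMap<&str, Attr>) -> bool {
    STRING_ATTRS
        .iter()
        .all(|a| matches!(event.get(a), Some(Attr::Str(_))))
        && matches!(event.get("timestamp"), Some(Attr::Int(_)))
}

fn main() {
    let mut event = HashMap::new();
    for attr in STRING_ATTRS {
        event.insert(attr, Attr::Str(String::from("x")));
    }
    event.insert("timestamp", Attr::Int(1_742_513_705));
    println!("{}", conforms_to_agent_payload(&event)); // true: untouched Agent shape

    event.remove("status"); // simulate a transform stripping a reserved attribute
    println!("{}", conforms_to_agent_payload(&event)); // false
}
```

Note this checks presence and type only, not values, matching the discussion above that the event structure (not its content) is what determines conformance.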


I still think it would be preferable to make the changes necessary to restrict edits in OPW to only the message content. Exposing the full agent payload to users risks leaking internal data, for example, if a user tries to update the status, they'd need to know to update the nested status inside message, not the top-level one. This creates room for confusion and increases the chance of syntactically valid but semantically incorrect data.

Contributor Author:

There are certainly customers who want to view some of this metadata, at the very least the DDTAGS field. Furthermore, I believe some customers even want to modify the contents of that header. If we restrict this, it would break the workflow of some users.

I agree it isn't ideal, as your example shows there is still room for user error. However, today, without this change, the system behaves in an unclear manner even when it is used in the way it is intended to be used.

@pront (Member) left a comment:

Will take another look when this #22701 (comment) is resolved.

@graphcareful (Contributor Author):

Will take another look when this #22701 (comment) is resolved.

That actually has been implemented here

@pront pront self-requested a review March 25, 2025 14:28
}

fn attr_is_bytes(log: &LogEvent, attr: &str) -> bool {
Self::attr_is(log, attr, Value::is_bytes)
Member:

Nit: These helper functions don't need to be part of the EventPartitioner, right?
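The nit above could be addressed by making the helpers free functions that take a predicate. A minimal sketch, with stand-in types for Vector's `LogEvent` and `Value` (assumptions, not the real definitions):

```rust
use std::collections::HashMap;

/// Stand-in for Vector's internal `Value` type.
#[derive(Debug)]
enum Value {
    Bytes(Vec<u8>),
    Integer(i64),
}

/// Stand-in for Vector's `LogEvent`.
type LogEvent = HashMap<String, Value>;

/// Generic helper: does `attr` exist on the event and satisfy `pred`?
fn attr_is(log: &LogEvent, attr: &str, pred: impl Fn(&Value) -> bool) -> bool {
    log.get(attr).map(pred).unwrap_or(false)
}

/// Free-function version of the `attr_is_bytes` helper quoted above.
fn attr_is_bytes(log: &LogEvent, attr: &str) -> bool {
    attr_is(log, attr, |v| matches!(v, Value::Bytes(_)))
}

fn main() {
    let mut log = LogEvent::new();
    log.insert("message".into(), Value::Bytes(b"hello".to_vec()));
    log.insert("timestamp".into(), Value::Integer(1_742_513_705));
    println!("{}", attr_is_bytes(&log, "message"));   // true
    println!("{}", attr_is_bytes(&log, "timestamp")); // false
}
```

As free functions these helpers need no `Self::` receiver, which keeps the EventPartitioner focused on partitioning.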


#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PartitionKey {
conforms: bool,
Member:

Suggested change
conforms: bool,
conforms_to_agent_json: bool,
