Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for max_event_buffer_size: the maximum number of bytes from the final events payload sent #35299

Open
kebroad opened this issue Sep 19, 2024 · 14 comments

Comments

@kebroad
Copy link

kebroad commented Sep 19, 2024

Component(s)

exporter/splunkhec

Describe the issue you're reporting

Upon using this exporter we have noticed that when one of the max_content_length_ configurations is set (ie. max_content_length_logs), the size of the payload on the wire may not be consistent depending on the disable_compression setting.

When true, the write function will return an error if its over capacity, whereas the compression write function seems like it will write it if its compressed size is under the max_content_length.

So, if I understand it correctly, when max_content_length_logs is 1MB, the uncompressed writer will ensure its raw size is under 1MB, but the gzip writer will ensure its compressed size is under 1MB.

Is this intentional? We are exporting to an endpoint that has size limits on the uncompressed data but, while we want to compress it, with variable compression rates its hard to determine exactly what content length limit we would want to configure.

@kebroad kebroad added the needs triage New item requiring triage label Sep 19, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bderrly
Copy link
Contributor

bderrly commented Sep 19, 2024

The part that was most surprising here was that the documentation led us to believe that the compression was going to be just before being sent over the wire. Thus our expectation was that that data buffer would be capped at the max_content_length_logs, in our case 5 MB, and then compressed and sent over the wire. As mentioned above, our log destination (Logscale) is expecting a maximum uncompressed payload of 5 MB.

From README.md:

disable_compression (default: false): Whether to disable gzip compression over HTTP.

@atoulme
Copy link
Contributor

atoulme commented Sep 19, 2024

The Content-Length header value is the size of the payload on the wire, not the uncompressed content.

If you want to configure the size of the uncompressed payload, we will need more work to support that.

@kebroad
Copy link
Author

kebroad commented Sep 19, 2024

If you want to configure the size of the uncompressed payload, we will need more work to support that.

Yes, this is essentially what we want. I wouldn't mind working on a PR for this. Would you want to add another configuration field for this use case or change the existing behavior of the max_content_length_ fields in the buffer writers, or something else?

@atoulme
Copy link
Contributor

atoulme commented Sep 19, 2024

I think that'd be something such as "payload_size". Note we also have a "max_event_size" configuration key. Those settings might play on each other. Do you want to have different sizes per signal, such as a key for log, metrics and traces? Or just one?

@kebroad
Copy link
Author

kebroad commented Sep 19, 2024

We are currently only using this exporter for logs, so one setting for all three would be fine.

Im thinking we could perhaps have a field like max_content_length_type, which could be compressed or raw.
compressed would be the equivalent of what we currently have, ie. if max_content_length_logs: 5000000, that would mean that the size of the compressed payload would have to be under 5MB. If it was raw at max_content_length_logs: 5000000, it would mean that the raw size of the payload must be under 5MB, which is then compressed to some lower size, then sent over the wire.

max_content_length_type would have no effect if disable_compression: true

@crobert-1
Copy link
Member

Removing needs triage based on code owner response. From what I understand, this is a valid enhancement request, but still may need some more discussion to iron out configuration and implementation details.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Sep 20, 2024
@atoulme
Copy link
Contributor

atoulme commented Sep 25, 2024

No, please do not use content_length in your field. Content-Length is a HTTP header used to represent the size of the payload in bytes over the wire. This is important for middleware like Nginx. This request for enhancement is not tied to this HTTP header.

@kebroad
Copy link
Author

kebroad commented Sep 25, 2024

Ah, I apologize, I did not connect the dots that content_length literally means the Content-Length header/content length on the wire.

you mentioned max_event_size which is the following:

max_event_size (default: 5242880): Maximum raw uncompressed individual event size in bytes. Maximum allowed value is 838860800 (~ 800 MB).

what do you think about a max_event_buffer_size, which would look something like this:

max_event_buffer_size: Maximum raw uncompressed event total buffer size in bytes. Default value is 2097152 bytes (2 MiB). Maximum allowed value is 838860800 (~ 800 MB)

This would be adopting the defaults/max values from the other content_length fields.

@atoulme atoulme changed the title max_content_length_* is not consistent on the wire compressed vs. uncompressed Implement support for max_event_buffer_size: the maximum number of bytes from the final events payload sent Nov 8, 2024
@atoulme
Copy link
Contributor

atoulme commented Nov 8, 2024

We can use that setting to complete the approach ; initially, it should not have a default value so as to not introduce a breaking change.

@kebroad
Copy link
Author

kebroad commented Nov 11, 2024

@atoulme after a bit more thinking, the best way to approach this could be to have an enforce_length_restriction_uncompressed config field.

See this draft PR implementing it and corresponding description:

  • enforce_length_restriction_uncompressed (default: false): Whether to enforce restricting the uncompressed
    length of the payload by the maximums fields set above. When true, the maximum content lengths will signify the
    maximum length of the payload both compressed and/or uncompressed (although uncompressed should always be the upper bound). Use this when the endpoint you are exporting to requires a certain payload maximum after decompression.

The aformentioned "fields set above" being max_content_length_logs, max_content_length_metrics, max_content_length_traces.

What do you think about this proposal? It seems to be a bit cleaner and simpler.

@bderrly
Copy link
Contributor

bderrly commented Nov 14, 2024

I think it would be ideal if the configuration for this exporter was changed such that "batch size" meant uncompressed size and was separated in meaning from the "content-length" HTTP header. See vector or fluentbit chunks for other examples. The use of compression should be simply a decision about the compression of the HTTP payload.

@bderrly
Copy link
Contributor

bderrly commented Nov 14, 2024

Let me add some additional context to my prior comment as I do not want to give the impression I want to change the scope of this issue. I think this issue should proceed as it is in order to help speed along the meeting our needs.

My comment was meant to express that a future breaking change that alters the configuration parameters might be prudent. I think semantically linking the batch size to the HTTP content-length makes it more difficult to reason about this problem. As an administrator I only care about the size of the payload uncompressed. Compression is beneficially to me in order to reduce cost over the wire. I believe these are distinct concerns and should not be conflated into a single configuration parameter.

@atoulme
Copy link
Contributor

atoulme commented Dec 7, 2024

I disagree, as the content size of the packet on the network is what operations teams will care about when it comes to managing data input. The exporter is used in production and we will not introduce breaking changes to the configuration at this time. See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#beta and https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/splunkhecexporter/metadata.yaml#L6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants