Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NET9 HttpClient isn't following the OTel spec for some attributes #109847

Open
TimothyMothra opened this issue Nov 14, 2024 · 14 comments
Open

NET9 HttpClient isn't following the OTel spec for some attributes #109847

TimothyMothra opened this issue Nov 14, 2024 · 14 comments

Comments

@TimothyMothra
Copy link

TimothyMothra commented Nov 14, 2024

Description

This is a follow up to #104251.
I'm working on the OpenTelemetry.Instrumentation.Http library, which should not overwrite attributes set by NET9's runtime: open-telemetry/opentelemetry-dotnet-contrib#2314

I found two examples where runtime isn't following the OTel spec:

  1. URL Fragement Identifier
    OTel Spec (link): "[4]: For network calls, URL usually has scheme://host[:port][path][?query][#fragment] format, where the fragment is not transmitted over HTTP, but if it is known, it SHOULD be included nevertheless."

    Runtime is NOT recording the Fragment in url.full.

  2. HTTP Method
    OTel Spec (link): "[1]: HTTP method names are case-sensitive and http.request.method attribute value MUST match a known HTTP method name exactly. Instrumentations for specific web frameworks that consider HTTP methods to be case insensitive, SHOULD populate a canonical equivalent. Tracing instrumentations that do so, MUST also set http.request.method_original to the original value."

    Runtime is treating the HTTP Method as case-insensitive and not setting http.request.method_original (ie: "Get" vs "GET")

Reproduction Steps

CSPROJ

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="OpenTelemetry" Version="1.10.0" />
    <PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.10.0" />
    <PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.9.0" />
  </ItemGroup>

</Project>

PROGRAM.CS

namespace ConsoleApp4
{
    using OpenTelemetry;
    using OpenTelemetry.Trace;

    internal class Program
    {
        static async Task Main(string[] args)
        {
            using var tracerProvider = Sdk.CreateTracerProviderBuilder()
                .AddSource("System.Net.Http")       // Use this to test native NET9 instrumentation
                //.AddHttpClientInstrumentation()   // Use this to test OpenTelemetry.Instrumentation.Http
                .AddConsoleExporter()
                .Build();

            using var client = new HttpClient();
            var requestMessage = new HttpRequestMessage(new HttpMethod("Get"), "https://en.wikipedia.org/wiki/.NET#History");
            var response = await client.SendAsync(requestMessage);

            Console.WriteLine($"Response status code: {response.StatusCode}");

            tracerProvider.Dispose();
        }
    }
}

Expected behavior

Using the OpenTelemetry.Instrumentation.Http instrumentation:

Activity.TraceId:            eefbe518c63a5e46a2ce0ce7a5174f8f
Activity.SpanId:             e3c2ddba04b97629
Activity.TraceFlags:         Recorded
Activity.DisplayName:        GET
Activity.Kind:               Client
Activity.StartTime:          2024-11-14T21:08:44.4819233Z
Activity.Duration:           00:00:00.2718997
Activity.Tags:
    http.request.method: GET          <-- Shows 2, Recording the standard "GET" (all caps)
    server.address: en.wikipedia.org
    server.port: 443
    url.full: https://en.wikipedia.org/wiki/.NET#History          <-- Shows 1, Recording the "#History" fragment
    http.request.method_original: Get          <-- Shows 2, Recording the original "Get"
    http.response.status_code: 200
    network.protocol.version: 1.1
Instrumentation scope (ActivitySource):
    Name: System.Net.Http
Resource associated with Activity:
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.10.0
    service.name: unknown_service:ConsoleApp4

Response status code: OK

Actual behavior

Using the NET9 native instrumentation:

Note that "http.request.method_original: Get" is missing.

Activity.TraceId:            46cd716242195da4f72318d6a5eb20f9
Activity.SpanId:             cf84fa8a9856eb57
Activity.TraceFlags:         Recorded
Activity.DisplayName:        GET
Activity.Kind:               Client
Activity.StartTime:          2024-11-14T21:10:49.3686726Z
Activity.Duration:           00:00:00.4011271
Activity.Tags:
    http.request.method: GET          <-- Shows 2, Recording the standard "GET". This is correct.
    server.address: en.wikipedia.org
    server.port: 443
    url.full: https://en.wikipedia.org/wiki/.NET          <-- Shows 1, the "#History" fragment is missing
    http.response.status_code: 200
    network.protocol.version: 1.1
Instrumentation scope (ActivitySource):
    Name: System.Net.Http
Resource associated with Activity:
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.10.0
    service.name: unknown_service:ConsoleApp4

Response status code: OK

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

@MihaZupan
Copy link
Member

Re: url.full, we redact fragments by default as well, you can opt-in via either of these:

AppContext.SetSwitch("System.Net.Http.DisableUriRedaction", true);
Environment.SetEnvironmentVariable("DOTNET_SYSTEM_NET_HTTP_DISABLEURIREDACTION", "1");

Re: method, we'll currently populate method_original only if the method is unknown, but we can make the change to do so if the casing doesn't match as well.

KeyValuePair<string, object?> methodTag = DiagnosticsHelper.GetMethodTag(request.Method, out bool isUnknownMethod);
activity.SetTag(methodTag.Key, methodTag.Value);
if (isUnknownMethod)
{
activity.SetTag("http.request.method_original", request.Method.Method);
}

@julealgon
Copy link

Re: url.full, we redact fragments by default as well, you can opt-in via either of these:

Oh no.... now there is another flag for this that we need to be aware too besides the OTEL redaction one? This is really suboptimal to say the least.

I still don't understand why the team decided to do this redaction by default 😞

@TimothyMothra
Copy link
Author

TimothyMothra commented Nov 15, 2024

Thanks for the reply @MihaZupan!

Re: url.full, I can confirm that when using AppContext.SetSwitch("System.Net.Http.DisableUriRedaction", true);, the fragment in my example above is recorded! 😄

Follow up question; is this the correct behavior?

I understand why query strings are redacted, and it's obvious when it happens because this is recorded with the asterisk "?*".
But it's not obvious to me when and why the fragment is redacted.

The PR that made this change seems to suggest that only query string would be affected: #103769 and the doc page for "DisableUriRedaction" only mentions query strings: https://learn.microsoft.com/en-us/dotnet/core/compatibility/networking/9.0/query-redaction-events

Additionally, from a casual search multiple specifications define the fragment as separate and distinct from the query.

  • OTel: (link): scheme://host[:port][path][?query][#fragment]
  • RFC 3986 (link): "The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment. URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]"

@antonfirsov
Copy link
Member

antonfirsov commented Nov 15, 2024

why the fragment is redacted

(Edit: Originally copied the wrong text into the quote)

IIRC we did it to keep the implementation simple and fast. We may consider to change it in .NET 10.

I still don't understand why the team decided to do this redaction by default

We need defaults which are "safe enough" for most users. Accidentally emitting unredacted query values may lead to serious privacy incidents, we have customer evidence on this.

another flag for this that we need to be aware too besides the OTEL redaction one

I wonder which specific flags do you mean? Assuming the HttpClient built-in OTel would be feature-complete (meaning that eg. enrichment is implemented) do you still need to worry about the OTEL configuration?

@julealgon @TimothyMothra if there was an option to customize Uri redaction via a callback or by setting url.template, would you consider it to be a good-enough solution?

@julealgon
Copy link

julealgon commented Nov 15, 2024

why query strings are redacted

IIRC we did it to keep the implementation simple and fast. We may consider to change it in .NET 10.

I still don't understand why the team decided to do this redaction by default

We need defaults which are "safe enough" for most users. Accidentally emitting unredacted query values may lead to serious privacy incidents, we have customer evidence on this.

I can totally accept that you have evidence of people doing this, but even then I don't think it should be up to the framework to "protect" those people. People do all sorts of terrible things, like exposing credentials in source code, in logs, etc, but we don't suddenly start enforcing redaction by default in all logs and start throwing errors when the word "password" exists somewhere.

In this case, IMHO, devs should just know what the best practices are and follow them. If they expose something in a querystring, that's on them to suffer the consequences.

another flag for this that we need to be aware too besides the OTEL redaction one

I wonder which specific flags do you mean? Assuming the HttpClient built-in OTel would be feature-complete (meaning that eg. enrichment is implemented) do you still need to worry about the OTEL configuration?

I'm talking about these:

  • OTEL_DOTNET_EXPERIMENTAL_OWIN_DISABLE_URL_QUERY_REDACTION
  • OTEL_DOTNET_EXPERIMENTAL_HTTPCLIENT_DISABLE_URL_QUERY_REDACTION
  • OTEL_DOTNET_EXPERIMENTAL_ASPNETCORE_DISABLE_URL_QUERY_REDACTION
  • OTEL_DOTNET_EXPERIMENTAL_ASPNET_DISABLE_URL_QUERY_REDACTION

Which we already maintain on our side.

Now we have to also add this new DOTNET_SYSTEM_NET_HTTP_DISABLEURIREDACTION to the mix.

Additionally, keep in mind that companies sometimes have complex setups. We have a very large solution right now that is still targeting NET472. Some of the projects in that solution are actually stuck on .NET Framework land, as they are WebForms projects. This means we'll have a mix of framework targets for the foreseeable future as we start to migrate the rest of that solution to .NET8 and .NET9 soon. Thus, we'll have to live with all of these settings at the same time, since some projects cannot take advantage of the built-in HTTP tracing and have to keep using the OTEL instrumentation.

And even outside of this solution, we have many other smaller solutions already targeting NET8, so in effect we already have a mix of frameworks even before a migration of the legacy projects start to happen.

@julealgon @TimothyMothra if there was an option to customize Uri redaction via a callback or by setting url.template, would you consider it to be a good-enough solution?

That's what I proposed when I first saw the redaction changes in the OTEL libraries. Instead of forcing redaction like what you are doing now, you could just expose a simple redaction mechanism:

I even suggested using the Microsoft.Extensions.Compliance.Redaction abstractions for that:

And let consumers do what they want: if they want the redaction, they apply it. Making it the other way around is super intrusive, opinionated (in a bad way IMHO) and handholdy.

To make things even worse, all the OTEL packages at the time were marked as "vulnerable", which increased NuGet warning noise on our side. If writing a querystring is considered a "vulnerability", perhaps the Console.Write* methods should also be marked as vulnerable, since they don't have automatic redaction too.

I strongly dislike this direction, as you can probably tell by the above.

@MihaZupan
Copy link
Member

MihaZupan commented Nov 15, 2024

Do you know of a real-world example where you might care about the casing of the HTTP method specified, or is this just for the sake of following the spec to the letter even if there's no practical use to it?
With the .NET implementation that's instrumented (SocketsHttpHandler), we'll always normalize known ones to uppercase, so you can't send anything else on the wire anyway, so using new HttpMethod("Get") vs HttpMethod.Get makes no practical difference. If anything, the extra attribute could be a red herring telling the user that this is something that matters when it doesn't.

@TimothyMothra
Copy link
Author

Happy Friday and thanks everyone for sharing information!

Regarding HTTP Methods
Hi @MihaZupan,

Do you know of a real-world example where you might care about the casing of the HTTP method specified, or is this just for the sake of following the spec to the letter even if there's no practical use to it?

That's a good question, no I don't know of any real-world examples. I can ping some people in OTel who might know the reasoning behind this specific rule. I absolutely have my OTel-goggles on and trying to follow the spec to the letter. I'm also thinking about OTel users who would be upgrading to NET9 and then their exported attributes have changed.

Regarding Redaction
Hi @antonfirsov,

IIRC we did it to keep the implementation simple and fast. We may consider to change it in .NET 10.

Thanks for the reply. Please consider updating your docs to mention that the fragment is included in the redaction.

if there was an option to customize Uri redaction via a callback or by setting url.template, would you consider it to be a good-enough solution?

I expect some users would be very interested in a callback and I think this would be worth exploring.

Can you share a conversational example of how setting the url.template might affect redaction? I don't know enough about this to comment.

Redaction in general has been a divisive subject, and I think all parties involved are still trying to find the right balance. FYI the OpenTelemetry Spec recently merged a new rule naming specific query string keys that MUST be redacted by default (open-telemetry/semantic-conventions#971). NET 9 and OTel-dotnet are technically compliant because we both redact everything by default.

@TimothyMothra
Copy link
Author

HI @julealgon, I really appreciate all your input.
I agree that yet another environment variable is disappointing. You're seeing my surprise play out in real time!

I don't expect any major changes in the near future. But I'm committed to having this clearly and completely documented so users don't need to search for this information.

In this case, IMHO, devs should just know what the best practices are and follow them. If they expose something in a querystring, that's on them to suffer the consequences.

My understanding is that it's not only devs that suffer but also customers and/or end-users. The trend that I'm seeing is that technologies are moving to be secure by default to protect the entire industry/ecosystem.

Please continue sharing your insights and holding us accountable to do this in a way that's transparent and configurable!!!

@CarnaViire
Copy link
Member

Triage:
We should document the behavior
(Also: related to #93019)

@CarnaViire CarnaViire added the documentation Documentation bug or enhancement, does not impact product or test code label Nov 18, 2024
@CarnaViire CarnaViire added this to the 10.0.0 milestone Nov 18, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Nov 18, 2024
@antonfirsov
Copy link
Member

Beyond documentation, it might worth to review the discussions that resulted in open-telemetry/semantic-conventions#971. However, working with an ever-changing deny list is very impractical from the instrumented library's perspective IMHO.

The trend that I'm seeing is that technologies are moving to be secure by default to protect the entire industry/ecosystem.

Exactly. There is an inherent tradeoff between observability and security. Our current guidelines require us to prioritize security.

Can you share a conversational example of how setting the url.template might affect redaction?

When implemented, it would the equivalent of the server-side http.route on the client. Given an url.full with value https://a.b/users/123?secret=42, a (user-configured, optional) template could be https://a.b/users/{id}?secret={secret}. We may decide to log the template for the url.full when the template is present, or you as the user may set up the redaction callback to return the template. This is an open topic, I asked the question out of curiosity, to collect inputs.

@antonfirsov antonfirsov removed the documentation Documentation bug or enhancement, does not impact product or test code label Nov 20, 2024
@antonfirsov
Copy link
Member

antonfirsov commented Nov 20, 2024

Removing the documentation label, because we should really consider to include fragment when emitting url.full for distributed tracing for OTel conformance. With tracing enabled, the user opts in into significant performace overhead anyways, so one extra allocation should not matter IMO.

@julealgon
Copy link

so one extra allocation should not matter IMO.

@antonfirsov can you elaborate what extra allocation(s) are needed here? Doesn't the original Uri object contain the entire url including the fragment portion?

@antonfirsov
Copy link
Member

Doesn't the original Uri object contain the entire url including the fragment portion?

The string behind Uri.AbsoluteUri is lazily created. The current implementation avoids instantiating it.

@antonfirsov
Copy link
Member

Opened #110018 to track the customizable uri redaction feature request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants