-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for exemplars in metrics UI #4629
Conversation
When you have a lot of data points all on top of one another its going to get hard to select the right one. Are there any user interface patterns for solving this? I know in some CAD systems (which also have the problem of clusters of points/planes/etc overlapping) is to click-hold and get a context menu with all the nearby points of interest to then select. |
I don't know if the chart gives that level of control. The idea of exemplars seems like they're a way to access an example of data. It seems inevitable that two trace points with almost the same value and almost the same time are on the graph. If you want to be able to access every trace, the trace view provides UI for that. Alternatively, the metrics table view could be updated to provide links to all exemplars in a list. I haven't touch the metrics table view yet. |
The point of exemplars is to provide a sampling of values, and not on every request. The OTel API has a way for specifying the sampler. We should probably setup this in ServiceDefaults to have a high sample rate during debug, and more sensible for production. |
That is true. The default in OTel .NET SDK is to collect at most 1 Exemplar per collection for Non-Histograms, and at most N Exemplars per collection for Histograms, where N is the number of histogram buckets. UI's can further filter the Exemplars to make UI less cluttered, esp. when multiple instances are reporting Exemplars.. |
What do you think about defaulting exemplars to on in production? Good or bad? Docs say opentelemetry-dotnet defaults them to off, against recommended conventions, because of performance. Do you have more details to help us make a decision? |
Having it on-by-default for histograms seems very useful (and the cost seems reasonable). For counters/gauges, I don't know how many users would want to use exemplars.
Check this open-telemetry/opentelemetry-specification#3870 (comment) |
@JamesNK Just FYI in case you haven't noticed in .NET we have a special feature to enable exemplars just for histograms. Look for |
I'm wondering if we should be going with Trace-based exemplars, and adding something smarter for the trace sampling, which currently defaults to 100%. The overhead to metrics for the exemplar is large compared to typical aggregated metric data, but should be small in relation to the payload for each trace. We probably have too high a default trace collection rate for production usage. |
@JamesNK thanks for putting this together so quickly..looks great! Did some initial testing around this today and from a data perspective things look great. A few things we would need to review with the UX board on :
Additionally from an accessbility perspective, how do we envision the data being displayed on the table view? Would the user need to filter by http.route to display the link to the span element? Please add an issue for tracking the display in table view. |
@CodeBlanch Optimizing for performance isn't a large concern in development. I think we can set
@samsp-msft Aspire sets trace sampling to off in development (i.e. collect all traces). After deployment, it uses whatever is the opentelemetry-dotnet default is for trace sampling. |
@cijothomas Thanks for the info! I think some UI filtering will be needed. For example, if someone sets the metrics chart timespan to one day, there could be a huge number of exemplars across that time. Thousands of points squeezed onto a graph won't be very useful. Do you know what strategies they use? Options I can think of:
I think an even distribution would be more usable and work better in a graph updating in realtime. |
b094632
to
22fc06c
Compare
Issue for table view: #4639 Can you start UX review process started and CC me? I haven't done it before so I'd like to learn at the same time. |
All remaining work is done. Please review. |
Its really cool to have exemplars included in there 👍 As someone experienced in the area I can make a good guess what the UI is showing me, but I am wondering if less experienced users will also understand it. This is my guess at a novice user thinking process:
Asking some less experienced users is probably better than my own guesses though. If these guesses are right I'm no UI expert but I can offer one suggestion that might help: Hovering over or clicking on the "Exemplars" link could give more info about exemplars. Hovering over each diamond would give info about all the exemplars available near that point in time. The hover info could be a small summary like "4 example traces" and then clicking on it goes to new page where each exemplar is a table row with the metric value and clickable links to each trace. Alternately if you can do a little table with links inside the popup then you might be able to show all the exemplars inline without needing an intermediate page.
I would think its good as long as the sampling rate on the exemplars gives a total data volume that is a small fraction of what is being used for traces. @reyang @cijothomas, do the exemplar sampling controls let people produce that result even if it is just approximate? |
I think that definitely helps. I still hope we can have some grouping or filtering to avoid dots that are spatially too close together. |
+1 on grouping - when the dots are getting too close, collapse them similar to what a power outage map would do, and allow user to drill down: |
+1 - the value of exemplars is to be able to look at the outliers and see why they are different from the others. Showing them with a vertical position makes it much easier to understand where that exemplar lives in relation to the overall values.
If there are too many close together, that probably says you have too many values. I am less concerned about being able to select an individual one on that basis, especially if we have the table view, as you then get the list you are after. If there are a bunch close together, then we could take you to the table view - but how to take you to a specific set of entries and highlight them is a new UX affordance we don't yet have. |
If a counter has a sudden spike, the exemplars presumably all still have value 1. It just happens that there were a lot of counter increments in that time period. This strategy works for histograms but not for all metric types.
I agree, I'm also not that concerned about selecting specific exemplars. I'm more concerned that we shouldn't show too many exemplar data points in the UI if they are overly difficult to select or if the large number of them causes too much visual clutter. |
Please review 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @JamesNK @samsp-msft would it make sense for two of you to show a demo to the OpenTelemetry community during the community/maintainers meeting? I think it will help to raise the awareness and also collect feedback from users.
What time is it? Might not work for me. Ping us details on teams/email. |
I think the table is very nice 👍 Is there anything that will prevent the graph from becoming cluttered with many exemplars if the app is running under sustained load? It seemed like if P50 stayed low but not zero it might get obscured behind lots of exemplar dots all concentrated near the x-axis. |
[heart] Kay Venkatrajan reacted to your message:
…________________________________
From: James Newton-King ***@***.***>
Sent: Wednesday, June 26, 2024 7:43:00 AM
To: dotnet/aspire ***@***.***>
Cc: Kay Venkatrajan ***@***.***>; Review requested ***@***.***>
Subject: Re: [dotnet/aspire] Add support for exemplars in metrics UI (PR #4629)
@adamint<https://github.com/adamint> @noahfalk<https://github.com/noahfalk> To kill two birds with one stone, I've added exemplars to the metrics table view. It is both accessible and provides a UI that doesn't have the issue of many points at the same location.
exemplars-table.gif (view on web)<https://github.com/dotnet/aspire/assets/303201/375519a1-5774-468e-8c74-ee5b68d7fc96>
—
Reply to this email directly, view it on GitHub<#4629 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AYQC2VVVHXUKBNWJBM4T2HLZJJWIJAVCNFSM6AAAAABJZQ5AGKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJRGAZTGNZUGQ>.
You are receiving this because your review was requested.Message ID: ***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My approval is scoped to the app model side of things (very minor changes ... no concerns). Only thing I would suggest is regenerating the sample manifests under playground.
I just added some sampling to the graph. It displays up to 20 points per tick. There are 30ish ticks which means approx 600 points max on a graph. The main purpose is to stop someone slamming an app with load and the graph trying to render thousands of points, killing browser performance.
Nothing stops that. Graph view gives a visual overview and allows someone to see and click on data outliers. The table view is the place to look if you're interested in a particular span. |
Yes, can do, when is it? |
UI feedback: any way to padd numbers and/or align the visuals better? when icons are shown it's better they are more uniform aligned in table views rather than appear to shift position. Similarly (perhaps more nuanced because the number of exemplars changes) if the button for exemplars could be more uniform-width? |
I can give the exemplar button a minimum width so an exemplar number between 1-999 has a consistent width. But at some point the button must grow to handle large numbers, so it won't always be consistent. The icon placement is more difficult, and I won't touch in this PR. We tried lining them up in the past and depending on the data, it looked worse than the current layout. |
You could right align the cells. That would make the icons line up at least. |
is the number always a 3 decimal value? e.g. instead of |
@JamesNK can you add the changes to the ServiceDefaults project to enable exemplars, and setting the configuration to enable it just for histograms so it matches with the dashboard implementation. |
98b6b6e
to
0c7b0d5
Compare
0c7b0d5
to
dfcd72a
Compare
I think leaving it at 16 is good. For reference, 16 is the size of most icons throughout the app. |
I'm going to merge as is to unblock other work, and we can bike shed marker sizes and make the change in a follow up PR. |
[celebrate] Kay Venkatrajan reacted to your message:
…________________________________
From: James Newton-King ***@***.***>
Sent: Friday, July 12, 2024 1:44:21 AM
To: dotnet/aspire ***@***.***>
Cc: Kay Venkatrajan ***@***.***>; Review requested ***@***.***>
Subject: Re: [dotnet/aspire] Add support for exemplars in metrics UI (PR #4629)
Merged #4629<#4629> into main.
—
Reply to this email directly, view it on GitHub<#4629 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AYQC2VUB6EDSOCMCULGTNSLZL4YHLAVCNFSM6AAAAABJZQ5AGKVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJTGQ4DAMBUGIZTOOA>.
You are receiving this because your review was requested.Message ID: ***@***.***>
|
commit 9071c84 Author: Ankit Jain <[email protected]> Date: Fri Jul 12 19:04:26 2024 -0400 fix e2e commit 68a7ae0 Merge: 09acf2d 681f2e7 Author: Ankit Jain <[email protected]> Date: Fri Jul 12 18:57:33 2024 -0400 Merge remote-tracking branch 'origin/main' into tests-out-of-repo commit 681f2e7 Author: Dan Moseley <[email protected]> Date: Fri Jul 12 14:04:57 2024 -0600 Make credscan run clean (dotnet#4876) * suppressions * more * typo commit a133411 Author: Dan Moseley <[email protected]> Date: Fri Jul 12 08:42:40 2024 -0600 add TSA config (dotnet#4860) * tsaconfig * more * remove the config per Matt * add policheck in 1ES template * policheck exclusions commit cf61a58 Author: James Newton-King <[email protected]> Date: Fri Jul 12 13:00:02 2024 +0800 Load plotly as a module on metrics page (dotnet#4857) commit 09acf2d Author: Ankit Jain <[email protected]> Date: Fri Jul 12 00:34:04 2024 -0400 fix archive path for e2e commit da382ff Merge: 18b92c3 de35684 Author: Ankit Jain <[email protected]> Date: Fri Jul 12 00:12:02 2024 -0400 Merge remote-tracking branch 'origin/main' into tests-out-of-repo commit 18b92c3 Author: Ankit Jain <[email protected]> Date: Fri Jul 12 00:07:52 2024 -0400 merge from playground branch commit de35684 Author: James Newton-King <[email protected]> Date: Fri Jul 12 09:43:58 2024 +0800 Add support for exemplars in metrics UI (dotnet#4629) commit e23cd4c Merge: ef3158b 9421fc8 Author: Ankit Jain <[email protected]> Date: Thu Jul 11 18:00:20 2024 -0400 Merge remote-tracking branch 'origin/main' into tests-out-of-repo commit ef3158b Author: Ankit Jain <[email protected]> Date: Thu Jul 11 03:27:52 2024 -0400 wip commit 7c61df1 Author: Ankit Jain <[email protected]> Date: Thu Jul 11 02:48:20 2024 -0400 fixies mores commit d6e21ab Author: Ankit Jain <[email protected]> Date: Thu Jul 11 03:10:21 2024 -0400 share targets commit 01e419d Merge: 0741157 421f8b1 Author: Ankit Jain <[email protected]> Date: Thu Jul 11 03:00:16 2024 -0400 Merge remote-tracking branch 'origin/main' into tests-out-of-repo # Conflicts: # tests/helix/send-to-helix-inner.proj commit 0741157 Merge: 3406514 f647a66 Author: Ankit Jain <[email protected]> Date: Tue Jul 9 18:43:36 2024 -0400 Merge remote-tracking branch 'origin/main' into tests-out-of-repo commit 3406514 Author: Ankit Jain <[email protected]> Date: Mon Jul 8 21:18:23 2024 -0400 fix build commit 3489ff3 Author: Ankit Jain <[email protected]> Date: Mon Jul 8 20:23:10 2024 -0400 fix props generation commit 6a6315f Author: Ankit Jain <[email protected]> Date: Mon Jul 8 20:14:53 2024 -0400 fix commit 92642ea Author: Ankit Jain <[email protected]> Date: Mon Jul 8 19:36:17 2024 -0400 rename commit 4bea9f4 Author: Ankit Jain <[email protected]> Date: Mon Jul 8 19:21:36 2024 -0400 cleanup # Conflicts: # tests/Aspire.EndToEnd.Tests/Aspire.EndToEnd.Tests.csproj # tests/Shared/RepoTesting/Aspire.Testing.Repo.targets
Fixes #336
Fixes #4639
tldr example tracing data now shows up on metrics graphs as points. Click on a point to navigate to the trace UI. The $5 name for this feature is exemplars.
Done:
Todo:
Someone can click on an exemplar to view the trace detail before the trace or span is received by the dashboard. This primarily happens because OTEL includes exemplars with metrics immediately but doesn't send a span until it is finished. What should happen? One option is a loading indicator + message somewhere.Dialog is displayed on metrics page that data isn't ready. Navigates when data is ready, or can be canceled by user.By default, opentelemetry-dotnet doesn't send exemplars with metrics for performance reasons. We can either enable this feature by updating the service defaults with new C# configuration or set an OTEL env var in Aspire. I think the OTEL env var should always be set in local development, and I'm undecided about defaulting to on in service defaults. It depends whether we think this should be on in production.Set env var in developmentDemo:
Microsoft Reviewers: Open in CodeFlow