[APM][Otel] Errors: Add fallback to span id if the parent id is undefined #195796


@jennypavlova (Member) commented Oct 10, 2024

Closes #195731

Summary

This PR fixes a bug where error correlations are not displayed correctly in the APM waterfall for OTel-native data. In OTel-native data, parent.id is never defined, so in that case we need to fall back to span.id.
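A minimal sketch of the fallback, assuming simplified document shapes. The interfaces ErrorDoc and WaterfallItem and the functions getErrorParentId and findErrorParent are hypothetical stand-ins, not Kibana's actual types; only the error.parent?.id ?? error.span?.id expression mirrors the diff in this PR.

```typescript
// Hypothetical, simplified shapes; Kibana's real types are richer.
interface ErrorDoc {
  parent?: { id?: string };
  span?: { id?: string };
}

interface WaterfallItem {
  id: string;
  name: string;
}

// Prefer parent.id (present in APM server data); fall back to span.id,
// because OTel-native error documents never carry parent.id.
function getErrorParentId(error: ErrorDoc): string | undefined {
  return error.parent?.id ?? error.span?.id;
}

// Find the waterfall item (span or transaction) the error belongs to.
function findErrorParent(
  items: WaterfallItem[],
  error: ErrorDoc
): WaterfallItem | undefined {
  const parentId = getErrorParentId(error);
  if (parentId === undefined) {
    return undefined; // neither field is set: nothing to correlate with
  }
  return items.find((item) => item.id === parentId);
}

// Usage: both document styles resolve to the same waterfall item.
const items: WaterfallItem[] = [
  { id: "tx-1", name: "GET /orders" },
  { id: "span-1", name: "SELECT orders" },
];
const apmServerError: ErrorDoc = { parent: { id: "span-1" } };
const otelNativeError: ErrorDoc = { span: { id: "span-1" } };
console.log(findErrorParent(items, apmServerError)?.name); // SELECT orders
console.log(findErrorParent(items, otelNativeError)?.name); // SELECT orders
```

Using ?? rather than || means an explicitly set parent.id always wins, and span.id is consulted only when parent.id is null or undefined.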

Testing

Warning

The changes in #195242 are required to test this properly, so a unit test covers the fallback before the PR is merged.

Using the e2e PoC, navigate to a service overview -> Transactions and scroll to the waterfall:

  • Use both the OTel-native and the APM server setups. The results should be the same: the error link should appear in both.
[Screenshot]

@jennypavlova added the release_note:skip, backport:skip, Team:obs-ux-infra_services, and 8.16 candidate labels Oct 10, 2024
@jennypavlova self-assigned this Oct 10, 2024
@jennypavlova requested a review from a team as a code owner October 10, 2024 15:12
@elasticmachine (Contributor)

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@botelastic (bot) added the ci:project-deploy-observability label Oct 10, 2024
@gregkalapos (Contributor) left a comment

Can't comment on overall Kibana changes, but the fallback logic looks ok 👍

@AlexanderWert (Member) commented Oct 10, 2024

@jennypavlova the other direction (from error to trace) is also broken. I see you covered the waterfall here; do we need to cover the error detail view as well?

The link pointed out in the following screenshot does not appear for OTel-native errors:
[Screenshot]

OTel Native:

[Screenshot]

@elasticmachine (Contributor) commented Oct 10, 2024

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] x-pack/test_serverless/functional/test_suites/search/common_configs/config.group6.ts / discover/esql discover esql view switch modal should not show switch modal when switching to a data view while a saved search is open

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id     before   after    diff
apm    3.4MB    3.4MB    +71.0B


cc @jennypavlova

  | IWaterfallSpanOrTransaction
  | undefined;
const parent = items.find(
  (waterfallItem) => waterfallItem.id === (error.parent?.id ?? error.span?.id)
Contributor

@jennypavlova have you considered moving this logic to the server instead of handling it on the client? I ask because we'll probably run into many hidden problems if we support multiple schemas on the client. As @AlexanderWert has already pointed out, there's another place where the same logic is needed.

I wonder if the UI should be schema-agnostic and respect its own schema.

This is similar to a draft PR that @rmyz opened recently: #194100 (review).
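For comparison, a minimal sketch of the server-side alternative suggested here: normalize each error document in the query response so the UI always sees parent.id. The names RawErrorDoc, NormalizedErrorDoc, and normalizeErrorParent are hypothetical, not Kibana APIs, and the sketch assumes the documents have already been fetched from Elasticsearch.

```typescript
// Hypothetical shape of an error document as returned by the query.
interface RawErrorDoc {
  error: { id: string };
  parent?: { id?: string };
  span?: { id?: string };
}

// What the UI would receive: a parent object that is always present.
type NormalizedErrorDoc = RawErrorDoc & { parent: { id?: string } };

// Fill parent.id from span.id when it is missing (OTel-native data);
// documents that already have parent.id keep their original value.
function normalizeErrorParent(doc: RawErrorDoc): NormalizedErrorDoc {
  return {
    ...doc,
    parent: { ...doc.parent, id: doc.parent?.id ?? doc.span?.id },
  };
}

// Usage: the client can then read parent.id without schema checks.
const response: RawErrorDoc[] = [
  { error: { id: "e1" }, parent: { id: "span-a" } }, // APM server style
  { error: { id: "e2" }, span: { id: "span-b" } }, // OTel-native style
];
const normalized = response.map(normalizeErrorParent);
console.log(normalized.map((doc) => doc.parent.id).join(", ")); // span-a, span-b
```

The trade-off raised in the reply below applies: the response then contains a parent.id that the stored document does not have, in exchange for a single, schema-agnostic field on the client.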

Member Author

Thanks for the comment, @cauemarcondes.
In general, I agree with you, but this is just a fallback to a different field when parent.id is missing; I wouldn't call it a 'new' schema. It could be confusing if we modify the response and return parent.id in a document that doesn't have it, but it would probably help us in the long run, as we would have the parent id available everywhere in the UI. I remember discussing with @dgieselaar and @gregkalapos that we wanted to do the fallback on the client to keep it compatible, and to return the fields from the server in the same format as they come from Elasticsearch.
But we can do it on the server; we just need the changes made in #195242 as a base and to merge a PR there.

"As @AlexanderWert has already pointed out (#195796 (comment)), there's another place where the same logic is needed."

This is unrelated. I investigated further: this happens because transaction.id is missing, so the query does not return the transaction object. Even if I do what you suggested, it won't fix the issue mentioned there.

[Screenshot]

It's a different query (changed here). I am not sure why the transaction id is missing from the error document; span id is not there either (using the test env).

Member

@jennypavlova ahhh right, great finding!

I think it's not easy (maybe impossible) to actually have transaction.id on OTel errors, because it cannot be enriched from other attributes; it would have to be collected by the SDKs / APM agents. That's a conceptual limitation, so I think we should proceed with this PR as is.

cc @gregkalapos @felixbarny, correct me if I'm wrong about the above.

Member Author

I used span.id for OTel errors; the PR is merged into #195242.

@jennypavlova (Member Author)

@cauemarcondes @AlexanderWert I opened a PR against Carlos' PR because I need the changes there: I followed the suggestion to add the fallback in the query response.

cc @crespocarlos, I think only you can review/approve it.

@jennypavlova (Member Author)

Closing in favor of crespocarlos#4

Labels
  • 8.16 candidate
  • backport:skip (This commit does not require backporting)
  • ci:project-deploy-observability (Create an Observability project)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:obs-ux-infra_services (Observability Infrastructure & Services User Experience Team)
Development

Successfully merging this pull request may close these issues.

[APM] [Otel] Error correlation broken for OTel-native data