Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service entity overview only shows logs that have log.level present. #189923

Closed
smith opened this issue Aug 5, 2024 · 26 comments · Fixed by #192500
Closed

Service entity overview only shows logs that have log.level present. #189923

smith opened this issue Aug 5, 2024 · 26 comments · Fixed by #192500
Assignees
Labels
bug Fixes for quality problems that affect the customer experience discuss Feature:ObsEntities Observability entity features Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team

Comments

@smith
Copy link
Contributor

smith commented Aug 5, 2024

#189389 specifies that we should hide logs that don't have a log.level attribute:

We require log.level vs any logs in the logs-* index to filter out a lot of noise from APM likely to relating failed transactions (to avoid duplication) and general unhelpful noise.

log.level is not marked as a required field in the Elastic Common Schema Log Fields specification. log.level is also not present in OpenTelemetry logs unless the field is added in the collector.

If we have a reason to filter out logs from APM errors (I'm not sure we do, other than "noise") we should explicitly filter out APM errors instead of requiring an arbitrary that happens to match.

✔️ Acceptance criteria

  • All of the entity views showing log data include logs both with and without log.level.
@smith smith added bug Fixes for quality problems that affect the customer experience discuss Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team Feature:ObsEntities Observability entity features labels Aug 5, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@roshan-elastic
Copy link

This would be a good discussion. The goal with the 'log rate' and the log 'error rate' was to give the user an indication of throughput/traffic vs errors so figuring out how to try and present these metrics to the user in a consistent way between APM and logs-only services would be valuable.

Keen to hear people's thoughts.

I'd also like to see what we have running in practice with some customers if possible (or at the least, present to them and ask for their thoughts on the experience).

@roshan-elastic
Copy link

roshan-elastic commented Aug 23, 2024

@smith @iblancof @cauemarcondes

After a few calls this week with customers, the consensus is that:

  • We don't need to put additional filters for log.level on there in the Log Rate Chart (they're happy for the logs from APM to be counted too)

  • We should change the Log Error % to be Log Error Rate (so same as log rate but just for errors)

Proposed Changes

Log Rate : count() / [PERIOD_IN_MINUTES]
Log Error Rate : count(kql='log.level: "error" OR log.level: "ERROR" OR error.log.level : "error"') / [PERIOD_IN_MINUTES]

Any thoughts/opinions?

I'd love to make these changes in the service inventory and the service views as part of the current milestone for services if possible.

@iblancof
Copy link
Contributor

Hi @roshan-elastic,

I already added the spec in [APM][Services]Change log charts from Log Error % to Log Error Rate on overview #190987.

Regarding the service inventory column change I just created #191253, feel free to move it under a public epic or milestone if needed!

ℹ️ I mistakenly created the issue in the private repo, but I've already closed it.

@roshan-elastic
Copy link

Wkd @iblancof

Do you think we could create an issue to change the log rate to remove the need for log.level to be included?

@roshan-elastic
Copy link

Also, do we think we should change these on the service inventory in the same issue or should we break these out into separate issues?

@iblancof
Copy link
Contributor

Hi @roshan-elastic,

In summary, we want the Log Rate for both the inventory and the overview charts to not require having the log.level field populated as mandatory.

As for creating a separate issue for the service inventory and the overview charts, I think we can handle both things within this same issue we're discussing in, right?

My question is about the priority. Since we're focused on the adaptive view, when should we tackle this? Should it be part of the epic? If so, we should include it to clarify priorities and ensure everything is delivered together, even though the scope is different.

@roshan-elastic
Copy link

Hey @iblancof

In summary, we want the Log Rate for both the inventory and the overview charts to not require having the log.level field populated as mandatory.

Correct!

My question is about the priority. Since we're focused on the adaptive view, when should we tackle this?

Could we do this after we've finished the service view? Goal is that when customers see these new adaptive views - we have consistent log metrics in both the service inventory and the service view.

@iblancof
Copy link
Contributor

Going to add this issue as related to the Epic so we make sure we don' t forget about it.

@cauemarcondes
Copy link
Contributor

The action here is not removing the log.level, but rather what other field should we use, am I right? We currently use the log.level to define the log error rate. And we do it because AFAICT there's no standard log-level field.

Log Error Rate : count(kql='log.level: "error" OR log.level: "ERROR"') / [PERIOD_IN_MINUTES]

Like the suggested changes is still relying on the log.level field.

@roshan-elastic
Copy link

Hey @cauemarcondes,

Just to play back my understanding of the issue:

Log Rate
We would basically count all logs:

count() / [PERIOD_IN_MINUTES]

This one wouldn't need log.level at all.

Log Error Rate
Instead of showing a %, we'd show a 'rate' and also update the error to look at:

count(kql='log.level: "error" OR log.level: "ERROR" OR error.log.level : "error"') / [PERIOD_IN_MINUTES]

This one would need either log.level or error.log.level (done via APM I believe) to be set.

Make sense?

@cauemarcondes
Copy link
Contributor

Yes, that makes sense. Now I'm reading the comments again and I got what you meant. Sorry for the noise.

@roshan-elastic
Copy link

All good @cauemarcondes. Sorry, this whole thing has been a bit noisy because we're updating these based on user feedback we've only just received in the last week or so.

@cauemarcondes
Copy link
Contributor

@roshan-elastic We should also change the EEM data transforms as they also expect the log.level to exist to calculate the logRate. cc: @simianhacker / @klacabane

@klacabane
Copy link
Contributor

One issue is that an unfiltered count() would include the apm metrics documents so we still need to either identify what is a log or a metrics in the source indices documents. I think excluding metrics would be the easiest (eg metricset.name != "service_transaction" or metricset.name != "service_summary")

@roshan-elastic
Copy link

Hey @klacabane - would it include all of them if we specified to only match those from logs-*?

It's OK if they're APM logs writing to logs-apm-* etc.

@klacabane
Copy link
Contributor

klacabane commented Sep 3, 2024

would it include all of them if we specified to only match those from logs-*?

@roshan-elastic Yes, I think we have these options:

  • filter on data_stream.type: logs - we're sure to only get logs but we may miss log documents that don't have that property, not sure how common that is but may exist in filebeat-* indices
  • filter out the two apm metrics datasets included in the source - we count everything that's not a metric
  • filter on other property like _index, we can precisely target logs/filebeat but that would not be the most efficient

I think option 2 would give the most accurate results

@roshan-elastic
Copy link

roshan-elastic commented Sep 3, 2024

Cheers @klacabane.

I think we should make this as simple as possible for users to comprehend so I'm not sure about adding in exclusions vs specifying a simple inclusion.

As a user, I'm thinking that this metric would be calculated in the same way as this lens visualisation:

Filtering for logs
Image

Are we able to replicate this logic?

cc @iblancof as we'll want the same logic for showing the counts that we use in the service views

@iblancof
Copy link
Contributor

iblancof commented Sep 4, 2024

Thanks for sharing @roshan-elastic .

The Log Error Rate is being handled like this in this PR.
I expect the changes on the Log Rate to be handled in this issue we are commenting in.

@kpatticha
Copy link
Contributor

@klacabane should we create a ticket to update the logRate in the service entity definition ?

Happy to make the changes if you can provide the best approach to filter out the logs.

@klacabane
Copy link
Contributor

@kpatticha please feel free to update the definition. As for the best way to get the logRate I listed different approaches and I think both 1 and 2 should provide the results you're looking for but I didn't test that

@smith
Copy link
Contributor Author

smith commented Sep 9, 2024

Fixed by #191962.

@smith smith closed this as completed Sep 9, 2024
@iblancof
Copy link
Contributor

Hey @smith, as I mentioned in this comment, I don't believe this issue is resolved by PR #191962.

PR #191962 focuses on modifying the Log Error Rate chart and formula, while this issue is about ensuring that Logs are not filtered by log.level in the Log Rate.

@kpatticha kpatticha self-assigned this Sep 10, 2024
@kpatticha
Copy link
Contributor

@iblancof is right.

I'm assigning this to myself to update the entity definition and the query accordingly

@kpatticha kpatticha reopened this Sep 10, 2024
@kpatticha
Copy link
Contributor

I'm also removing the N/A badge when the log.level is not defined since it won't be valid any more

Image

FYI @roshan-elastic

@smith
Copy link
Contributor Author

smith commented Sep 10, 2024

Hey @smith, as I mentioned in #191962 (comment), I don't believe this issue is resolved by PR #191962.

Sorry I missed that @iblancof. Thanks for looking!

kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Sep 13, 2024
…lastic#192500)

## Summary

closes elastic#189923

- EEM: remove `log.level` filter and filter based on `data_stream.type:
logs`
- log-data-access: remove `log.level` filter
- ECO: Update the formula and Explore log filter and remove the N/A
badge

https://github.com/user-attachments/assets/9b190ba7-c28b-44cf-bb0b-6dbdfe47ced4

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 48bbdd8)
kibanamachine added a commit that referenced this issue Sep 13, 2024
…tric collection (#192500) (#192854)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[APM][ECO] Remove &#x60;log.level&#x60; filter from log rate metric
collection (#192500)](#192500)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT
[{"author":{"name":"Katerina","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-09-13T12:05:51Z","message":"[APM][ECO]
Remove `log.level` filter from log rate metric collection
(#192500)\n\n## Summary\r\n\r\ncloses
#189923 \r\n\r\n- EEM: remove
`log.level` filter and filter based on `data_stream.type:\r\nlogs`\r\n-
log-data-access: remove `log.level` filter\r\n- ECO: Update the formula
and Explore log filter and remove the
N/A\r\nbadge\r\n\r\n\r\nhttps://github.com/user-attachments/assets/9b190ba7-c28b-44cf-bb0b-6dbdfe47ced4\r\n\r\n---------\r\n\r\nCo-authored-by:
kibanamachine
<[email protected]>","sha":"48bbdd8a576522881e1af452cfaca8e529c258fa","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","v9.0.0","ci:project-deploy-observability","Team:obs-ux-infra_services","v8.16.0"],"title":"[APM][ECO]
Remove `log.level` filter from log rate metric
collection","number":192500,"url":"https://github.com/elastic/kibana/pull/192500","mergeCommit":{"message":"[APM][ECO]
Remove `log.level` filter from log rate metric collection
(#192500)\n\n## Summary\r\n\r\ncloses
#189923 \r\n\r\n- EEM: remove
`log.level` filter and filter based on `data_stream.type:\r\nlogs`\r\n-
log-data-access: remove `log.level` filter\r\n- ECO: Update the formula
and Explore log filter and remove the
N/A\r\nbadge\r\n\r\n\r\nhttps://github.com/user-attachments/assets/9b190ba7-c28b-44cf-bb0b-6dbdfe47ced4\r\n\r\n---------\r\n\r\nCo-authored-by:
kibanamachine
<[email protected]>","sha":"48bbdd8a576522881e1af452cfaca8e529c258fa"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/192500","number":192500,"mergeCommit":{"message":"[APM][ECO]
Remove `log.level` filter from log rate metric collection
(#192500)\n\n## Summary\r\n\r\ncloses
#189923 \r\n\r\n- EEM: remove
`log.level` filter and filter based on `data_stream.type:\r\nlogs`\r\n-
log-data-access: remove `log.level` filter\r\n- ECO: Update the formula
and Explore log filter and remove the
N/A\r\nbadge\r\n\r\n\r\nhttps://github.com/user-attachments/assets/9b190ba7-c28b-44cf-bb0b-6dbdfe47ced4\r\n\r\n---------\r\n\r\nCo-authored-by:
kibanamachine
<[email protected]>","sha":"48bbdd8a576522881e1af452cfaca8e529c258fa"}},{"branch":"8.x","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Katerina <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience discuss Feature:ObsEntities Observability entity features Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants