
[Request]: Update 'service.name' docs with additional guidance #4102

Closed · 1 task done · Tracked by #189501 ...
roshan-elastic opened this issue Jul 31, 2024 · 14 comments

roshan-elastic (Contributor) commented Jul 31, 2024

Relates to:

Description

We need to add some additional content to the docs that explains how to declare a service in your logs:

https://www.elastic.co/docs/current/serverless/observability/add-logs-service-name

Changes required

  1. Explain that log.level needs to be present in order to display log metrics for a service in the new experience
  2. Document a potentially common use case which a user could come across

Background
To filter out unhelpful APM logs unrelated to logging (e.g. APM transaction errors), we are forcing the 'log rate' and 'log error %' metrics to require log.level in order to work.

Services Inventory - New Experience (screenshot)

Services View - New Experience (screenshot)

Specifically:

Log Rate
Rate of logs per minute observed for a given service.name.

Formula Calculation:
count(kql='log.level: *') / [PERIOD_IN_MINUTES]

Log Error %
% of logs where an error is detected for a given service.name.

Formula Calculation:
count(kql='log.level: "error" OR log.level: "ERROR"') / count(kql='log.level: *')
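
For example (an illustrative document shape with a hypothetical service name, not taken from the product), a log entry like this would count toward both metrics because log.level is present at the root:

{
  "@timestamp": "2024-07-31T08:26:45Z",
  "service.name": "opbeans-node",
  "log.level": "error",
  "message": "proxying API request to http://opbeans:3000"
}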

log.level isn't always provided automatically by our various ingestion methods (e.g. Beats, Elastic Agent), so we need to provide some guidance to explain this and suggest how to add it.

Additionally, there is a likely common use case where the log.level is nested within a message and is therefore not used in our metrics. We would like to provide some guidance around this.

1. Explain that log.level needs to be present in order to display log metrics for a service in the new experience

Note: We will be updating the UI to point to this documentation:

(screenshot)

We should provide some guidance on how to declare log.level in your logs via Elastic Agent. For example, here is some documentation on how to do this for standalone:

https://www.elastic.co/guide/en/fleet/current/elastic-agent-standalone-logging-config.html

Perhaps there is more comprehensive guidance? I'll add our engineers as contacts in case they can help.

2. Document a potentially common use case which a user could come across

One potentially common use case is where users specify a service.name in container or Kubernetes logs (although it's not exclusive to this use case). We should write a bit about this to give them an example of how to work around it:

Example cluster (screenshot)

In this case, the log relating to the service is nested within the message:

message:{"@timestamp":"2024-07-31T08:26:45Z","log.level":"info","message":"proxying API request to http://opbeans:3000"}

There are methods to decode this, but users will need to be careful that they don't override the existing log.level when they do.

We should make them aware of this use case and give them some guidance on how to pull out the encoded log.level and surface it in the main document (being careful not to overwrite the existing log.level).
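
To make this concrete, here is a rough sketch of one possible ingest pipeline (hypothetical and untested; the processor sequence and the "parsed" temporary field are my own, not an official recommendation). It parses the JSON in message into a temporary field, expands the flat "log.level" key, and only copies the nested value to the root if no log.level is already set:

{
  "description": "Sketch: surface a nested log.level without overwriting an existing one",
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "parsed",
        "ignore_failure": true
      }
    },
    {
      "dot_expander": {
        "field": "log.level",
        "path": "parsed",
        "ignore_failure": true
      }
    },
    {
      "set": {
        "if": "ctx.parsed?.log?.level != null && ctx.log?.level == null",
        "field": "log.level",
        "copy_from": "parsed.log.level"
      }
    },
    {
      "remove": {
        "field": "parsed",
        "ignore_missing": true
      }
    }
  ]
}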

Resources

Quick demo video:

services.inventory.-.new.experience.-.demo.mp4

Which documentation set does this change impact?

Stateful and Serverless

Feature differences

It's going to be available on both.

What release is this request related to?

8.16

Collaboration model

The documentation team

Point of contact.

Main contact: @roshan-elastic

Stakeholders: @cauemarcondes @kpatticha

roshan-elastic (Contributor, Author)

Hey @mdbirnstiehl this is the update I mentioned.

@bmorelli25 wondering if this update could be prioritised? It relates to a problem that we can't build around in the product, so we want to provide user guidance in the docs (and link to it from the product).

We'll be linking to the docs from the product (by next Tuesday) but as it's a short-link, we can point them to the general docs without it being a blocker.

bmorelli25 (Member)

We should be able to prioritize this. @mdbirnstiehl is booked up for the near future, but I'll try to find someone else on the team to take this on.

bmorelli25 (Member)

Note to writer: This document also needs to be ported to stateful

dedemorton (Contributor)

@cauemarcondes @kpatticha Can you provide a decode_json_fields processor config example that shows how to pull out the encoded log.level and surface it in the main document (being careful not to overwrite the existing log.level)? Maybe a config that would work with the following example:

message:{"@timestamp":"2024-07-31T08:26:45Z","log.level":"info","message":"proxying API request to http://opbeans:3000"}

dedemorton (Contributor)

@mdbirnstiehl Do you think this new content belongs in the topic about adding the service name or a new topic called something like "Add a log level to logs"? I'm leaning towards a separate topic because it seems like we are mixing things that logically don't belong together. If I add this info to the existing topic, I'll need to completely restructure it. I also wonder if folks who aren't using the new experience might still want to know how to add (or decode) the log level. WDYT?

cauemarcondes (Contributor)

> Can you provide a decode_json_fields processor config example that shows how to pull out the encoded log.level and surface it in the main document (being careful not to overwrite the existing log.level)? Maybe a config that would work with the following example:

Hi @dedemorton, I think it's best to ask the @elastic/obs-ux-logs-team team for an official guide on how to add the JSON processor for both the Auto-detect logs and metrics and Stream log files getting started guides. @flash1293 This is about what we talked about a couple of weeks ago when you helped me parse the log messages.

tonyghiani

> Can you provide a decode_json_fields processor config example that shows how to pull out the encoded log.level and surface it in the main document (being careful not to overwrite the existing log.level)?

@dedemorton you can use the pre-installed logs@json-pipeline pipeline to parse JSON logs. It is installed by ES by default and takes care of all the steps for parsing a JSON-like message.

Please note that the strategy the pipeline follows after parsing is add_to_root_conflict_strategy: merge, which means existing parsed fields will be overwritten.

Here is how you can use it:

{
  "my-pipeline": {
    "processors": [
      {
        "pipeline": {
          "name": "logs@json-pipeline",
          "ignore_missing_pipeline": true
        }
      }
    ]
  }
}
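
Before wiring this up, you could sanity-check it with the simulate API (a hedged example; it assumes the wrapper pipeline above has actually been created as my-pipeline):

POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "{\"@timestamp\":\"2024-07-31T08:26:45Z\",\"log.level\":\"info\",\"message\":\"proxying API request to http://opbeans:3000\"}"
      }
    }
  ]
}

If parsing succeeds, the response should show log.level promoted to the root of the document.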

And this is the whole definition of logs@json-pipeline:

{
  "logs@json-pipeline": {
    "processors": [
      {
        "rename": {
          "if": "ctx.message instanceof String && ctx.message.startsWith('{') && ctx.message.endsWith('}')",
          "field": "message",
          "target_field": "_tmp_json_message",
          "ignore_missing": true
        }
      },
      {
        "json": {
          "if": "ctx._tmp_json_message != null",
          "field": "_tmp_json_message",
          "add_to_root": true,
          "add_to_root_conflict_strategy": "merge",
          "allow_duplicate_keys": true,
          "on_failure": [
            {
              "rename": {
                "field": "_tmp_json_message",
                "target_field": "message",
                "ignore_missing": true
              }
            }
          ]
        }
      },
      {
        "dot_expander": {
          "if": "ctx._tmp_json_message != null",
          "field": "*",
          "override": true
        }
      },
      {
        "remove": {
          "field": "_tmp_json_message",
          "ignore_missing": true
        }
      }
    ],
    "_meta": {
      "description": "automatic parsing of JSON log messages",
      "managed": true
    },
    "version": 12,
    "deprecated": false
  }
}
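
As a side note on how this pipeline works: the dot_expander step expands flat dotted keys (such as the "log.level" key that JSON parsing adds to the root) into nested objects. Roughly (a hand-written illustration, not actual pipeline output):

{ "log.level": "info" }

becomes

{ "log": { "level": "info" } }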

mdbirnstiehl (Contributor)

> @mdbirnstiehl Do you think this new content belongs in the topic about adding the service name or a new topic called something like "Add a log level to logs"? I'm leaning towards a separate topic because it seems like we are mixing things that logically don't belong together. If I add this info to the existing topic, I'll need to completely restructure it. I also wonder if folks who aren't using the new experience might still want to know how to add (or decode) the log level. WDYT?

Yeah, I agree that it makes more sense to me as a separate topic.

dedemorton (Contributor)

Spoke with @mdbirnstiehl. He is going to take over this issue because he has time now. Plus, as our logs guy™, he knows more about this subject. Thanks Mike!

mdbirnstiehl (Contributor)

Hi @roshan-elastic, I'm not sure I completely understand the scenario of having to declare a log level for logs that don't contain log levels at all.

I understand the case where there is a log.level present but it's not parsed; there we can use the logs@json-pipeline described by @tonyghiani to parse the logs.

With the standalone agent link, wouldn't that just apply to the Agent's logs and not to logs from events that are getting indexed? Are we wanting to create arbitrary log levels for logs that don't contain log levels? I'm not sure if that would create meaningful data or graphs.

roshan-elastic (Contributor, Author)

Hey @mdbirnstiehl,

Good timing :)

We're actually going to change:

Current

Log Rate: currently requires logs to have log.level in order to be included
Log Error %: currently requires logs to have log.level in order to be included

Changing to

Log Rate: count() / [PERIOD_IN_MINUTES]
Log Error Rate: count(kql='log.level: "error" OR log.level: "ERROR"') / [PERIOD_IN_MINUTES]

@iblancof might be the best contact for this but we'll be making these changes as part of this epic:

For example, here

mdbirnstiehl (Contributor)

Hi @roshan-elastic and @iblancof,

Would it make sense to add a note or section in the new experience page about the Log Rate and Log Error Rate formulas and the need to parse the log.level for the Log Error Rate? We could then point to some examples, like the page on extracting log.level from unstructured or semi-structured log data, or extracting a log level from k8s logs?

I think it might fit better on the new experiences page rather than the service.name page. WDYT?

roshan-elastic (Contributor, Author)

Hey @mdbirnstiehl - thanks for reminding me about this.

We're actually about to change how we handle the log rate and log error %:

Log Rate

Now:
count(kql='log.level: *') / [PERIOD_IN_MINUTES]

Changing to:
count() / [PERIOD_IN_MINUTES]

Log Error %

Now:
count(kql='log.level: "error" OR log.level: "ERROR"') / [PERIOD_IN_MINUTES]

Changing to (renamed Log Error Rate):
count(kql='log.level: "error" OR log.level: "ERROR" OR error.log.level : "error"') / [PERIOD_IN_MINUTES]

This means that the log charts in the service views should be much less likely to be empty:

(screenshot)

> We could then point to some examples, like the page on extracting log.level from unstructured or semi-structured log data, or extracting a log level from k8s logs?

Makes sense!

> I think it might fit better on the new experiences page rather than the service.name page. WDYT?

Yeah, I think it makes sense to show this on the new experience page, but perhaps it makes sense to link the 'how' from the service logs doc so that all of the information about how to declare your services lives in one place?

In short:

  • Call out that the UI won't work properly without declaring the service ==> experience page (and link through to the service logs page)
  • How to declare data against your services ==> service logs page

WDYT?

mdbirnstiehl (Contributor)

@roshan-elastic sounds good, I'll start making the updates!

mdbirnstiehl closed this as not planned on Oct 3, 2024