From c47b2460fc33545a25fd344567123c4c16e4f929 Mon Sep 17 00:00:00 2001
From: ChrsMark
Date: Wed, 15 May 2024 11:57:03 +0300
Subject: [PATCH 1/3] add container parser blogpost

Signed-off-by: ChrsMark
---
 .../index.md | 198 ++++++++++++++++++
 static/refcache.json | 32 +++
 2 files changed, 230 insertions(+)
 create mode 100644 content/en/blog/2024/otel-collector-container-log-parser/index.md

diff --git a/content/en/blog/2024/otel-collector-container-log-parser/index.md b/content/en/blog/2024/otel-collector-container-log-parser/index.md
new file mode 100644
index 000000000000..0ef8693654ea
--- /dev/null
+++ b/content/en/blog/2024/otel-collector-container-log-parser/index.md
@@ -0,0 +1,198 @@
---
title: Introducing the new container log parser for OpenTelemetry Collector
linkTitle: Collector container log parser
date: 2024-05-16
author: '[Christos Markou](https://github.com/ChrsMark) (Elastic)'
cSpell:ignore: Christos containerd Filelog filelog Jaglowski kube Markou
---

[Filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver)
is one of the most commonly used components of the
[OpenTelemetry Collector](/docs/collector), as indicated by the most recent
[survey](/blog/2024/otel-collector-survey/#otel-components-usage). The same
survey also shows, unsurprisingly, that
[Kubernetes is the leading platform for Collector deployment (80.6%)](/blog/2024/otel-collector-survey/#deployment-scale-and-environment).
Taken together, these two facts highlight how important seamless log collection
in Kubernetes environments is.

Currently, the
[filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.100.0/receiver/filelogreceiver/README.md)
is capable of parsing container logs from Kubernetes Pods, but it requires
[extensive configuration](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/aaa70bde1bf8bf15fc411282468ac6d2d07f772d/charts/opentelemetry-collector/templates/_config.tpl#L206-L282)
to parse logs properly across the various container runtime formats. The reason
is that container logs come in several known formats, depending on the
container runtime, so we need to perform a specific set of operations in order
to parse them properly:

1. Detect the format of the incoming logs at runtime.
2. Parse each format according to its specific characteristics, for example,
   whether it is JSON or plain text, and which timestamp format it uses.
3. Extract known metadata, relying on predefined patterns.

Such an advanced sequence of operations can be handled by chaining the proper
[stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza)
operators together. The end result is rather complex, and this configuration
complexity can be mitigated by using the corresponding
[helm chart preset](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#configuration-for-kubernetes-container-logs).
However, even with the preset, it can still be challenging for users to
maintain and troubleshoot such advanced configurations.

The community has raised the issue of
[improving the Kubernetes Logs Collection Experience](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25251)
in the past.
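To make the complexity described above concrete, here is a heavily abbreviated
sketch of the kind of operator chain that such manual configurations rely on.
It is condensed from the helm chart preset linked above, with several operators
and fields omitted, so treat it as an illustration rather than a working
configuration:

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      # 1. Route each line to a format-specific parser based on its shape.
      - type: router
        id: get-format
        routes:
          - output: parser-docker
            expr: 'body matches "^\\{"'
          - output: parser-crio
            expr: 'body matches "^[^ Z]+ "'
          - output: parser-containerd
            expr: 'body matches "^[^ Z]+Z"'
      # 2. One parser per runtime format, each with its own timestamp layout.
      - type: json_parser
        id: parser-docker
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: regex_parser
        id: parser-containerd
        regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # 3. The full version adds a similar regex_parser for cri-o, a recombine
      #    operator for partial lines, and several operators that extract the
      #    Kubernetes metadata from the file path. Each parser also sets
      #    `output` so that all routes converge on those shared steps.
```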
One step toward such an improvement would be to provide a simplified and robust
option for parsing container logs, without requiring users to specify or
maintain the implementation details manually. With the proposal and
implementation of the new
[container parser](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959),
all of these details are encapsulated and handled within the parser itself.
Add to this the ability to cover the implementation with unit tests and various
pieces of fail-over logic, and the result is a significant improvement in
container log parsing.

## What container logs look like

First of all, let's quickly recall the different container log formats that we
can encounter out there:

- Docker container logs:

  `{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`

- cri-o logs:

  `2024-04-13T07:59:37.505201169-05:00 stdout F This is a cri-o log line!`

- containerd logs:

  `2024-04-22T10:27:25.813799277Z stdout F This is an awesome containerd log line!`

We can see that the cri-o and containerd log formats are quite similar (both
follow the CRI logging format), with only a small difference in the timestamp
format.

Consequently, to properly handle these 3 different formats we need 3 different
routes of
[stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza)
operators, as described in the
[container parser operator issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959).

In addition, the CRI format can produce partial logs, which we first want to
recombine into a single entry:

```text
2024-04-06T00:17:10.113242941Z stdout P This is a very very long line th
2024-04-06T00:17:10.113242941Z stdout P at is really really long and spa
2024-04-06T00:17:10.113242941Z stdout F ns across multiple log entries
```

Ideally, we want our parser to detect the format automatically at runtime and
parse the log lines properly. We will see later that the container parser does
exactly that for us.

## Attribute handling

Container log files follow a specific naming pattern from which we can extract
useful metadata during parsing. For example, from
`/var/log/pods/kube-system_kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler/1.log`,
we can extract the namespace, the name and UID of the pod, and the name of the
container.

After extracting this metadata, we need to store it in the appropriate
attributes, following the
[Semantic Conventions](/docs/specs/semconv/resource/k8s/). This handling can
also be encapsulated within the parser's implementation, eliminating the need
for users to define it manually.

## Using the new container parser

With all this in mind, the container parser can be configured like this:

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container
```

That configuration is more than enough to properly parse the log line and
extract all the useful Kubernetes metadata.
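To try this end to end, the receiver configuration shown above can be wired
into a minimal Collector pipeline like the following sketch. The debug exporter
here is only an illustrative choice for inspecting the parsed records locally;
a real deployment would typically use OTLP or another exporter:

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container

exporters:
  # Prints the parsed log records to the Collector's output for inspection.
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```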
A log line
`{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`
that is written at
`/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log`
will produce a log entry like the following:

```json
{
  "timestamp": "2024-03-30 08:31:20.545192187 +0000 UTC",
  "body": "INFO: This is a docker log line",
  "attributes": {
    "time": "2024-03-30T08:31:20.545192187Z",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "kube-system",
    "log.file.path": "/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

Notice that we don't have to define the format. The parser automatically
detects the format and parses the logs accordingly. Even the partial logs that
the cri-o or containerd runtimes can produce are recombined properly without
any special configuration.

This is really handy, because as users we don't need to worry about specifying
the format or maintaining different configurations for different environments.

## Implementation details

Most of the code for this parser operator was written from scratch, but we were
able to reuse the recombine operator internally for parsing partial logs.
Achieving this required some small refactoring, but it gave us the opportunity
to reuse an existing and well-tested component.

Additionally, during the discussions around the implementation of this feature,
a question popped up: _Why implement this as an operator and not as a
processor?_

One basic reason is that the order in which log records arrive at processors is
not guaranteed, while guaranteed ordering is exactly what proper handling of
partial logs requires. That's why implementing it as an operator was the way to
go for now. Moreover, at the moment
[it is suggested](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32080#issuecomment-2035301178)
to do as much work as possible during collection, and robust parsing
capabilities enable exactly that.

More information about the implementation discussions can be found in the
respective
[GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959)
and its related/linked PR.

Last but not least, this specific container parser is a good example of the
room for improvement that exists, and of how we could further optimize for
popular technologies with known log formats in the future.

## Conclusion: container log parsing is now easier with the filelog receiver

Eager to learn more about the container parser? Visit the official
[documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/container.md),
and if you give it a try, let us know what you think. Don't hesitate to reach
out to us in the official CNCF [Slack workspace](https://slack.cncf.io/),
specifically in the `#otel-collector` channel.

## Acknowledgements

Kudos to [Daniel Jaglowski](https://github.com/djaglowski) for reviewing the
parser's implementation and providing valuable feedback!
diff --git a/static/refcache.json b/static/refcache.json index 71f0a2d6bc83..1327f3603936 100644 --- a/static/refcache.json +++ b/static/refcache.json @@ -2155,6 +2155,10 @@ "StatusCode": 200, "LastSeen": "2024-03-15T20:34:22.210208944Z" }, + "https://github.com/ChrsMark": { + "StatusCode": 200, + "LastSeen": "2024-05-15T19:23:42.377730577+03:00" + }, "https://github.com/Cyprinus12138": { "StatusCode": 200, "LastSeen": "2024-03-28T22:25:37.072281206+08:00" @@ -2495,6 +2499,10 @@ "StatusCode": 200, "LastSeen": "2024-01-18T20:05:04.809604-05:00" }, + "https://github.com/djaglowski": { + "StatusCode": 200, + "LastSeen": "2024-05-15T19:23:48.979905025+03:00" + }, "https://github.com/dmathieu": { "StatusCode": 200, "LastSeen": "2024-02-14T08:44:43.625674121Z" @@ -3083,6 +3091,18 @@ "StatusCode": 200, "LastSeen": "2024-03-19T10:16:57.223070258Z" }, + "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25251": { + "StatusCode": 200, + "LastSeen": "2024-05-15T19:23:46.151980372+03:00" + }, + "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959": { + "StatusCode": 200, + "LastSeen": "2024-05-15T19:23:47.587898763+03:00" + }, + "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32080#issuecomment-2035301178": { + "StatusCode": 200, + "LastSeen": "2024-05-15T19:23:48.560170178+03:00" + }, "https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/14132": { "StatusCode": 200, "LastSeen": "2024-01-18T20:06:09.209998-05:00" @@ -5931,6 +5951,14 @@ "StatusCode": 206, "LastSeen": "2024-05-06T07:53:28.679391-07:00" }, + "https://opentelemetry.io/blog/2024/otel-collector-survey/#deployment-scale-and-environment": { + "StatusCode": 206, + "LastSeen": "2024-05-15T19:23:44.74352841+03:00" + }, + "https://opentelemetry.io/blog/2024/otel-collector-survey/#otel-components-usage": { + "StatusCode": 206, + "LastSeen": "2024-05-15T19:23:44.426379755+03:00" + }, "https://opentelemetry.io/blog/2024/scaling-collectors/": { "StatusCode": 206, "LastSeen": "2024-05-06T07:53:28.903161-07:00" @@ -5987,6 +6015,10 @@ "StatusCode": 206, "LastSeen": "2024-02-24T14:33:05.630341-08:00" }, + "https://opentelemetry.io/docs/specs/semconv/resource/k8s/": { + "StatusCode": 206, + "LastSeen": "2024-05-15T19:23:47.920456821+03:00" + }, "https://opentelemetry.io/ecosystem/integrations/": { "StatusCode": 206, "LastSeen": "2024-03-19T10:16:49.992495889Z" From 9fc5d8af1efc86fd14f2fc70b937b9e75ede5f4b Mon Sep 17 00:00:00 2001 From: ChrsMark Date: Fri, 17 May 2024 16:51:47 +0300 Subject: [PATCH 2/3] switch to single voice Signed-off-by: ChrsMark --- .../index.md | 46 +++++++++---------- 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/content/en/blog/2024/otel-collector-container-log-parser/index.md b/content/en/blog/2024/otel-collector-container-log-parser/index.md index 0ef8693654ea..66e15d9bfdab 100644 --- a/content/en/blog/2024/otel-collector-container-log-parser/index.md +++ b/content/en/blog/2024/otel-collector-container-log-parser/index.md @@ -1,7 +1,7 @@ --- title: Introducing the new container log parser for OpenTelemetry Collector linkTitle: Collector container log parser -date: 2024-05-16 +date: 2024-05-17 author: '[Christos Markou](https://github.com/ChrsMark) (Elastic)' cSpell:ignore: Christos containerd Filelog filelog Jaglowski kube Markou --- @@ -21,7 +21,7 @@ is capable of parsing container logs from Kubernetes Pods, but it requires [extensive 
configuration](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/aaa70bde1bf8bf15fc411282468ac6d2d07f772d/charts/opentelemetry-collector/templates/_config.tpl#L206-L282) to properly parse logs according to various container runtime formats. The reason is that container logs can come in various known formats depending on the -container runtime, so we need to perform a specific set of operations in order +container runtime, so you need to perform a specific set of operations in order to properly parse them: 1. Detect the format of the incoming logs at runtime. @@ -32,7 +32,7 @@ to properly parse them: Such advanced sequence of operations can be handled by chaining the proper [stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza) -operators together. The end result is rather complex, and this configuration +operators together. The end result is rather complex. This configuration complexity can be mitigated by using the corresponding [helm chart preset](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#configuration-for-kubernetes-container-logs). However, despite having the preset, it can still be challenging for users to @@ -52,8 +52,8 @@ in container log parsing. ## How container logs look like -First of all we need to quickly recall the different container log formats that -we can meet out there: +First of all let's quickly recall the different container log formats that can +be met out there: - Docker container logs: @@ -71,13 +71,12 @@ We can notice that cri-o and containerd log formats are quite similar (both follow the CRI logging format) but with a small difference in the timestamp format. -Consequently, in order to properly handle these 3 different formats we need 3 -different routes of +To properly handle these 3 different formats you need 3 different routes of [stanza](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza) operators as we can see in the [container parser operator issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959). -In addition, the CRI format can provide partial logs which we would like to +In addition, the CRI format can provide partial logs which you would like to combine them into one at first place: ```text @@ -86,19 +85,19 @@ combine them into one at first place: 2024-04-06T00:17:10.113242941Z stdout F ns across multiple log entries ``` -Ideally, we want our parser to be capable of automatically detecting the format -at runtime and properly parse the log lines. We will see later that the +Ideally you would like our parser to be capable of automatically detecting the +format at runtime and properly parse the log lines. We will see later that the container parser will do that for us. ## Attribute handling -Container log files follow a specific naming pattern from which we can extract +Container log files follow a specific naming pattern from which you can extract useful metadata information during parsing. For example, from `/var/log/pods/kube-system_kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler/1.log`, -we can extract the namespace, the name and UID of the pod, and the name of the +you can extract the namespace, the name and UID of the pod, and the name of the container. 
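For comparison, a hand-rolled setup would typically pull those fields out of
the file path with a `regex_parser` operator along the following lines. This is
a simplified sketch of the manual approach (close to what the helm chart preset
does), not the container parser's internal implementation:

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      # Illustrative only: capture namespace, pod name, pod UID, container
      # name, and restart count from the standard /var/log/pods/... layout.
      - type: regex_parser
        id: extract-metadata-from-filepath
        parse_from: attributes["log.file.path"]
        regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\/]+)\/(?P<restart_count>\d+)\.log$'
```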
-After extracting this metadata, we need to store it properly using the +After extracting this metadata, you need to store it properly using the appropriate attributes following the [Semantic Conventions](/docs/specs/semconv/resource/k8s/). This handling can also be encapsulated within the parser's implementation, eliminating the need @@ -145,12 +144,12 @@ will produce a log entry like the following: } ``` -We can notice that we don't have to define the format. The parser automatically -detects the format and parses the logs accordingly. Even partial logs that cri-o -or containerd runtimes can produce will be recombined properly without the need -of any special configuration. +You can notice that you don't have to define the format. The parser +automatically detects the format and parses the logs accordingly. Even partial +logs that cri-o or containerd runtimes can produce will be recombined properly +without the need of any special configuration. -This is really handy, because as users we don't need to care about specifying +This is really handy, because as a user you don't need to care about specifying the format and even maintaining different configurations for different environments. @@ -158,13 +157,12 @@ environments. In order to implement that parser operator most of the code was written from scratch, but we were able to re-use the recombine operator internally for the -partial logs parsing. In order to achieve this some small refactoring was -required but this gave us the opportunity to re-use an already existent and well -tested component. +partial logs parsing. To achieve this, some small refactoring was required but +this gave us the opportunity to re-use an already existent and well tested +component. -Additionally, during the discussions around the implementation of this feature, -a question popped up: _Why to implement this as an operator and not as a -processor?_ +During the discussions around the implementation of this feature, a question +popped up: _Why to implement this as an operator and not as a processor?_ One basic reason is that the order of the log records arriving at processors is not guaranteed. However we need to ensure this, so as to properly handle the From db56d910d3800f9ca693196c017d9db2bd653dda Mon Sep 17 00:00:00 2001 From: ChrsMark Date: Mon, 20 May 2024 10:51:19 +0300 Subject: [PATCH 3/3] add note about reduced configuration Signed-off-by: ChrsMark --- .../blog/2024/otel-collector-container-log-parser/index.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/content/en/blog/2024/otel-collector-container-log-parser/index.md b/content/en/blog/2024/otel-collector-container-log-parser/index.md index 66e15d9bfdab..305964d89d2e 100644 --- a/content/en/blog/2024/otel-collector-container-log-parser/index.md +++ b/content/en/blog/2024/otel-collector-container-log-parser/index.md @@ -1,7 +1,7 @@ --- title: Introducing the new container log parser for OpenTelemetry Collector linkTitle: Collector container log parser -date: 2024-05-17 +date: 2024-05-22 author: '[Christos Markou](https://github.com/ChrsMark) (Elastic)' cSpell:ignore: Christos containerd Filelog filelog Jaglowski kube Markou --- @@ -119,7 +119,10 @@ receivers: ``` That configuration is more than enough to properly parse the log line and -extract all the useful Kubernetes metadata. +extract all the useful Kubernetes metadata. It's quite obvious how much less +configuration is required now. 
Using a combination of operators would result in about 69 lines of
configuration, as pointed out in the
[original proposal](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959).

A log line
`{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`