decode: opentelemetry: Fix a crash in fluent-bit #46

Merged: 1 commit, Dec 20, 2023


@srini38 (Contributor) commented Dec 3, 2023

This patch fixes a crash in fluent-bit that occurs when ctr_decode_opentelemetry_create() accesses the status of a span that does not carry one.

The patch was tested by running opentelemetry-cpp-1.12.0/example_otlp_http http://localhost:4318/v1/traces DEBUG=yes bin against fluent-bit 2.2.0.
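
The kind of guard described above can be sketched as follows. This is a minimal illustration, not the actual ctraces change: the struct and function names (otlp_status, otlp_span, decoded_span, decode_span) are hypothetical stand-ins. It assumes the protobuf-c convention that an optional sub-message is a pointer left NULL when the sender omits the field, and it defaults the code to 0, which is the "unset" value in the OTLP status enum and matches the "code : 0" printed in the post-fix output.

```c
#include <stddef.h>

/* Hypothetical stand-ins for protobuf-c generated OTLP types (assumption:
 * an optional sub-message is a pointer that is NULL when the field was
 * omitted by the sender). */
struct otlp_status {
    int         code;      /* OTLP StatusCode: 0 = unset, 1 = ok, 2 = error */
    const char *message;
};

struct otlp_span {
    const char         *name;
    struct otlp_status *status;   /* NULL when the span carries no status */
};

/* Hypothetical decoded span, standing in for the ctraces span object. */
struct decoded_span {
    const char *name;
    int         status_code;
    const char *status_message;
};

/* Copy a wire-level span into the decoded form. The status is read only
 * if it is present; otherwise the defaults (code 0, no message) remain. */
void decode_span(const struct otlp_span *in, struct decoded_span *out)
{
    out->name = in->name;
    out->status_code = 0;
    out->status_message = NULL;

    if (in->status != NULL) {
        out->status_code = in->status->code;
        out->status_message = in->status->message;
    }
}
```

Without the NULL check, dereferencing in->status for a span that has no status is the kind of access that would raise SIGSEGV, as in the "Before fix" trace.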

  • Example configuration file for the change
[INPUT]
        name opentelemetry
        listen 127.0.0.1
        port 4318
        successful_response_code 200

[OUTPUT]
        name stdout
        match *
  • Debug log output from testing the change

Before fix:

root@b4aa3e5d75b5:/source/opentelemetry-cpp-1.12.0/build/examples/otlp# ./example_otlp_http http://localhost:4318/v1/traces DEBUG=yes bin

[2023/12/02 11:25:16] [engine] caught signal (SIGSEGV)
[2023/12/02 11:25:16] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/opentelemetry.c:52] new TCP connection arrived FD=40
[2023/12/02 11:25:16] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=402 pre_len=0 now_len=402
#0  0x559e36b6196c      in  ctr_decode_opentelemetry_create() at lib/ctraces/src/ctr_decode_opentelemetry.c:574
#1  0x559e36841d39      in  process_payload_traces_proto() at plugins/in_opentelemetry/opentelemetry_prot.c:166
#2  0x559e36841fa8      in  process_payload_traces() at plugins/in_opentelemetry/opentelemetry_prot.c:234
#3  0x559e3684543c      in  opentelemetry_prot_handle() at plugins/in_opentelemetry/opentelemetry_prot.c:1644
#4  0x559e3683c73c      in  opentelemetry_conn_event() at plugins/in_opentelemetry/http_conn.c:99
#5  0x559e3661e79f      in  flb_engine_start() at src/flb_engine.c:1009
#6  0x559e365bccf2      in  flb_lib_worker() at src/flb_lib.c:638
#7  0x7f1a092a2ad9      in  ???() at ???:0
#8  0x7f1a093332e3      in  ???() at ???:0
#9  0xffffffffffffffff  in  ???() at ???:0
Aborted (core dumped)

Post fix:

[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/opentelemetry.c:52] new TCP connection arrived FD=40
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=402 pre_len=0 now_len=402
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=402 pre_len=0 now_len=402
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=165 pre_len=0 now_len=165
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=237 pre_len=165 now_len=402
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=165 pre_len=0 now_len=165
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=232 pre_len=165 now_len=397
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=556, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:84] fd=40 closed connection
[2023/12/02 14:50:24] [debug] [task] created task=0x7fabf8018480 id=0 OK
[2023/12/02 14:50:24] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
|-------------------- RESOURCE SPAN --------------------|
  resource:
     - attributes:
            - service.name: 'unknown_service'
            - telemetry.sdk.version: '1.12.0'
            - telemetry.sdk.name: 'opentelemetry'
            - telemetry.sdk.language: 'cpp'
     - dropped_attributes_count: 0
  schema_url:
  [scope_span]
    instrumentation scope:
        - name                    : foo_library
        - version                 : 1.12.0
        - dropped_attributes_count: 0
        - attributes:

    schema_url:
    [spans]
         [span 'f1']
             - trace_id                : e86248c61e028f03fde5e462bcae3fb1
             - span_id                 : 5593e2db2dd9c035
             - parent_span_id          : 9e1b28eb1506ce3c
             - kind                    : 1 (internal)
             - start_time              : 1701517823379482754
             - end_time                : 1701517823379486350
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes: none
             - events: none
             - [links]
[2023/12/02 14:50:24] [debug] [out flush] cb_destroy coro_id=0
[2023/12/02 14:50:24] [debug] [task] destroy task=0x7fabf8018480 (task_id=0)
  • Attached Valgrind output showing that no leaks or memory corruption were found
root@b4aa3e5d75b5:/source/fluent-bit-2.2.0/build/bin# valgrind --tool=memcheck --leak-check=full --track-origins=yes --show-leak-kinds=all ./fluent-bit -v -c ./flb.conf
==44134== Memcheck, a memory error detector
==44134== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==44134== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==44134== Command: ./fluent-bit -v -c ./flb.conf
==44134==
Fluent Bit v2.2.0
* Copyright (C) 2015-2023 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/12/02 15:21:46] [ info] Configuration:
[2023/12/02 15:21:46] [ info]  flush time     | 1.000000 seconds
[2023/12/02 15:21:46] [ info]  grace          | 5 seconds
[2023/12/02 15:21:46] [ info]  daemon         | 0
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  inputs:
[2023/12/02 15:21:46] [ info]      opentelemetry
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  filters:
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  outputs:
[2023/12/02 15:21:46] [ info]      stdout.0
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  collectors:
[2023/12/02 15:21:46] [ info] [fluent bit] version=2.2.0, commit=, pid=44134
[2023/12/02 15:21:46] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/12/02 15:21:46] [ info] [storage] ver=1.5.1, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/12/02 15:21:46] [ info] [output:stdout:stdout.0] worker #0 started
[2023/12/02 15:21:46] [ info] [cmetrics] version=0.6.4
[2023/12/02 15:21:46] [ info] [ctraces ] version=0.3.1
[2023/12/02 15:21:46] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2023/12/02 15:21:46] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2023/12/02 15:21:46] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2023/12/02 15:21:46] [debug] [downstream] listening on 127.0.0.1:4318
[2023/12/02 15:21:46] [ info] [input:opentelemetry:opentelemetry.0] listening on 127.0.0.1:4318
[2023/12/02 15:21:46] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2023/12/02 15:21:46] [ info] [sp] stream processor started
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=556, records=0, input=opentelemetry.0
[2023/12/02 15:21:53] [debug] [task] created task=0x526eb20 id=0 OK
|-------------------- RESOURCE SPAN --------------------|

[2023/12/02 15:21:53] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
  resource:
     - attributes:
            - service.name: 'unknown_service'
            - telemetry.sdk.version: '1.12.0'
            - telemetry.sdk.name: 'opentelemetry'
            - telemetry.sdk.language: 'cpp'
     - dropped_attributes_count: 0
  schema_url:
  [scope_span]
    instrumentation scope:
        - name                    : foo_library
        - version                 : 1.12.0
        - dropped_attributes_count: 0
        - attributes:

    schema_url:
    [spans]
         [span 'f1']
             - trace_id                : 4c9fd66d21d8655bac6dc161b3fb8365
             - span_id                 : da35b5ab06c17c82
             - parent_span_id          : 3261a69334388f15
             - kind                    : 1 (internal)
             - start_time              : 1701519712566966527
             - end_time                : 1701519712566969984
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes: none
             - events: none
             - [links]
[2023/12/02 15:21:53] [debug] [out flush] cb_destroy coro_id=0
[2023/12/02 15:21:53] [debug] [task] destroy task=0x526eb20 (task_id=0)
^C[2023/12/02 15:21:55] [engine] caught signal (SIGINT)
[2023/12/02 15:21:55] [ warn] [engine] service will shutdown in max 5 seconds
[2023/12/02 15:21:56] [ info] [engine] service has stopped (0 pending tasks)
[2023/12/02 15:21:56] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/12/02 15:21:56] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==44134==
==44134== HEAP SUMMARY:
==44134==     in use at exit: 0 bytes in 0 blocks
==44134==   total heap usage: 1,999 allocs, 1,999 frees, 1,354,369 bytes allocated
==44134==
==44134== All heap blocks were freed -- no leaks are possible
==44134==
==44134== For lists of detected and suppressed errors, rerun with: -s
==44134== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

This patch fixes a crash in fluent-bit when ctr_decode_opentelemetry_create()
tries to access the span status and when that span status is not present.

the patch has been tested with opentelemetry-cpp-1.12.0/example_otlp_http
http://localhost:4318/v1/traces DEBUG=yes bin and fluent-bit 2.2.0

Signed-off-by: Srinivasan J <[email protected]>
srini38 added a commit to srini38/fluent-bit-out-stdout-fix that referenced this pull request Dec 11, 2023
This patch fixes print_traces_text() function to print all ctrace contexts

the patch has been tested with opentelemetry-cpp-1.12.0/example_otlp_http
http://localhost:4318/v1/traces DEBUG=yes bin, fluent-bit 2.2.0 and ctrace fix
(fluent/ctraces#46)

Signed-off-by: Srinivasan J <[email protected]>
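
The commit message above does not include the change itself. One plausible reading, offered only as an assumption, is that a buffer can hold several encoded trace contexts and the printer must keep decoding until the buffer is exhausted instead of stopping after the first. The sketch below shows that loop shape with a hypothetical length-prefixed record format; decode_next and for_each_record are illustrative names, not fluent-bit or ctraces functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: reads one record starting at *off, where a record
 * is a 1-byte length followed by that many payload bytes. On success it
 * advances *off past the record and returns 0; it returns -1 when no
 * complete record remains. It stands in for a real trace-context decoder. */
int decode_next(const uint8_t *buf, size_t size, size_t *off,
                const uint8_t **rec, size_t *rec_len)
{
    size_t len;

    if (*off >= size) {
        return -1;                 /* buffer exhausted */
    }
    len = buf[*off];
    if (len > size - *off - 1) {
        return -1;                 /* truncated record */
    }
    *rec = buf + *off + 1;
    *rec_len = len;
    *off += 1 + len;
    return 0;
}

/* Visit every record in the buffer, not just the first one.
 * Returns the number of records handled. */
size_t for_each_record(const uint8_t *buf, size_t size,
                       void (*handle)(const uint8_t *rec, size_t len, void *arg),
                       void *arg)
{
    size_t off = 0;
    size_t count = 0;
    const uint8_t *rec;
    size_t rec_len;

    while (decode_next(buf, size, &off, &rec, &rec_len) == 0) {
        if (handle != NULL) {
            handle(rec, rec_len, arg);
        }
        count++;
    }
    return count;
}
```

The offset cursor is the key design point: because the decoder advances it on every successful call, the caller can loop until decoding fails and every context in the buffer is visited exactly once.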
@edsiper edsiper merged commit ff499bf into fluent:master Dec 20, 2023
19 of 20 checks passed
@edsiper (Member) commented Dec 20, 2023

@srini38 thank you!

edsiper pushed a commit to fluent/fluent-bit that referenced this pull request Dec 20, 2023