Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: add caller plugin-id about emitting error events #4632

Conversation

daipom
Copy link
Contributor

@daipom daipom commented Sep 11, 2024

Which issue(s) this PR fixes:

What this PR does / why we need it:
Add caller plugin-id to warning logs about emitting error events.

It would be helpful if we could know what plugin emitted the error event.

We need to care about the compatibility.
This signature change would not break compatibility.

However, I'm concerned that caller_plugin_id has a race condition, although I don't confirm it.
It looks to me that the id can be another plugin-id running concurrently...
It is not the issue with this fix, it is the issue of the existing implementation.

Docs Changes:
Not needed.

Release Note:
Add plugin-id to warning logs about emitting error events.

Example:
From #4567.

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

<match sample.**>
  @type stdout
</match>
2024-09-11 15:26:26 +0900 [info]: #0 fluentd worker is now running worker=0
2024-09-11 15:26:27 +0900 [warn]: #0 dump an error event:
 error_class=Fluent::Plugin::Parser::ParserError
 error="pattern not matched with data 'this is not json}'"
 location=nil tag="sample.json"
 time=2024-09-11 15:26:27.040657722 +0900
 plugin_id="id_not_json" ### <= This info is added by this PR ###
 record={"data"=>"this is not json}"}
...

@daipom
Copy link
Contributor Author

daipom commented Sep 11, 2024

However, I'm concerned that caller_plugin_id has a race condition, although I don't confirm it.
It looks to me that the id can be another plugin-id running concurrently...

def router
@_event_emitter_used_actually = true
if @_event_emitter_lazy_init
@router = @primary_instance.router
end
if @router.respond_to?(:caller_plugin_id=)
@router.caller_plugin_id = self.plugin_id
end
@router
end

@kenhys
Copy link
Contributor

kenhys commented Sep 13, 2024

As error.backtrace does not have enough information in this context, so current approach seems reasonable.
(No need to force 3rdparty plugins to fix it)

Copy link

This PR has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this PR will be closed in 7 days

@github-actions github-actions bot added the stale label Oct 13, 2024
@daipom daipom removed the stale label Oct 15, 2024
@daipom daipom added this to the v1.18.0 milestone Nov 1, 2024
It would be helpful if we could know what plugin emitted the
error event.

fluent#4567

We need to care about the compatibility.
This signature change would not break compatibility.

However, I'm concerned that `caller_plugin_id` has a race
condition, although I don't confirm it.
It looks to me that the id can be another plugin-id running
concurrently...
It is not the issue with this fix, it is the issue of the
existing implementation.

Signed-off-by: Daijiro Fukuda <[email protected]>
@daipom daipom force-pushed the log-add-caller-plugin-id-about-emitting-error-event branch from 2d4f379 to df33064 Compare November 27, 2024 04:38
@daipom
Copy link
Contributor Author

daipom commented Nov 27, 2024

A simple check seems ok.
But may occur with more load.

<source>
  @type sample
  tag test.foo
</source>
<source>
  @type sample
  tag test.bar
</source>
<source>
  @type sample
  tag test.boo
</source>

<filter test.foo>
  @type parser
  @id foo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.bar>
  @type parser
  @id bar
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.boo>
  @type parser
  @id boo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>

<match test.**>
  @type stdout
</match>
2024-11-27 13:55:58 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error="not-exist does not exist" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'" tag="test.bar" time=2024-11-27 13:55:58.022326275 +0900 plugin_id="bar" record={"message"=>"sample"}
2024-11-27 13:55:58 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error="not-exist does not exist" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'" tag="test.foo" time=2024-11-27 13:55:58.023002806 +0900 plugin_id="foo" record={"message"=>"sample"}
2024-11-27 13:55:58 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error="not-exist does not exist" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'" tag="test.boo" time=2024-11-27 13:55:58.023570092 +0900 plugin_id="boo" record={"message"=>"sample"}
...

@daipom
Copy link
Contributor Author

daipom commented Nov 27, 2024

I've succeeded in reproducing the race condition.

  • Run 3 in_samples with rate 100 (100 emits/s) in concurrent.
  • Make each filter plugin raise an error.
  • Check Fluentd log. Check for incorrect tag/id combinations.

Results: 0.005% mismatch confirmed.

<source>
  @type sample
  tag test.foo
  rate 100
</source>
<source>
  @type sample
  tag test.bar
  rate 100
</source>
<source>
  @type sample
  tag test.boo
  rate 100
</source>

<filter test.foo>
  @type parser
  @id foo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.bar>
  @type parser
  @id bar
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.boo>
  @type parser
  @id boo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>

<match test.**>
  @type stdout
</match>

Script for check (Ruby)

#!/usr/bin/env ruby
entire_count = 0
mismatched_count = 0

p "checking files ..."

ARGF.each_line do |line|
  next unless /^.* dump an error event: .* tag="(.*)" .* plugin_id="(.*)" .*$/ =~ line
  entire_count += 1
  tag = $1
  plugin_id = $2

  unless tag.include? plugin_id
    mismatched_count += 1
    p "mismatched! tag: #{tag}, plugin_id: #{plugin_id}, raw_log: #{line}"
  end
end

p "done"
p "entire_count: #{entire_count}"
p "mismatched_count: #{mismatched_count}"
p "mismatched_percentage: #{100.0 * mismatched_count / entire_count}%"
$ bundle exec fluentd -c test.conf --log log; ./check.rb < log; rm -f log
(Wait a moment and press Ctrl+C)
^C
"checking files ..."
"mismatched! tag: test.boo, plugin_id: bar, raw_log: 2024-11-27 18:14:22 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.boo\" time=2024-11-27 18:14:22.793045711 +0900 plugin_id=\"bar\" record={\"message\"=>\"sample\"}\n"
"mismatched! tag: test.bar, plugin_id: foo, raw_log: 2024-11-27 18:15:56 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.bar\" time=2024-11-27 18:15:56.735563552 +0900 plugin_id=\"foo\" record={\"message\"=>\"sample\"}\n"
"mismatched! tag: test.bar, plugin_id: foo, raw_log: 2024-11-27 18:17:02 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.bar\" time=2024-11-27 18:17:02.429503164 +0900 plugin_id=\"foo\" record={\"message\"=>\"sample\"}\n"
"mismatched! tag: test.bar, plugin_id: foo, raw_log: 2024-11-27 18:17:24 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.bar\" time=2024-11-27 18:17:24.963402900 +0900 plugin_id=\"foo\" record={\"message\"=>\"sample\"}\n"
"done"
"entire_count: 73260"
"mismatched_count: 4"
"mismatched_percentage: 0.00546000546000546%"

@daipom
Copy link
Contributor Author

daipom commented Nov 27, 2024

However, I'm concerned that caller_plugin_id has a race condition, although I don't confirm it.
It looks to me that the id can be another plugin-id running concurrently...
It is not the issue with this fix, it is the issue of the existing implementation.

This issue was confirmed.
We need to fix this race condition first.
So, I close this.

@daipom daipom closed this Nov 27, 2024
@daipom daipom removed this from the v1.18.0 milestone Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error event logs should include plugin id
2 participants