-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Agentx thread #16986
base: master
Are you sure you want to change the base?
Agentx thread #16986
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, avoid pulling some dependencies specific for bgpd.
lib/agentx.c
Outdated
@@ -23,25 +23,62 @@ | |||
#include "libfrr.h" | |||
#include "xref.h" | |||
#include "lib/libagentx.h" | |||
#include "ringbuf.h" /* for ringbuf_remain, ringbuf_peek, ringbuf_.. */ | |||
#include "frr_pthread.h" /* for struct frr_pthread */ | |||
#include "bgpd/bgpd.h" /* for bgp_pth_ax */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a a specific dependency with bgpd ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken into account
fddcf30
to
f6efd02
Compare
Also, could you fix |
7557c63
to
8535a1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time following what problem is being solved here.
It sounds like there can be a problem if there are many traps to be sent - that the existing code is not detecting that, is not discarding events when it begins to overrun a receiver, and/or is not handling EWOULDBLOCK. Is that correct? wouldn't it be simpler to make the code detect and handle those conditions directly -rather than injecting a new pthread and then teaching it to handle the error conditions?
Is there any other problem present? I'm a bit terrified by this line from the description:
"Update the SNMP read operations to use mutex locks to ensure thread-safe execution."
so ... it is not possible to make reads of daemon data "thread-safe" - right? only the main pthread can access daemon internals. what change are you proposing to make there?
lib/agentx.c
Outdated
DEFINE_HOOK(agentx_enabled, (), ()); | ||
|
||
//bool agentx_enabled = false; | ||
|
||
static struct event_loop *agentx_tm; | ||
static struct event_loop *main_tm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we name this something better? and at least put some documentation into why agentx code has too event_loop data structure pointers. Let's help our future selves out here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken ento account
8535a1b
to
43660e4
Compare
lib/agentx.c:438: error: log message contains tab or newline: "%s not enougth space in ibuf_ax needed : %lu free :\t %lu" (in smux_trap_multi_index()) needs to be fixed |
tomjstapp We can indeed address some cases by implementing flow control on the traps. Unfortunately, other cases related to the internal mechanisms of AgentX SNMP emerge. It's to provide a global solution, rather than a series of incomplete fixes, that we propose handling the sending of traps in a specific pthread. |
97b75e2
to
f03720f
Compare
So I'd like to understand that situation better. If the new pthread is wedged somewhere, and a daemon's main pthread is shutting down and waiting for it to stop, how will that work?
|
ci:rerun |
f03720f
to
57976b4
Compare
lib/agentx.c
Outdated
/* mutex dedicated to trap transfert between threads */ | ||
pthread_mutex_t ax_mtx; | ||
/* mutex dedicated to send/read exclusion */ | ||
pthread_mutex_t ax_io_mtx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't they be static ? both ax_mtx and ax_io_mtx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a minor update (static) that would be welcomed.
57976b4
to
b1f47b3
Compare
b1f47b3
to
1318f79
Compare
Thanks for the static that are being added. ;) |
sorry for not having notified the change. |
6519edd
to
48685e3
Compare
Please @fdumontet6WIND , do you have some data on the improvements you are getting ? LGTM |
added some comments following Donald sharp advice |
why closed ? |
need some improvments |
7f94aa3
to
256fc73
Compare
ci:rerun |
845a60d
to
1780159
Compare
ci:rerun |
1780159
to
b95cb12
Compare
@@ -83,7 +88,10 @@ static void agentx_timeout(struct event *t) | |||
snmp_timeout(); | |||
run_alarms(); | |||
} else { | |||
zlog_err("%s mutex already locked", __func__); | |||
if (nb_locktry_timeout_fail % 100 == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you bundle this fixup with e464755? Also, what should an operator do if I see such an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if an operator see a great number of such an error, restarting processes using agentx is good option
b95cb12
to
6124dae
Compare
ci:rerun |
AgentX is blocking the process in write/send for traps, if I understand correctly? If that's the problem, I'd rather have a buffer/queue without threads. Use |
Hi David, |
add "struct event *" parameter to agentx_event_update function. the function may be called via event_add*** functions. Signed-off-by: Francois Dumontet <[email protected]>
- Introduce a dedicated AgentX thread using `frr_pthread`. - Add mutex locks (`ax_mtx`, `ax_io_mtx`) to manage thread synchronization for trap transfer and I/O operations. - Implemented ring buffers (`ibuf_ax`) for handling "master -> AgentX" communication, improving data handling between threads. - Update the SNMP read operations to use mutex locks to ensure thread-safe execution. - Integrated a new dedicated thread to send SNMP traps, ensuring separation of responsibilities between the main and AgentX threads. - Enhanced trap handling to support multi-index traps, with excess traps being discarded if the buffer is full, preventing overflow. - Enhanced trap handling to support multi-index traps. When more than "RINGBUF_NB_TRAP" traps are pending for transmission, subsequent traps are discarded to prevent overflow. This update significantly improves concurrency, synchronization, and trap management within the AgentX module, with added protection against socket's buffer overflow from excessive traps. The socket's buffer overflow is leading to process deadlock. Signed-off-by: Francois Dumontet <[email protected]>
da24281
to
c7f12bf
Compare
ci:rerun |
1 similar comment
ci:rerun |
Introduce a dedicated AgentX thread using
frr_pthread
.Add mutex locks (
ax_mtx
,ax_io_mtx
) to manage threadsynchronization for trap transfer and I/O operations.
Implemented ring buffers (
ibuf_ax
) for handling"master -> AgentX" communication, improving data handling
between threads.
Update the SNMP read operations to use mutex locks to
ensure thread-safe execution.
Integrated a new dedicated thread to send SNMP traps,
ensuring separation of responsibilities between the main
and AgentX threads.
Enhanced trap handling to support multi-index traps, with
excess traps being discarded if the buffer is full,
preventing overflow.
Enhanced trap handling to support multi-index traps.
When more than "RINGBUF_NB_TRAP" traps are pending for
transmission, subsequent traps are discarded to prevent
overflow.
This update significantly improves concurrency,
synchronization, and trap management within the AgentX
module, with added protection against socket's buffer overflow
from excessive traps. The socket's buffer overflow is leading
to process deadlock.