dplane failed to limit the maximum length of the queue #15016

Closed · 1 of 2 tasks
zice312963205 opened this issue Dec 14, 2023 · 6 comments
Labels: autoclose, triage (Needs further investigation)

Comments

zice312963205 (Contributor) commented Dec 14, 2023


Describe the bug

FRR version: 8.2.2
Kernel version: Linux 5.10

When I learn 2 million routes from BGP neighbors in a short period of time, dplane consumes a large amount of cache.
[screenshots of zebra/dplane memory usage omitted]

It seems there is an issue where, if a large number of routes are learned in a short period of time, the dplane will occupy a substantial amount of memory, which might lead to an out-of-memory (OOM) situation.

  • Did you check if this is a duplicate issue?
  • Did you test it on the latest FRRouting/frr master branch?


zice312963205 added the triage (Needs further investigation) label on Dec 14, 2023
donaldsharp (Member) commented
Can we see the CLI arguments you are using on the zebra command line?

donaldsharp (Member) commented
I'd like to see a show thread cpu as well

zice312963205 (Contributor, Author) commented
I'd like to see a show thread cpu as well

[screenshot: partial output of show thread cpu]

donaldsharp (Member) commented
Why didn't you include the entirety of the show thread cpu output?

In any event, I was able to recreate something similar in my home setup. I am not sure if this is what you are reporting, but it probably is. Can you give this a try: #15025, and see if it cleans the problem up?

zice312963205 (Contributor, Author) commented

I tracked the code flow of the dplane contexts (ctx) and found the cause: after a ctx has been processed by the provider, it is parked on rib_dplane_q to wait for the results handler, and zdplane_info.dg_routes_queued is decremented at that point. This defeats the attempt in meta_queue_process to limit the number of ctx objects in flight at any one time (200 by default), because the counter no longer accounts for the contexts still sitting on rib_dplane_q. Since rib_process_dplane_results runs in zebra's main thread, it is scheduled relatively slowly, so when a large number of routes are injected in a short time, many temporary contexts accumulate on rib_dplane_q and are not freed in time, which leads to this problem.
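For reference, the admission check I am referring to is roughly the following (paraphrased from meta_queue_process in zebra/zebra_rib.c; the helper names are from zebra_dplane.h as I recall them, the body is heavily simplified, and details differ between versions):

```c
/* Simplified sketch of the existing admission check in
 * meta_queue_process() (zebra/zebra_rib.c); not the exact code. */
static wq_item_status meta_queue_process(struct work_queue *dummy, void *data)
{
	uint32_t queue_limit = dplane_get_in_queue_limit(); /* 200 by default */
	uint32_t queue_len = dplane_get_in_queue_len();     /* backed by
							     * zdplane_info.dg_routes_queued,
							     * as I understand it */

	/* Stop feeding the dataplane if its inbound queue looks full. */
	if (queue_len > queue_limit)
		return WQ_QUEUE_BLOCKED;

	/* ... otherwise drain the meta queue and enqueue route updates
	 * toward the dataplane as usual ... */
	return WQ_SUCCESS;
}
```

Because dg_routes_queued is decremented once the provider finishes with a ctx, this check no longer sees the contexts that are still parked on rib_dplane_q waiting for the main thread.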

I have an idea for a modification: check the length of rib_dplane_q in meta_queue_process, and if there are already many cached nodes, return WQ_QUEUE_BLOCKED to temporarily delay rib_process. A sketch of what I mean is below.
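Something along these lines is what I have in mind. The helper rib_dplane_q_length() is hypothetical (rib_dplane_q is a static list in zebra_rib.c, so a counter or list-count accessor would have to be added), and the threshold simply reuses the existing limit for illustration:

```c
/* Sketch of the proposed back-pressure check; rib_dplane_q_length() is a
 * hypothetical helper returning the number of contexts parked on
 * rib_dplane_q awaiting rib_process_dplane_results(). Not actual FRR code. */
static wq_item_status meta_queue_process(struct work_queue *dummy, void *data)
{
	uint32_t queue_limit = dplane_get_in_queue_limit();

	/* Existing check: contexts queued toward the dataplane. */
	if (dplane_get_in_queue_len() > queue_limit)
		return WQ_QUEUE_BLOCKED;

	/* Proposed additional check: contexts already handled by the
	 * providers but still waiting for the main thread to process the
	 * results.  If that backlog is large, pause rib_process for now. */
	if (rib_dplane_q_length() > queue_limit)
		return WQ_QUEUE_BLOCKED;

	/* ... otherwise process the meta queue as before ... */
	return WQ_SUCCESS;
}
```

The effect would be that route processing is throttled not only by the number of contexts queued toward the dataplane, but also by the number of finished contexts the main thread has not yet consumed, so memory stays bounded while routes are injected quickly.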


This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.
