Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BMP rework - 2 BMP fixes + some rework #17307

Closed

Conversation

pguibert6WIND
Copy link
Member

A separate pull request will be put in place to add the following in BMP:

  • ability to import BMP information from a separate BGP instance.
    This will be useful for L3VPN setups.

  • Some commits prepare the code for to this feature. It includes the ability to share the bgpbmp structure for various usages

  • A commit to re-send post-policy BGP updates when nexthop reachability is changed.

  • A commit related to an ASAN issue encountered with rx_open and tx_open buffers.

@pguibert6WIND pguibert6WIND force-pushed the bmp_rework_plus_fix branch 2 times, most recently from 20d62ea to e8eb315 Compare October 30, 2024 21:38
bgpd/bgp_bmp.c Outdated Show resolved Hide resolved
bgpd/bgp_bmp.c Show resolved Hide resolved
When a BGP listener configured with BMP receives the first BGP
IPv6 update from a connected BGP IPv6 peer, the BMP collector
receives a withdraw post-policy message.

> {"peer_type": "route distinguisher instance", "policy": "post-policy",
> "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1",
> "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp":
> "2024-10-29 11:44:47.111962", "bmp_log_type": "withdraw", "afi": 2,
> "safi": 1, "ip_prefix": "2001::1125/128", "seq": 22}
> {"peer_type": "route distinguisher instance", "policy": "pre-policy",
> "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1",
> "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp":
> "2024-10-29 11:44:47.111963", "bmp_log_type": "update", "origin":
> "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3",
> "nxhp_link-local": "fe80::7063:d8ff:fedb:9e11", "ip_prefix": "2001::1125/128", "seq": 23}

Actually, the BGP update is not valid, and BMP considers it as a
withdraw message. The BGP upate is not valid, because the nexthop
reachability is unknown at the time of reception, and no other
BMP message is sent.

Fix this by re-sending a BMP post update message when nexthop
tracking becomes successfull. Generalise the re-sending of
messages when nexthop tracking changes.

Signed-off-by: Philippe Guibert <[email protected]>
There is no need to create the bgp bmp instance, while there
is no bmp target to send data to.
Rewrite the code by using bgp_bmp_find() API instead.

Signed-off-by: Philippe Guibert <[email protected]>
The bgp_bmp_get() call is not necessary if the buffer
size limit of bmp is the default value. Reversely, the
no form of the config command should not allocate an
unnecessary bgpbmp structure.

Fix this by controlling the passed value before doing the
allocation.

Signed-off-by: Philippe Guibert <[email protected]>
The bmpbgp structure is used to store the mirror limit
and the bmp target config of a given bgp instance (lets say X).

It is wishable to share the bmpbgp structure with a
separate BGP instance (lets say Y) that wants to import
the BMP information from X.

In the case where Y is deleted, the reference to the bmpbgp
structure of X can be deleted, but not the bmpbgp structure of X.

Fix this by using a reference counter that will be used for each
usage: mirror limit, bmp target list, and later vrf importation.

Signed-off-by: Philippe Guibert <[email protected]>
The following ASAN error can be seen.

> ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x608000036c20
>     #0 0x7f3d7a4b5425 in __interceptor_malloc_usable_size ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:198
>     FRRouting#1 0x7f3d7a426a16 in __sanitizer::BufferedStackTrace::Unwind(unsigned long, unsigned long, void*, bool, unsigned int) ../../../../src/libsanitizer/sanitizer_common
> /sanitizer_stacktrace.h:122
>     FRRouting#2 0x7f3d7a426a16 in __asan::asan_malloc_usable_size(void const*, unsigned long, unsigned long) ../../../../src/libsanitizer/asan/asan_allocator.cpp:1074
>     FRRouting#3 0x7f3d7a03f330 in mt_count_free lib/memory.c:78
>     FRRouting#4 0x7f3d7a03f330 in qfree lib/memory.c:130
>     FRRouting#5 0x7f3d76ccf89b in bmp_peer_status_changed bgpd/bgp_bmp.c:982
>     FRRouting#6 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     FRRouting#7 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     FRRouting#8 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     FRRouting#9 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     FRRouting#10 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     FRRouting#11 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     FRRouting#12 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     FRRouting#13 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     FRRouting#14 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     FRRouting#15 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     FRRouting#16 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     FRRouting#17 0x7f3d7a0f848b in event_call lib/event.c:2019
>     FRRouting#18 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     FRRouting#19 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     FRRouting#20 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     FRRouting#21 0x7f3d79a29e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     FRRouting#22 0x560ae29e4ef4 in _start (/usr/lib/frr/bgpd+0x2eeef4)
>
> 0x608000036c20 is located 0 bytes inside of 81-byte region [0x608000036c20,0x608000036c71)
> freed by thread T0 here:
>     #0 0x7f3d7a4b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     FRRouting#1 0x7f3d76ccf85f in bmp_peer_status_changed bgpd/bgp_bmp.c:981
>     FRRouting#2 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     FRRouting#3 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     FRRouting#4 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     FRRouting#5 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     FRRouting#6 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     FRRouting#7 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     FRRouting#8 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     FRRouting#9 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     FRRouting#10 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     FRRouting#11 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     FRRouting#12 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     FRRouting#13 0x7f3d7a0f848b in event_call lib/event.c:2019
>     FRRouting#14 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     FRRouting#15 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     FRRouting#16 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7f3d7a4b4887 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
>     FRRouting#1 0x7f3d7a03f0e9 in qmalloc lib/memory.c:101
>     FRRouting#2 0x7f3d76cd0166 in bmp_bgp_peer_vrf bgpd/bgp_bmp.c:2194
>     FRRouting#3 0x7f3d76cd0166 in bmp_bgp_update_vrf_status bgpd/bgp_bmp.c:2236
>     FRRouting#4 0x7f3d76cd29b8 in bmp_vrf_state_changed bgpd/bgp_bmp.c:3479
>     FRRouting#5 0x560ae2c45b34 in hook_call_bgp_instance_state bgpd/bgpd.c:88
>     FRRouting#6 0x560ae2c4d158 in bgp_instance_up bgpd/bgpd.c:3936
>     FRRouting#7 0x560ae29e5ed1 in bgp_vrf_enable bgpd/bgp_main.c:299
>     FRRouting#8 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:286
>     FRRouting#9 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:275
>     FRRouting#10 0x7f3d7a12ab66 in zclient_vrf_add lib/zclient.c:2561
>     FRRouting#11 0x7f3d7a12eb43 in zclient_read lib/zclient.c:4624
>     FRRouting#12 0x7f3d7a0f848b in event_call lib/event.c:2019
>     FRRouting#13 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     FRRouting#14 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     FRRouting#15 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Signed-off-by: Philippe Guibert <[email protected]>
Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so just making the general comment that we have historically done a poor job with refcounting. I think it would be much better to get an understanding of the goal, the proposal that you think requires refcounting, before taking such a change.

if (bmpbgp->mirror_qsizelimit == ~0UL)
return;
bmpbgp->mirror_qsizelimit = ~0UL;
bmp_bgp_put(bmpbgp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so ... won't this deref?

Copy link
Member Author

@pguibert6WIND pguibert6WIND Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have a better design pattern to apply to my use case, you are welcome:
I can try to apply it.

For the reason why I do this:

It is wishable to share the bmpbgp structure with a
separate BGP instance (lets say Y) that wants to import
the BMP information from X.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be a reasonable question, I guess, but it's not a response to the review comment? The review comment is "this code is wrong".
The irony is that the general comment was "this implementation choice is hard to get right", and ... in fact it's not right, even in this first effort.

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, waiting still on Mark's comment.

@mjstapp
Copy link
Contributor

mjstapp commented Nov 4, 2024

So some ideas about this:

For the reason why I do this:

It is wishable to share the bmpbgp structure with a
separate BGP instance (lets say Y) that wants to import
the BMP information from X.

I'm not sure I understand what "share the bmp structure" means. Do you want to share configuration - from parent to children, that sort of thing? or do you want to use a single bmp connection to multiplex updates from multiple bgp instances?

One choice would be to have X know about the set of its children or dependents {Y, A, B, ...} and push changes to those objects.
Another option might be to have the bmp instances in {Y, A, B, ...} refer to their parent in some other way - not be a pointer to their parent, but be able to locate their parent.

@pguibert6WIND
Copy link
Member Author

pguibert6WIND commented Nov 4, 2024

So some ideas about this:

For the reason why I do this:

It is wishable to share the bmpbgp structure with a
separate BGP instance (lets say Y) that wants to import
the BMP information from X.

I'm not sure I understand what "share the bmp structure" means. Do you want to share configuration - from parent to children, that sort of thing? or do you want to use a single bmp connection to multiplex updates from multiple bgp instances?

Case one single BMP connection to multiplex updates from multiple bgp instances.

One choice would be to have X know about the set of its children or dependents {Y, A, B, ...} and push changes to those objects.

Not this choice.
The following vty command would be configured on BMP from Y.

router bgp 65500 vrf X
exit
router bgp 65500
  bmp target TGT1
   import-vrf-view vrf1
 exit
exit

Another option might be to have the bmp instances in {Y, A, B, ...}
refer to their parent in some other way - not be a pointer to their parent,
but be able to locate their parent.

Even if BMP from X is not configured, the instance Y needs to get access from it.
Reason to have this bgpbmp structure :

  • it indicates the presence of a BGP instance.
  • it stores the vrf presence (needed for loc-rib presence)

Either I reuse it, or I duplicate vrf_state in the imported list structure.

@pguibert6WIND
Copy link
Member Author

I remove the lock idea.. closing this PR, creating new one: #17373

@ton31337
Copy link
Member

ton31337 commented Nov 7, 2024

Why not update this PR just?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants