bgpd: Add peers back to peer hash when peer_xfer_conn fails
It was noticed that peering occasionally failed in a testbed.
Upon investigation, it was found that the peer was not in the
peer hash, and we saw these failure messages:

Aug 25 21:31:15 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: %NOTIFICATION: sent to neighbor 2001:cafe:1ead:4::4 4/0 (Hold Timer Expired) 0 bytes
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] Can't get remote address and port: Transport endpoint is not connected
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 100663299] %bgp_getsockname() failed for  peer 2001:cafe:1ead:4::4 fd 27 (from_peer fd -1)
Aug 25 21:31:22 doca-hbn-service-bf3-s06-1-ipmi bgpd[3048]: [EC 33554464] %Neighbor failed in xfer_conn

root@doca-hbn-service-bf3-s06-1-ipmi:/var/log/hbn/frr# vtysh -c 'show bgp peerhash' | grep 2001:cafe:1ead:4::4
root@doca-hbn-service-bf3-s06-1-ipmi:/var/log/hbn/frr#

Looking at the code, peer_xfer_conn() can fail, and
bgp_establish() then returns before adding the peer back
to the peer hash.

This is only part of the failure. The peer also appears to
be left in a state where it no longer initiates connection
attempts, but that will be fixed in a separate commit once we
figure that one out.

Signed-off-by: Donald Sharp <[email protected]>
donaldsharp committed Aug 31, 2023
1 parent 030d2f0 commit ce1f5d3
Showing 1 changed file with 12 additions and 0 deletions.
bgpd/bgp_fsm.c
```diff
@@ -2137,6 +2137,7 @@ bgp_establish(struct peer_connection *connection)
 	struct peer *other;
 	int status;
 	struct peer *peer = connection->peer;
+	struct peer *orig = peer;
 
 	other = peer->doppelganger;
 	hash_release(peer->bgp->peerhash, peer);
@@ -2146,6 +2147,17 @@ bgp_establish(struct peer_connection *connection)
 	peer = peer_xfer_conn(peer);
 	if (!peer) {
 		flog_err(EC_BGP_CONNECT, "%%Neighbor failed in xfer_conn");
+
+		/*
+		 * A failure of peer_xfer_conn but not putting the peers
+		 * back in the hash ends up with a situation where incoming
+		 * connections are rejected, as that the peer is not found
+		 * when a lookup is done
+		 */
+		(void)hash_get(orig->bgp->peerhash, orig, hash_alloc_intern);
+		if (other)
+			(void)hash_get(other->bgp->peerhash, other,
+				       hash_alloc_intern);
 		return BGP_FSM_FAILURE;
 	}
 
```
