From 33025d97b24c404f5000ed029f070c154d8d60a5 Mon Sep 17 00:00:00 2001
From: Donald Sharp
Date: Tue, 31 Oct 2023 13:06:16 -0400
Subject: [PATCH] pimd: Ensure upstream points at the correct rpf

In the scenario on an intermediate router where a *,G join has been
received and a S,G stream is being sent through that router on the
*,G stream, there exists a situation when the *,G has been pruned
but the stream is still being received on the incoming interface
towards the RP for the *,G.

In this situation PIM will initially see the S,G stream as a NOCACHE
from the dataplane. PIM will then do an RPF lookup for the S and
notice that the stream is supposed to be coming in on a different
interface. In this case the original PIM code would create a
blackhole mroute towards the RPF of the *,G (the interface the
stream is being received on).

The original reason for this is a scenario where this particular
S1,G stream is sending at basically line rate while a different S2,G
stream is sending at a very low rate. With certain dataplanes there
is no way to really rate limit the S1 -vs- S2 stream, and the S1
stream completely overwhelms the S2 stream when sending up to the
control plane for proper pim handling.

The problem then becomes that FRR never properly responds to the
situation where the *,G is re-received and the S,G stream switches
back over to the SPT for itself, and FRR ends up with a dead mroute
that stops everything from working properly.

This code change installs the blackhole mroute with the RPF towards
the RP for the G and then resets the RPF to the correct RPF for the
stream but does not modify the mroute. When the *,G is re-received
and we attempt to transition to the S,G stream this now works.

As a note: Both David L and I do not necessarily believe we fully
understand the problem yet. What this does do is fix all the
inconsistent CI issues we are seeing in the topotests at this time.
Internally I am seeing other test failures in PIM that I don't fully
understand and we suspect that there are other problems in the state
machine. We plan to revisit this problem as we are able to debug the
issue better. In the meantime both David and I agree that this gets
the CI working again and streams end up in the right state.

Signed-off-by: Donald Sharp
---
 pimd/pim_upstream.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pimd/pim_upstream.c b/pimd/pim_upstream.c
index 743a047b0acb..e36bd82ac6e1 100644
--- a/pimd/pim_upstream.c
+++ b/pimd/pim_upstream.c
@@ -662,10 +662,9 @@ void pim_upstream_update_use_rpt(struct pim_upstream *up,
 	new_use_rpt = !!PIM_UPSTREAM_FLAG_TEST_USE_RPT(up->flags);
 	if (old_use_rpt != new_use_rpt) {
 		if (PIM_DEBUG_PIM_EVENTS)
-			zlog_debug("%s switched from %s to %s",
-					up->sg_str,
-					old_use_rpt?"RPT":"SPT",
-					new_use_rpt?"RPT":"SPT");
+			zlog_debug("%s switched from %s to %s", up->sg_str,
+				   old_use_rpt ? "RPT" : "SPT",
+				   new_use_rpt ? "RPT" : "SPT");
 		if (update_mroute)
 			pim_upstream_mroute_add(up->channel_oil, __func__);
 	}
@@ -904,9 +903,15 @@ static struct pim_upstream *pim_upstream_new(struct pim_instance *pim,
 				false /*update_mroute*/);
 		pim_upstream_mroute_iif_update(up->channel_oil, __func__);
 
-		if (PIM_UPSTREAM_FLAG_TEST_SRC_NOCACHE(up->flags))
+		if (PIM_UPSTREAM_FLAG_TEST_SRC_NOCACHE(up->flags)) {
+			/*
+			 * Set the right RPF so that future changes will
+			 * be right
+			 */
+			rpf_result = pim_rpf_update(pim, up, NULL, __func__);
 			pim_upstream_keep_alive_timer_start(
 				up, pim->keep_alive_time);
+		}
 	} else if (!pim_addr_is_any(up->upstream_addr)) {
 		pim_upstream_update_use_rpt(up,
 					    false /*update_mroute*/);
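
Reviewer note: the ordering described in the commit message (install
the blackhole mroute towards the RP first, then record the RPF for the
source without touching the mroute) can be pictured with a toy model.
This is only a sketch under stated assumptions: struct toy_upstream and
every toy_* function below are hypothetical and are not FRR data
structures or APIs, and the SPT switch step is a guess at the effect
the commit message describes, not pimd's actual state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for pimd upstream state. */
struct toy_upstream {
	int rpf_ifindex;       /* interface PIM believes S should arrive on */
	int mroute_iif;        /* incoming interface installed in the dataplane */
	bool mroute_installed; /* true once an mroute has been programmed */
	bool blackhole;        /* true while the mroute has no outgoing interfaces */
};

/* Install a blackhole mroute on the interface the stream arrives on. */
void toy_install_blackhole(struct toy_upstream *up, int arriving_ifindex)
{
	assert(up != NULL);
	up->mroute_iif = arriving_ifindex;
	up->mroute_installed = true;
	up->blackhole = true;
}

/* Record the RPF interface for S; deliberately leaves the mroute alone. */
void toy_rpf_update(struct toy_upstream *up, int ifindex_towards_source)
{
	assert(up != NULL);
	up->rpf_ifindex = ifindex_towards_source;
}

/* NOCACHE handling in the order the commit message describes. */
void toy_handle_nocache(struct toy_upstream *up, int ifindex_towards_rp,
			int ifindex_towards_source)
{
	toy_install_blackhole(up, ifindex_towards_rp);
	toy_rpf_update(up, ifindex_towards_source);
}

/*
 * Assumed later transition to the SPT: because the RPF was already
 * recorded, the mroute can be moved to the interface towards S.
 */
void toy_switch_to_spt(struct toy_upstream *up)
{
	assert(up != NULL);
	up->mroute_iif = up->rpf_ifindex;
	up->blackhole = false;
}
```

In this model, calling toy_handle_nocache(&up, 3, 7) leaves the
mroute blackholed on interface 3 while the tracked RPF is interface 7,
so a later toy_switch_to_spt(&up) has the right interface to move to.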