pimd: Fix for data packet loss when FHR is LHR and RP #14227
base: master
CI:rerun rerunning with CI webhook after disabling the CI Checks App
Continuous Integration Result: FAILED
Test incomplete. See below for issues. This is a comment from an automated CI system.
Get source / Pull Request: Successful
Building Stage: Successful
Basic Tests: Incomplete
AddressSanitizer topotests part 4: Incomplete (check logs for details)
Successful on other platforms/tests
I think the flow should be:
(S,G): the OIL should initially contain pimreg plus the interface connected to the LHR (due to inheritance).
Since traffic is arriving on (S,G), the kernel stats should be incrementing, so upstream->sptbit should have been set and pimreg removed. With the SPT bit set, the register-stop would have been sent as well, and this flap of adding and removing pimreg would not occur.
So we need to check why upstream->sptbit could not be set: is there a problem with the kernel stats, or are we unable to reach the point where upstream->sptbit is set in pimd?
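The expected flow in this comment can be pictured with a toy model. This is only a sketch under stated assumptions, not pimd code: `struct sg_entry` and `sg_poll_kernel_stats` are hypothetical names, and the way pimd really samples kernel counters is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified (S,G) entry; not pimd's struct pim_upstream. */
struct sg_entry {
	bool oil_pimreg;      /* pimreg is in the OIL */
	bool oil_receiver_if; /* inherited receiver-facing interface is in the OIL */
	bool sptbit;          /* stands in for upstream->sptbit */
	uint64_t last_pktcnt; /* last kernel packet count seen for this (S,G) */
};

/*
 * Feed one kernel packet-counter sample into the entry.
 * Returns true only on the sample that newly sets the SPT bit.
 */
static bool sg_poll_kernel_stats(struct sg_entry *sg, uint64_t pktcnt)
{
	/* Assumption: the counter is monotonic. */
	assert(pktcnt >= sg->last_pktcnt);

	bool advanced = pktcnt > sg->last_pktcnt;

	sg->last_pktcnt = pktcnt;
	if (advanced && !sg->sptbit) {
		/* Traffic is flowing natively on (S,G): set the SPT bit
		 * and drop pimreg so packets stop going to the control plane. */
		sg->sptbit = true;
		sg->oil_pimreg = false;
		return true;
	}
	return false;
}
```

In this model, if the counter never advances (or the polling code is never reached), `sptbit` stays false and pimreg stays in the OIL, which is the situation the comment asks to investigate.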
    || ((SwitchToSptDesiredOnRp(pim, &sg))
        && pim_upstream_inherited_olist(pim, upstream) == 0)) {
if ((upstream->sptbit == PIM_UPSTREAM_SPTBIT_TRUE) ||
    (PIM_UPSTREAM_FLAG_TEST_FHR(upstream->flags) && i_am_rp) ||
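To make the effect of the changed condition easier to see, here is a minimal, self-contained sketch of the predicate before and after the change. It is not the pimd source. The struct and function names are hypothetical, each pimd call is replaced by a plain field, and it assumes that the clause following the trailing `||` in the excerpt is the existing `SwitchToSptDesiredOnRp` check and that `pim_upstream_inherited_olist` returns an interface count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the values the real condition reads. */
struct reg_stop_inputs {
	bool sptbit_set;         /* upstream->sptbit == PIM_UPSTREAM_SPTBIT_TRUE */
	bool is_fhr;             /* PIM_UPSTREAM_FLAG_TEST_FHR(upstream->flags) */
	bool i_am_rp;            /* this router is the RP for the group */
	bool spt_switch_desired; /* SwitchToSptDesiredOnRp(pim, &sg) */
	int inherited_olist_len; /* assumed result of pim_upstream_inherited_olist() */
};

/* Condition before the change. */
static bool reg_stop_old(const struct reg_stop_inputs *in)
{
	assert(in->inherited_olist_len >= 0);
	return in->sptbit_set ||
	       (in->spt_switch_desired && in->inherited_olist_len == 0);
}

/* Condition after the change: FHR-and-RP no longer depends on the SPT bit. */
static bool reg_stop_new(const struct reg_stop_inputs *in)
{
	assert(in->inherited_olist_len >= 0);
	return in->sptbit_set ||
	       (in->is_fhr && in->i_am_rp) ||
	       (in->spt_switch_desired && in->inherited_olist_len == 0);
}
```

With the SPT bit unset, a non-empty inherited olist, and the router acting as both FHR and RP, `reg_stop_old` is false while `reg_stop_new` is true. For every other input the two predicates agree.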
As discussed over Slack, please check why upstream->sptbit is not getting set.
Yes, I got occupied with different work internally. I will check and update here later.
ci:rerun
This PR is stale because it has been open 180 days with no activity. Comment or remove the |
ci:rerun
Can you rebase instead?
Failure is not related to the change; re-running CI.
Topology:
A single router is acting as the First Hop Router (FHR), Last Hop Router (LHR), and RP.
RC and Issue:
When an upstream (S,G) is in Join state, the FHR sends Register messages to the RP. If the RP has a receiver, it sends a Register-Stop message and switches to the shortest path. When the Register-Stop message is processed, the FHR removes pimreg, moves to Prune, and starts the Register-Stop timer.
When the Register-Stop timer expires, PIM changes the (S,G) state to Join-Pending and sends a Null-Register message to the RP. The RP receives it but fails to send a Register-Stop because the SPT bit is not set at that point.
The problem occurs when the Register-Stop timer pops while the state is Join-Pending. According to https://www.rfc-editor.org/rfc/rfc4601#section-4.4.1, the pimreg register tunnel must then be put back into the (S,G) mroute. This causes data to be sent to the control plane and subsequently interrupts line-rate forwarding.
Fix:
If the router is both FHR and RP for the group, ignore the SPT status and send a Register-Stop message back to the DR (in this context, the same router).
Ticket: #3506780
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Rajesh Varatharaj <[email protected]>
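The cycle described in this commit message follows the DR-side Register state machine of RFC 4601 section 4.4.1. The following is a minimal sketch of those transitions, assuming timers are reduced to plain events. The names (`struct dr_sg`, `dr_handle`, the enums) are hypothetical and do not come from pimd.

```c
#include <assert.h>
#include <stdbool.h>

enum reg_state { REG_NOINFO, REG_JOIN, REG_JOIN_PENDING, REG_PRUNE };
enum reg_event { EV_REGISTER_STOP_RECEIVED, EV_REGISTER_STOP_TIMER_EXPIRES };

/* Hypothetical per-(S,G) register state kept by the DR. */
struct dr_sg {
	enum reg_state state;
	bool reg_tunnel_in_oil;  /* pimreg present in the (S,G) mroute */
	int null_registers_sent; /* Null-Register messages sent so far */
};

static void dr_handle(struct dr_sg *sg, enum reg_event ev)
{
	switch (sg->state) {
	case REG_JOIN:
		if (ev == EV_REGISTER_STOP_RECEIVED) {
			/* Remove pimreg; the Register-Stop timer starts here. */
			sg->state = REG_PRUNE;
			sg->reg_tunnel_in_oil = false;
		}
		break;
	case REG_PRUNE:
		if (ev == EV_REGISTER_STOP_TIMER_EXPIRES) {
			/* Probe the RP with a Null-Register. */
			sg->state = REG_JOIN_PENDING;
			sg->null_registers_sent++;
		}
		break;
	case REG_JOIN_PENDING:
		if (ev == EV_REGISTER_STOP_RECEIVED) {
			sg->state = REG_PRUNE;
		} else {
			/* No Register-Stop came back: pimreg is re-added,
			 * so data is punted to the control plane again. */
			sg->state = REG_JOIN;
			sg->reg_tunnel_in_oil = true;
		}
		break;
	case REG_NOINFO:
		break;
	}
	/* The register tunnel is installed only in Join state. */
	assert(sg->reg_tunnel_in_oil == (sg->state == REG_JOIN));
}
```

Starting in Join and feeding Register-Stop received, timer expiry, timer expiry walks Join, Prune, Join-Pending and back to Join with pimreg re-added, which is the flap reported here. If the RP answers the Null-Register with a Register-Stop, the entry returns to Prune instead and pimreg stays out of the mroute.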
ab16767 to 7c7af22
ci:rerun