Is OSPF route delete logging incomplete or am I missing it in another debug log? #16847
Comments
The code is telling you the route is being replaced w/ 1 nexthop.
You're saying the delete is implicit, but if the link is later restored, both are logged:
In this case the nexthop is cumulative, not a replacement. If there's an implicit route delete in one case but not in the other, how would an observer know the difference? ETA: I omitted the route additions for the
ospf debugs each nexthop one-by-one - that's just ... how that particular debug is written
There is no delete on a route replacement. I believe that is your confusion. All the code is saying is: replace the route with this new one, and here is the list of nexthops I am sending down.
I'm tracking, but the confusion is in the logging, not in me. Say, frex, that I'm parsing the events and reconstructing the FIB. On any given
If I have to look into the future to determine if a given
Mark is correct that this is just how these debugs are written, and I am not going to sanction a change in debug logs to stuff a bunch of nexthops on one line. Debug logs are meant to be readable by a human. I also disagree with the notion that all debugs should reflect a single event; debug logs are meant to help the developer debug the code when something has gone wrong, that is it. For Informational/Warning and Error logs I would agree with you, though.
We also make no guarantee about debug logs. They can and do change from release to release as we improve things. Hell, one of the major problems with debugs that I am still fixing is debugs that don't include the vrf in them (thanks for pointing this out to me for these debugs, I'll address that).
In any event, reading debug logs is not a way that I would ever recommend for doing what I think you are trying to do, although in all honesty I am not really sure what you are trying to do, so I cannot really point you in a direction. ospfd has a way to peek into its database. Zebra has some basic scripting with Lua that has a way to hook into rib processing. What is the actual end goal of your problem?
I'm running emulated networks (at scale -- 100's to 1000's of nodes), capturing atomic changes to the RIB at all nodes, and analyzing the network's resilience given a range of disruption events. I've been adapting the ideas from this paper and will eventually apply them independent of the routing protocol. Everything is there in the ospfd zebra logs except this one RIB change with ECMP paths. It's frustrating. :)
That's news to me, I'll look into it. Thanks for the pointer.
I get that, but even the human gets confused in this instance; thus this discussion. My specific use case might be very niche, but the general use case -- understanding the local change to the RIB after a distant event -- is not.
it may be possible to revise the debug that isn't clear to you - to indicate in the first line more about the event that's happening - why don't you look into that?
The zebra Lua dataplane hook looks promising, but the Lua script is loaded into an empty namespace, so necessary global functions like
And no,
So, our story so far: Working off current HEAD w/ the unreleased
local ops = {[0]="NONE", "ROUTE_INSTALL", "ROUTE_UPDATE", "ROUTE_DELETE", "ROUTE_NOTIFY",
"NH_INSTALL", "NH_UPDATE", "NH_DELETE", "LSP_INSTALL", "LSP_UPDATE",
"LSP_DELETE", "LSP_NOTIFY", "PW_INSTALL", "PW_UNINSTALL", "SYS_ROUTE_ADD",
"SYS_ROUTE_DELETE", "ADDR_INSTALL", "ADDR_UNINSTALL", "MAC_INSTALL",
"MAC_DELETE", "NEIGH_INSTALL", "NEIGH_UPDATE", "NEIGH_DELETE",
"VTEP_ADD", "VTEP_DELETE", "RULE_ADD", "RULE_DELETE", "RULE_UPDATE",
"NEIGH_DISCOVER", "BR_PORT_UPDATE", "IPTABLE_ADD", "IPTABLE_DELETE",
"IPSET_ADD", "IPSET_DELETE", "IPSET_ENTRY_ADD", "IPSET_ENTRY_DELETE",
"NEIGH_IP_INSTALL", "NEIGH_IP_DELETE", "NEIGH_TABLE_UPDATE", "GRE_SET",
"INTF_ADDR_ADD", "INTF_ADDR_DELETE", "INTF_NETCONFIG", "INTF_INSTALL",
"INTF_UPDATE", "INTF_DELETE", "TC_QDISC_INSTALL", "TC_QDISC_UNINSTALL",
"TC_QDISC_CLASS_ADD", "TC_QDISC_CLASS_DELETE", "TC_QDISC_CLASS_UPDATE",
"TC_QDISC_FILTER_ADD", "TC_QDISC_FILTER_DELETE", "TC_QDISC_FILTER_UPDATE",
"STARTUP_STAGE", "SRV6_ENCAP_SRCADDR_SET"}
function on_rib_process_dplane_results(ctx)
local rinfo = ctx.rinfo
local event = "op=" .. ops[ctx.zd_op] .. " dest=" .. rinfo.zd_dest.network .. " ifname=" .. ct
local nhe = rinfo.nhe
event = event .. " nhe.id=" .. nhe.id
log.info(event)
log.trace(rinfo.zd_ng) -- only works on FRR HEAD; unreleased code
return {}
end
Logs:
So it's a single dataplane event (DPLANE_OP_ROUTE_UPDATE) that replaces the current route entries for the prefix with the set of nexthops in the group. The logging code turns this into multiple log lines, one per member of the nexthop group, each as "Route add". Yes, humans can grok what's happening but it's much harder for code -- and not just my code, but log analysis code e.g. in Kibana or Loki.
I could alter this logging with the zebra dataplane Lua hook -- except that Lua is crippled with an incomplete namespace. The only way to iterate over a Lua array or table is
In addition, it would be nice to be able to pull in data from the OS, but that will require
The other option is to change the logging behavior to log the entire nexthop group in one entry. If I can figure it out I'll submit a PR and we can discuss the merits there.
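To make the replace semantics concrete for anyone parsing these events, here is a minimal sketch in plain Lua of how an observer might fold dataplane operations into a reconstructed FIB. The `apply_event` function, the `fib` table, and the event fields (`op`, `dest`, `nexthops`) are hypothetical and not part of FRR, and the prefix and interface names are made up. It assumes a ROUTE_UPDATE carries the complete nexthop set, as described above.

```lua
-- Hypothetical observer-side model; none of these names come from FRR.
local fib = {}  -- prefix -> array of nexthop descriptions

local function apply_event(ev)
  if ev.op == "ROUTE_INSTALL" or ev.op == "ROUTE_UPDATE" then
    -- Replace, never merge: the event's nexthop list is the whole new state,
    -- so any nexthop missing from it is implicitly gone.
    local nhs = {}
    for i, nh in ipairs(ev.nexthops) do
      nhs[i] = nh
    end
    fib[ev.dest] = nhs
  elseif ev.op == "ROUTE_DELETE" then
    fib[ev.dest] = nil
  end
end

-- ECMP route over two links, then the distant link dies.
apply_event({op = "ROUTE_INSTALL", dest = "10.0.0.12/31", nexthops = {"eth0", "eth1"}})
apply_event({op = "ROUTE_UPDATE", dest = "10.0.0.12/31", nexthops = {"eth0"}})
print(#fib["10.0.0.12/31"])  -- 1: eth1 was dropped without any delete event
```

With one event per replacement this is trivial; with one log line per nexthop, the parser cannot tell where one replacement ends and the next begins without extra context.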
Closing this; #16907 fixes the Lua namespace so walking the
If anyone else ever lands here looking for similar, here's what I finally ended up with. I don't like using
-- Cribbed from zebra_dplane.h (dropped the DPLANE_OP prefix)
local ops = {[0]="NONE", "ROUTE_INSTALL", "ROUTE_UPDATE", "ROUTE_DELETE", "ROUTE_NOTIFY",
"NH_INSTALL", "NH_UPDATE", "NH_DELETE", "LSP_INSTALL", "LSP_UPDATE",
"LSP_DELETE", "LSP_NOTIFY", "PW_INSTALL", "PW_UNINSTALL", "SYS_ROUTE_ADD",
"SYS_ROUTE_DELETE", "ADDR_INSTALL", "ADDR_UNINSTALL", "MAC_INSTALL",
"MAC_DELETE", "NEIGH_INSTALL", "NEIGH_UPDATE", "NEIGH_DELETE",
"VTEP_ADD", "VTEP_DELETE", "RULE_ADD", "RULE_DELETE", "RULE_UPDATE",
"NEIGH_DISCOVER", "BR_PORT_UPDATE", "IPTABLE_ADD", "IPTABLE_DELETE",
"IPSET_ADD", "IPSET_DELETE", "IPSET_ENTRY_ADD", "IPSET_ENTRY_DELETE",
"NEIGH_IP_INSTALL", "NEIGH_IP_DELETE", "NEIGH_TABLE_UPDATE", "GRE_SET",
"INTF_ADDR_ADD", "INTF_ADDR_DELETE", "INTF_NETCONFIG", "INTF_INSTALL",
"INTF_UPDATE", "INTF_DELETE", "TC_QDISC_INSTALL", "TC_QDISC_UNINSTALL",
"TC_QDISC_CLASS_ADD", "TC_QDISC_CLASS_DELETE", "TC_QDISC_CLASS_UPDATE",
"TC_QDISC_FILTER_ADD", "TC_QDISC_FILTER_DELETE", "TC_QDISC_FILTER_UPDATE",
"STARTUP_STAGE", "SRV6_ENCAP_SRCADDR_SET"}
function on_rib_process_dplane_results(ctx)
  local rinfo = ctx.rinfo
  local event = "op=" .. ops[ctx.zd_op] .. " dest=" .. rinfo.zd_dest.network
  if rinfo.zd_ng then
    -- One "if.<key>=<ifname>" field per member of the nexthop group
    for k, v in pairs(rinfo.zd_ng) do
      event = event .. " if." .. k .. "="
      -- Resolve the ifindex to an interface name via the OS
      local handle = io.popen("ip --json link | jq -r '.[] | select(.ifindex == " .. v.ifindex .. ") | .ifname'")
      -- read() returns nil when nothing matches; fall back so the concatenation cannot fail
      event = event .. (handle:read() or "unknown")
      handle:close()
    end
  end
  log.info(event)
  return {}
end
Logging on a distant link down LSA:
Logs on the link back up LSA:
Extended logging might be another option but in my specific case logs are going to stdout for aggregation by other tooling.
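One possible refinement, offered as a sketch rather than anything tested inside zebra: the script above spawns `ip` and `jq` once per nexthop per event, which could add up at scale. The helper below caches ifindex-to-name lookups. `make_ifname_cache` and `resolve_with_ip` are hypothetical names, and the caching only helps if the Lua state persists between hook invocations, which is an assumption not verified here. The cache would also go stale if an interface is renamed or an ifindex is reused.

```lua
-- Hypothetical helper: cache ifindex -> ifname so the resolver runs at most
-- once per interface instead of once per nexthop per event.
local function make_ifname_cache(resolve)
  local cache = {}
  return function(ifindex)
    local name = cache[ifindex]
    if name == nil then
      -- Fall back to a synthetic label when the resolver finds nothing.
      name = resolve(ifindex) or ("ifindex" .. tostring(ifindex))
      cache[ifindex] = name
    end
    return name
  end
end

-- Resolver mirroring the ip/jq pipeline used in the script above.
local function resolve_with_ip(ifindex)
  local handle = io.popen("ip --json link | jq -r '.[] | select(.ifindex == "
    .. ifindex .. ") | .ifname'")
  if not handle then
    return nil
  end
  local name = handle:read()  -- first line of output, or nil if empty
  handle:close()
  if name == "" then
    return nil
  end
  return name
end

-- Nothing is executed until ifname() is first called with an ifindex.
local ifname = make_ifname_cache(resolve_with_ip)
```

Inside the nexthop loop, the lookup would then become `event = event .. ifname(v.ifindex)`.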
Discussed in #16760
Originally posted by Cerebus September 6, 2024
Version 9.1 currently but I can upgrade if necessary.
I have debug ospf zebra enabled b/c I want to see changes to the routing table.
When a distant link that's part of an ECMP set dies and OSPF processes the LSA, it does not log the deletion of the ECMP route.
E.g., given the starting point:
The 0.6/31 link is traversed by 0.12/31 destinations over the eth1 link. If that distant link dies, OSPF logs:
And the resulting table:
The resulting table is correct, but the log omits the deletion of 0.12/31 nexthop via eth1. Is this info in another debug log, or is this a bug?