-
Notifications
You must be signed in to change notification settings - Fork 0
[feature] ASIC-focused 1st draft of multicast PRE #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: multicast
Are you sure you want to change the base?
Conversation
4e2fc4b
to
1ce4908
Compare
Includes: * Multicast Group API management: add, modify, delete, reset, SSM handling * sidecar.p4 tofino_asic updates to integrate multicast packet replication * Table mangagement for ASIC switch tables for multicast * integration tests and test utility additions
1ce4908
to
df62508
Compare
2426e18
to
4a945a8
Compare
I'll add some notes to the PR tomorrow morning (to explain through pieces). |
/// | ||
/// If that counter isn't in the set returned by dpd, we return | ||
/// an error to the caller. | ||
pub async fn get_counter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved these functions in for use beyond the counters test (and having mcast tests all live in that module)
CounterDescription { | ||
id: CounterId::DropPort, | ||
client_name: "Drop_Port", | ||
client_name: "Ingress_Drop_Port", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: client name changes to decipher.
dpd/p4/constants.p4
Outdated
const bit<8> DROP_MULTICAST_INVALID_MAC = 0x13; | ||
const bit<8> DROP_MULTICAST_CPU_COPY = 0x14; | ||
const bit<8> DROP_MULTICAST_SOURCE_FILTERED = 0x15; | ||
const bit<32> DROP_REASON_MAX = 0x16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I did move this around to make MAX post the new drop additions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Zeeshan. I've provided some comments on the API and am about half way through the P4 code. Have to run, but wanted to get these initial comments in.
( 0, false, true, true, _, USER_SPACE_SERVICE_PORT, true, _, _, _, _, _, _ ) : forward_from_userspace; | ||
( 0, false, false, _, _, _, false, true, _, _, _, _, _ ) : forward_to_userspace; | ||
( 0, false, false, _, true, _, _, _, _, _, _, _, _ ) : forward_to_userspace; | ||
// Link-local multicast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're changing quite a bit around link-local multicast, we need to test this e2e with ddm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this Zeeshan, some notes so far. I haven't yet made it through dpd/src/mcast.rs
and the table management code yet.
Includes: * make sure to avoid switch table apply on drop-specific code in sidecar
Got this back down to 14 stages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've had a look over all the changes now, thanks for acting on the last round of feedback.
expected_pkts.push(TestPacket { | ||
packet: Arc::new(to_recv.clone()), | ||
port: egress1, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this ties in with my confusion over why we're always following NAT ingress with NAT egress in the multicast case, given that the output packets are only different in their VLAN. I thought a bare frame would come out Geneve encapsulated onto the underlay network (integration_tests::nat::gen_geneve_packet
)? The behaviour I had maybe expected would be:
- Multicast packet arrives on a front panel port, possibly on VLAN
x
.- Strip VLAN, Geneve encapsulate packet using fields in
group_entry.nat_target
. - Adjust TTL.
- Replicate Geneve encapsulated frames out on all rear ports on the mcast group.
- Strip VLAN, Geneve encapsulate packet using fields in
- Encapsulated multicast packet arrives on a rear panel port.
- Adjust TTL.
- Replicate frame, including Geneve, onto all other rear panel ports in the group.
- Replicate frame, excluding Geneve, onto all front panel ports in the group. Insert VLAN
x
if required by link.
Am I coming at this the wrong way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I get your question (and we're chatting tmrw); however, I'm trying to map/orient how we should do this (vs unicast). Today, nat_egress
is triggered when we match a switch_address (a v4/v6 address assigned to ports on the switch when configuring port settings on switch initialization). I definitely don't know enough (classic!), but for multicast group creation, there's nothing specifically tying multicast groups to ingress ports (so that we'd know if it's a front vs rear panel port or not), and it will dynamically change.
So, we'd need to explore an alternative approach to what we did for unicast before, though we could reuse the switch tables, possibly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, after discussion, we have a way forward in separating out ingress(encap)/egress(decap) ports for replication (and some minor changes to the .p4 code to avoid checks when necessary). I'll push the changes up later today.
const bit<8> SC_ARP_NEEDED = 0x03; | ||
const bit<8> SC_NEIGHBOR_NEEDED = 0x04; | ||
const bit<8> SC_INVALID = 0xff; | ||
// Includes the checksum for the original data, the geneve header, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is all of this in this file? None of these are constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yeah, I'll move defines to the top of the sidecar.
let group_id = query_id.into_inner().group_id; | ||
|
||
// If a group ID is provided, get the group by ID | ||
if let Some(group_id) = group_id { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere, we have different entry points for GETs returning all or one. e.g., /ndp/
and /ndp/{ip}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's why I made this a query param if you want to use the ID (vs the {ip} variant). But, maybe I should just remove the get by group_id entirely to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm saying the opposite of that. I'm asking you to add a second entry point rather than use a query param, so this is consistent with the other APIs. If you don't think there is a use case for getting a single group, then you can remove the query param and not add a second endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. With the updates coming for separating ingress/egress port members, I think it's less necessary.
Sadly, for a second entry point, I ran into issues with/ dropshot b/c it would error on having both /group/{ip} and /group/{id} (even with different types), so then the route names would have to be different. So, yeah, I'll remove it in the next push.
Ord, | ||
PartialOrd, | ||
Clone, | ||
Deserialize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add these new derive
s? It doesn't look like VlanId
is part of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I waas using it previously, but dropped it. Good catch, will remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be table/mcast/mod.rs
to keep with the style elsewhere in the repo.
I would also rename the sub-modules so they are unique, and we don't have two route.rs
modules inside dpd
. Maybe prefix them all with mcast_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll move to the "older style".
@@ -30,7 +30,7 @@ struct IndexKey { | |||
|
|||
// Route entries stored in the index->route_data table | |||
#[derive(ActionParse, Debug)] | |||
enum RouteAction { | |||
pub(crate) enum RouteAction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it leads to a little code duplication, but I'd like to keep the table keys private to their modules.
@@ -85,6 +85,10 @@ pub enum DpdError { | |||
Oximeter(String), | |||
#[error("No switch identifiers available")] | |||
NoSwitchIdentifiers, | |||
#[error("Multicast group error: {}", .0)] | |||
McastGroup(String), | |||
#[error("Resource exhausted: {}", .0)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place this is used is when we're out of group IDs. You could just make that a specific error, rather than the vague "resource exhausted".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResourceExhaused is for the next line, but I'll rename this to something more apt.
@@ -104,6 +104,19 @@ impl Ipv4Hdr { | |||
hdr.ipv4_ttl = ttl as u8; | |||
Ipv4Hdr::update_checksum(pkt); | |||
} | |||
|
|||
pub fn set_identification(pkt: &mut Packet, id: u16) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither of these seem to be used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I was initially setting these into the packet to copy over some mcast info. But, I'll remove this.
pub fn adjust_hlim(pkt: &mut Packet, delta: i8) { | ||
let hdr = pkt.hdrs.ipv6_hdr.as_mut().unwrap(); | ||
hdr.ipv6_hop_lim = (hdr.ipv6_hop_lim as i8 + delta) as u8; | ||
pub fn adjust_hlim(pkt: &mut Packet, delta: i16) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an 8 bit field - why allow a 16 bit delta?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was so I could update it to set it to 0 for a test:
ipv6::Ipv6Hdr::adjust_hlim(&mut to_send, -255);
Note: This mainly works for the
tofinio_asic
feature. Hence, I've created a multicast branch to plug updates/PRs into.Includes:
* Multicast Group API management: add, modify, delete, reset, SSM handling
* sidecar.p4 tofino_asic updates to integrate multicast packet replication
* Table management for ASIC switch tables for multicast
* integration tests and test utility additions