Skip to content

[feature] ASIC-focused 1st draft of multicast PRE #14

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: multicast
Choose a base branch
from

Conversation

zeeshanlakhani
Copy link
Contributor

@zeeshanlakhani zeeshanlakhani commented Apr 1, 2025

Note: This mainly works for the tofinio_asic feature. Hence, I've created a multicast branch to plug updates/PRs into.

Includes:
* Multicast Group API management: add, modify, delete, reset, SSM handling
* sidecar.p4 tofino_asic updates to integrate multicast packet replication
* Table management for ASIC switch tables for multicast
* integration tests and test utility additions

@zeeshanlakhani zeeshanlakhani force-pushed the zl/p4-mcast branch 3 times, most recently from 4e2fc4b to 1ce4908 Compare April 2, 2025 02:40
Includes:
    * Multicast Group API management: add, modify, delete, reset, SSM handling
    * sidecar.p4 tofino_asic updates to integrate multicast packet replication
    * Table mangagement for ASIC switch tables for multicast
    * integration tests and test utility additions
@zeeshanlakhani zeeshanlakhani changed the title wip [feature] ASIC-focused 1st draft of multicast PRE Apr 15, 2025
@zeeshanlakhani zeeshanlakhani marked this pull request as ready for review April 15, 2025 19:37
@zeeshanlakhani
Copy link
Contributor Author

I'll add some notes to the PR tomorrow morning (to explain through pieces).

///
/// If that counter isn't in the set returned by dpd, we return
/// an error to the caller.
pub async fn get_counter(
Copy link
Contributor Author

@zeeshanlakhani zeeshanlakhani Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved these functions in for use beyond the counters test (and having mcast tests all live in that module)

CounterDescription {
id: CounterId::DropPort,
client_name: "Drop_Port",
client_name: "Ingress_Drop_Port",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: client name changes to decipher.

const bit<8> DROP_MULTICAST_INVALID_MAC = 0x13;
const bit<8> DROP_MULTICAST_CPU_COPY = 0x14;
const bit<8> DROP_MULTICAST_SOURCE_FILTERED = 0x15;
const bit<32> DROP_REASON_MAX = 0x16;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I did move this around to make MAX post the new drop additions.

Copy link
Contributor

@rcgoodfellow rcgoodfellow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Zeeshan. I've provided some comments on the API and am about half way through the P4 code. Have to run, but wanted to get these initial comments in.

( 0, false, true, true, _, USER_SPACE_SERVICE_PORT, true, _, _, _, _, _, _ ) : forward_from_userspace;
( 0, false, false, _, _, _, false, true, _, _, _, _, _ ) : forward_to_userspace;
( 0, false, false, _, true, _, _, _, _, _, _, _, _ ) : forward_to_userspace;
// Link-local multicast
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're changing quite a bit around link-local multicast, we need to test this e2e with ddm.

Copy link

@FelixMcFelix FelixMcFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this Zeeshan, some notes so far. I haven't yet made it through dpd/src/mcast.rs and the table management code yet.

@zeeshanlakhani
Copy link
Contributor Author

Got this back down to 14 stages.

Copy link

@FelixMcFelix FelixMcFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've had a look over all the changes now, thanks for acting on the last round of feedback.

Comment on lines +2037 to +2040
expected_pkts.push(TestPacket {
packet: Arc::new(to_recv.clone()),
port: egress1,
});
Copy link

@FelixMcFelix FelixMcFelix Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ties in with my confusion over why we're always following NAT ingress with NAT egress in the multicast case, given that the output packets are only different in their VLAN. I thought a bare frame would come out Geneve encapsulated onto the underlay network (integration_tests::nat::gen_geneve_packet)? The behaviour I had maybe expected would be:

  • Multicast packet arrives on a front panel port, possibly on VLAN x.
    • Strip VLAN, Geneve encapsulate packet using fields in group_entry.nat_target.
    • Adjust TTL.
    • Replicate Geneve encapsulated frames out on all rear ports on the mcast group.
  • Encapsulated multicast packet arrives on a rear panel port.
    • Adjust TTL.
    • Replicate frame, including Geneve, onto all other rear panel ports in the group.
    • Replicate frame, excluding Geneve, onto all front panel ports in the group. Insert VLAN x if required by link.

Am I coming at this the wrong way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I get your question (and we're chatting tmrw); however, I'm trying to map/orient how we should do this (vs unicast). Today, nat_egress is triggered when we match a switch_address (a v4/v6 address assigned to ports on the switch when configuring port settings on switch initialization). I definitely don't know enough (classic!), but for multicast group creation, there's nothing specifically tying multicast groups to ingress ports (so that we'd know if it's a front vs rear panel port or not), and it will dynamically change.

So, we'd need to explore an alternative approach to what we did for unicast before, though we could reuse the switch tables, possibly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after discussion, we have a way forward in separating out ingress(encap)/egress(decap) ports for replication (and some minor changes to the .p4 code to avoid checks when necessary). I'll push the changes up later today.

const bit<8> SC_ARP_NEEDED = 0x03;
const bit<8> SC_NEIGHBOR_NEEDED = 0x04;
const bit<8> SC_INVALID = 0xff;
// Includes the checksum for the original data, the geneve header, the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is all of this in this file? None of these are constants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yeah, I'll move defines to the top of the sidecar.

let group_id = query_id.into_inner().group_id;

// If a group ID is provided, get the group by ID
if let Some(group_id) = group_id {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere, we have different entry points for GETs returning all or one. e.g., /ndp/ and /ndp/{ip}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's why I made this a query param if you want to use the ID (vs the {ip} variant). But, maybe I should just remove the get by group_id entirely to avoid confusion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm saying the opposite of that. I'm asking you to add a second entry point rather than use a query param, so this is consistent with the other APIs. If you don't think there is a use case for getting a single group, then you can remove the query param and not add a second endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. With the updates coming for separating ingress/egress port members, I think it's less necessary.

Sadly, for a second entry point, I ran into issues with/ dropshot b/c it would error on having both /group/{ip} and /group/{id} (even with different types), so then the route names would have to be different. So, yeah, I'll remove it in the next push.

Ord,
PartialOrd,
Clone,
Deserialize,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add these new derives? It doesn't look like VlanId is part of the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I waas using it previously, but dropped it. Good catch, will remove.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be table/mcast/mod.rs to keep with the style elsewhere in the repo.
I would also rename the sub-modules so they are unique, and we don't have two route.rs modules inside dpd. Maybe prefix them all with mcast_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll move to the "older style".

@@ -30,7 +30,7 @@ struct IndexKey {

// Route entries stored in the index->route_data table
#[derive(ActionParse, Debug)]
enum RouteAction {
pub(crate) enum RouteAction {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it leads to a little code duplication, but I'd like to keep the table keys private to their modules.

@@ -85,6 +85,10 @@ pub enum DpdError {
Oximeter(String),
#[error("No switch identifiers available")]
NoSwitchIdentifiers,
#[error("Multicast group error: {}", .0)]
McastGroup(String),
#[error("Resource exhausted: {}", .0)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place this is used is when we're out of group IDs. You could just make that a specific error, rather than the vague "resource exhausted".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceExhaused is for the next line, but I'll rename this to something more apt.

@@ -104,6 +104,19 @@ impl Ipv4Hdr {
hdr.ipv4_ttl = ttl as u8;
Ipv4Hdr::update_checksum(pkt);
}

pub fn set_identification(pkt: &mut Packet, id: u16) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither of these seem to be used anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I was initially setting these into the packet to copy over some mcast info. But, I'll remove this.

pub fn adjust_hlim(pkt: &mut Packet, delta: i8) {
let hdr = pkt.hdrs.ipv6_hdr.as_mut().unwrap();
hdr.ipv6_hop_lim = (hdr.ipv6_hop_lim as i8 + delta) as u8;
pub fn adjust_hlim(pkt: &mut Packet, delta: i16) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an 8 bit field - why allow a 16 bit delta?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was so I could update it to set it to 0 for a test:

ipv6::Ipv6Hdr::adjust_hlim(&mut to_send, -255);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants