
Add cmd line parser #219 (Open)

Fredi-raspall wants to merge 4 commits into main from fredi/args

Conversation

@Fredi-raspall (Contributor) commented Feb 7, 2025

Goal: allow the dataplane process to accept additional, non-EAL-related parameters.

The crate used, clap, allows deriving parser functions automatically and defining default values. However, that does not seem to work for certain types. We should also decide which args we can default and which ones are mandatory. We could default ALL of them (and then the binary would behave as before, with everything hardcoded).

We should also decide whether we want to call the main process "dataplane" or "gateway".

A next-gen dataplane for next-gen fabric gateway

Usage: dataplane [OPTIONS]

Options:
      --main-lcore <core-id used as main>              [default: 2]
      --lcores <map lcore set to cpu set>              
      --in-memory                                      
      --allow <PCI devices to probe>                   
      --huge-worker-stack <huge pages>                 [default: 8192]
      --socket-mem <socket memory>                     
      --no-telemetry                                   
      --iova-mode <iova mode(va|pa)>                   
      --log-level <loglevel for a specific component>  
  -h, --help                                           Print help
  -V, --version                                        Print version
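
For reference, a minimal sketch of the clap derive setup behind a help screen like the one above. Field names and attributes mirror the diff excerpts quoted in the review below; the `Vec<String>` fields and the `main()` body are assumptions, and the code in the PR itself may differ.

```rust
use clap::Parser;

/// A next-gen dataplane for next-gen fabric gateway
#[derive(Parser, Debug)]
#[command(version)]
pub(crate) struct CmdArgs {
    #[arg(long, value_name = "core-id used as main", default_value_t = 2)]
    main_lcore: u8,
    #[arg(long, value_name = "map lcore set to cpu set")]
    lcores: Option<String>,
    /// Run DPDK entirely in memory (no shared hugepage files).
    #[arg(long)]
    in_memory: bool,
    /// May be given multiple times, once per PCI device.
    #[arg(long, value_name = "PCI devices to probe")]
    allow: Vec<String>,
    #[arg(long, value_name = "huge pages", default_value_t = 8192)]
    huge_worker_stack: u32,
    #[arg(long, value_name = "socket memory")]
    socket_mem: Option<String>,
    /// Slated for removal per the review below (telemetry is disabled in dpdk-sys).
    #[arg(long)]
    no_telemetry: bool,
    #[arg(long, value_name = "iova mode(va|pa)")]
    iova_mode: Option<String>,
    /// Passed through verbatim to the EAL; may be given multiple times.
    #[arg(long, value_name = "loglevel for a specific component")]
    log_level: Vec<String>,
}

fn main() {
    let args = CmdArgs::parse();
    // Placeholder: the real binary forwards these to EAL initialization.
    println!("{args:?}");
}
```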

@Fredi-raspall marked this pull request as ready for review February 7, 2025 15:29
@Fredi-raspall requested a review from a team as a code owner February 7, 2025 15:29
@Fredi-raspall requested review from mvachhar and removed request for a team February 7, 2025 15:29
@Fredi-raspall force-pushed the fredi/args branch 2 times, most recently from e057db9 to 70fcfad on February 7, 2025 16:13

@daniel-noland (Collaborator) left a comment

Overall I really like this PR.

I had a number of comments, but only two actual concerns.

Address the comments or ping me on Slack and we can discuss.

Nice work :)

huge_worker_stack: u32,
#[arg(long, value_name = "socket memory")]
socket_mem: Option<String>,
#[arg(long, value_name = "enable/disable telemetry", default_value_t = true)]
daniel-noland (Collaborator):

we should remove this argument as I forcibly disabled DPDK telemetry in a prior commit to dpdk-sys

(I understand that you are just mirroring the prior, also outdated, parameters here)

no_telemetry: bool,
#[arg(long, value_name = "iova mode(va|pa)")]
iova_mode: Option<String>,
#[arg(long, value_name = "loglevel for a specific component")]
daniel-noland (Collaborator), Feb 7, 2025:

I'm not sure I understand what this argument is about. Are we plumbing this into the tracing crate in this PR?

Fredi-raspall (Contributor, Author):

I looked into the DPDK documentation and invoked the dataplane passing --log-level=help, and this is what I got:

Log type is a pattern matching items of this list (plugins may be missing):
	bus.auxiliary
	bus.pci
	lib.eal
	lib.ethdev
	lib.hash
	lib.hash.crc
	lib.hash.fbk
	lib.hash.gfni
	lib.hash.thash
	lib.hash.thash_poly
	lib.mbuf
	lib.mempool
	lib.net
	lib.rcu
	lib.ring
	lib.telemetry
	pmd.common.mlx5
	pmd.net.mlx5
	user1
	user2
	user3
	user4
	user5
	user6
	user7
	user8

Syntax using globbing pattern:     --log-level pattern:level
Syntax using regular expression:   --log-level regexp,level
Syntax for the global level:       --log-level level
Logs are emitted if allowed by both global and specific levels.

Log level can be a number or the first letters of its name:
	1   emergency
	2   alert
	3   critical
	4   error
	5   warning
	6   notice
	7   info
	8   debug

I understood that with this command you can set the log levels specifically for certain modules, so I thought it could be interesting. However, you're right: will the bindings actually honor this? I'd say they should... so I just added the command as a direct pass-through to the EAL.
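
For illustration, a sketch of that pass-through, assuming the log_level field collects repeated --log-level values (as in the sketch near the top of this page) and that the EAL argument vector is assembled into a Vec<String> as in the diff excerpts; the actual code in the PR may differ.

```rust
impl CmdArgs {
    /// Forward every --log-level value verbatim to the EAL, which performs the
    /// pattern matching against component names (e.g. "pmd.net.mlx5:debug").
    fn push_log_levels(&self, out: &mut Vec<String>) {
        for spec in &self.log_level {
            out.push(format!("--log-level={spec}"));
        }
    }
}
```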

main_lcore: u8,
#[arg(long, value_name = "map lcore set to cpu set")]
lcores: Option<String>,
#[arg(long, value_name = "in-memory flag", default_value_t = true)]
daniel-noland (Collaborator):

I suspect that we should "hard code" the --in-memory option.

Otherwise we risk leaking truly massive amounts of hugepage memory if DPDK gets a kill -9 or some other event which prevents it from cleaning up.

Fredi-raspall (Contributor, Author):

I'm fine with that. Note that it defaults to true, but if you think it should always be passed, we can remove the option.
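
If the flag is removed as suggested, the pass-through could simply become unconditional; a small sketch (the helper name is made up, the surrounding EAL-args assembly is as in the diff excerpts in this PR):

```rust
/// Hypothetical helper used while assembling the EAL argument vector.
fn push_in_memory(out: &mut Vec<String>) {
    // Always run in memory: a process killed with -9 then cannot leave
    // hugepage files behind, per the review discussion above.
    out.push("--in-memory".to_string());
}
```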

pub(crate) struct CmdArgs {
#[arg(long, value_name = "core-id used as main", default_value_t = 2)]
main_lcore: u8,
#[arg(long, value_name = "map lcore set to cpu set")]
daniel-noland (Collaborator):

I agree with allowing this as a command line flag for the moment.

That said, in the long run, the set of lcores used for workers (and, symmetrically and inversely, the set of cores used for everything else) needs to be a function of Linux kernel command line parameters like isolcpus, or tickless configurations like nohz_full.

There is also NUMA to consider.

Given that these parameters are either hardware specific or boot time configuration, I'm not sure we can assume the command line is a great place to get this data.

That said, we could self-inspect for these parameters at launch (ideally in a parent process which invokes/execs the dataplane). That process could hand us this information, so maybe the command line is the best place to get it after all.
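
To illustrate the self-inspection idea (not part of this PR; the helper is hypothetical and the range parsing is left out), a launcher could read the isolated-CPU set straight from the kernel command line:

```rust
use std::fs;

/// Return the raw value of `isolcpus=` from /proc/cmdline, if present.
/// A parent/launcher process could turn this into the worker lcore set.
fn isolcpus_from_cmdline() -> Option<String> {
    let cmdline = fs::read_to_string("/proc/cmdline").ok()?;
    cmdline
        .split_whitespace()
        .find_map(|kv| kv.strip_prefix("isolcpus=").map(|v| v.to_string()))
}
```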

// To be removed
if self.allow.len() == 0 {
out.push("--allow".to_string());
out.push("0000:01:00.0,dv_flow_en=1".to_string());
daniel-noland (Collaborator):

This is a good default for the moment, but we need to be careful here. 0000:01:00.0 is the PCI address of the CX7 card in the dev machine. We can't count on it in general and we should let our deployment process depend on it.

That said, this is still a step in the right direction since the parameter is, at minimum, configurable now.

Fredi-raspall (Contributor, Author):

Yes, I just hardcoded it so that we'd get the same behavior without needing to pass it. The alternative would have been to add a bash (or just) script so that we don't have to write --allow .... each time we run it at this stage.
Note that the hardcoded value will only be used if no --allow is specified.


/* IOVA mode */
out.push(format!(
"--iova-mode={}",
daniel-noland (Collaborator):

This is another place where we need to be careful. The CX7 wants virtual memory, but other cards will require physical memory (virtual memory bypass). Ideally we will set this as a function of the driver of the network card we are going to poll.

But again, this is as good as we are going to do for the moment.
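
As a sketch of the direction suggested here (the driver names and the mapping are illustrative assumptions, not something this PR implements):

```rust
/// Hypothetical default: pick the IOVA mode from the NIC driver in use.
fn default_iova_mode(driver: &str) -> &'static str {
    match driver {
        // mlx5-based cards (e.g. the CX7) work with virtual addresses.
        "mlx5_core" => "va",
        // Conservative fallback for drivers that need physical addresses.
        _ => "pa",
    }
}
```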

@@ -7,6 +7,7 @@ license = "Apache-2.0"

[dependencies]

clap = { version = "4.5.27", features = ["derive"] }
daniel-noland (Collaborator):

Please move clap to the workspace level Cargo.toml (with default-features disabled and features = []).

Then change this line to read

clap = { workspace = true, features = ["derive"] }

This strategy has the advantage of centralizing version numbers and enforcing that all crates in this workspace are on a common version and using common compile options (features).

cargo deny will bounce us if we fail this check, but only on a per-crate basis.

I find that debugging can become enormously confusing if you end up in a situation where you ship or depend on multiple versions of the same package.
It is also far easier on our (future) operations and sec-ops teams when we can give them a single, unambiguous, and exact answer to what version of any given thing is in production at any given time.
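
For clarity, this is roughly the arrangement being requested (a sketch; depending on the clap version, the std feature may also need to be listed explicitly once default features are off):

```toml
# Workspace-level Cargo.toml
[workspace.dependencies]
clap = { version = "4.5.27", default-features = false, features = [] }
```

```toml
# Crate-level Cargo.toml (the dataplane/gateway crate)
[dependencies]
clap = { workspace = true, features = ["derive"] }
```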

Fredi-raspall (Contributor, Author):

Agreed. Thanks for the explanation @daniel-noland .

@daniel-noland (Collaborator)

We should also decide whether we want to call the main process "dataplane" or "gateway".

This is a good question.

On thinking about it, I suggest that we convert the dataplane crate into a library and make another (much smaller) crate called gateway which is limited to things like parsing command line parameters and calling dataplane functions.

I am a big believer in thin applications and fat libraries. It makes testing easier, and who knows when we might want to recycle dataplane logic for some other purpose.
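
A minimal sketch of that split (the crate layout and the run entry point are hypothetical):

```rust
// gateway/src/main.rs -- thin binary: parse the command line, then delegate.
use clap::Parser;

// Hypothetical: the dataplane library would expose the parser and an entry point.
use dataplane::{run, CmdArgs};

fn main() {
    let args = CmdArgs::parse();
    // All real logic lives in the dataplane library crate, which keeps it
    // testable and reusable from other binaries.
    run(args);
}
```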

@Fredi-raspall (Contributor, Author)

Overall I really like this PR.

I had a number of comments, but only two actual concerns.

Address the comments or ping me on Slack and we can discuss.

Nice work :)

Thanks ... Fine to discuss if you're around!

.. so that the GW dataplane can also accept non-EAL-related params.

Signed-off-by: Fredi Raspall <[email protected]>
This commit is to be reverted, but it allows starting the dataplane
without any args, as before. Also, it prints the config, which should
be replaced by logs.

Signed-off-by: Fredi Raspall <[email protected]>
Produce a binary called "gateway" instead of dataplane.

Signed-off-by: Fredi Raspall <[email protected]>
@Fredi-raspall (Contributor, Author) commented Feb 7, 2025

@daniel-noland I've pushed some changes and also renamed the crate to "gateway". Initially I had NOT changed the name of the dataplane folder, so as not to confuse folks with outstanding PRs, but I've had to rename the folder too; otherwise cargo fmt was complaining.
If we want to rename it, better sooner than later. I'm fine with other names, btw!

@qmonnet (Member) left a comment

What's the motivation for renaming dataplane to gateway?

[EDIT: Sorry I had missed part of the conversation. I like Daniel's suggestion of a dataplane library + gateway binary, for what it's worth.]

Comment on lines +8 to +10
[[bin]]
name = "gateway"
path = "src/main.rs"
qmonnet (Member):

This will require some additional changes in the Dockerfile, as well as in the justfile for targets sterile-build and build-container where we reference binaries or Rust packages called dataplane.
