Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: tcp updates + initial zmq #176

Merged
merged 6 commits into from
Feb 13, 2025
Merged

fix: tcp updates + initial zmq #176

merged 6 commits into from
Feb 13, 2025

Conversation

ryanolson
Copy link
Contributor

@ryanolson ryanolson commented Feb 13, 2025

What does the PR do?

  • Closes TCP TIME-WAIT #173
    • tcp server is responsible for closing the socket
    • tcp client will wait 10s awaiting the server to close the socket after the sentinel has been sent
  • adds inital zmq server and client via zmq router and zmq dealer respectively
  • adds logging module

Note - I am using panic! for an unrecoverable condition. These are conditions that should never happen and should be caught in CI and unit tests.

We should further address these via #171.

soak test can now easily handle 300k tcp open/closes in 1m

elapsed: 59.458815336s; count: 330000
test integration::main has been running for over 60 seconds
elapsed: 61.233558603s; count: 340000

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Copy link
Contributor

@ptarasiewiczNV ptarasiewiczNV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm (from perspective of my very limited rust knowledge)

//! Logging can take two forms: `READABLE` or `JSONL`. The default is `READABLE`. `JSONL`
//! can be enabled by setting the `TRD_LOGGING_JSONL` environment variable to `1`.
//!
//! Filters can be configured using the `TRD_LOG` environment variable or by setting the `filters`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! Filters can be configured using the `TRD_LOG` environment variable or by setting the `filters`
//! Filters can be configured using the `TRD_LOG` environment variable or by setting the `log_filters`

use tracing_subscriber::{filter::Directive, fmt};

/// ENV used to set the log level
const FILTER_ENV: &str = "TRD_LOG";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: change to more explicit TRD_LOGGING_LEVEL? Also following convention of TRT_LOGGING_ prefix for log config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TRD_LOG=info is all you need.

but you can also,

TRD_LOG=triton_distributed=trace,error

which lets all trd to trace and third party bits to error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normally it's RUST_LOG, but we can change the variable name.

@ryanolson ryanolson merged commit a36f530 into main Feb 13, 2025
6 checks passed
@ryanolson ryanolson deleted the ryan/zmq-init branch February 13, 2025 12:41
@@ -134,11 +133,11 @@ impl TcpClient {
.map_err(|e| error!("failed to send handshake: {:?}", e))?;

// set up the channel to send bytes to the transport layer
let (bytes_tx, bytes_rx) = tokio::sync::mpsc::channel(16);
let (bytes_tx, bytes_rx) = tokio::sync::mpsc::channel(64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this value tuned? When is 64 better than 16? When is 128 > 64? What's the con to increasing it? Should it be configured differently for different scenarios?

@@ -461,133 +444,171 @@ async fn tcp_listener(
}

// we need to know the buffer size from the registration options; add this to the RequestRecvConnection object
let (tx, rx) = mpsc::channel(16);
let (response_tx, response_rx) = mpsc::channel(64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here

let state = server.state.clone();

let id = "test-request".to_string();
let (tx, mut rx) = tokio::sync::mpsc::channel(512);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TCP TIME-WAIT
3 participants