Small changes: code reuse, simplify, doc comments, client timeouts #25

declark1 · 2024-05-09T16:32:42Z

This PR is a batch of several small changes:

Adds input_masks(), input_detectors(), and output_detectors() helper methods to GuardrailsConfig to reuse code shared by unary and streaming task handlers.
Moves detection task spawning code from chunk_and_detect() to detect(), consistent with chunk().
Updates detect() and chunk() code to be easier to follow for new Rustaceans
- Tokio task handles are collected into tasks; try_join_all() is then used separately to await them, collecting the results into results
Added doc and general comments to orchestrator task handlers
Dropped unneeded tokio::spawn from handle_chunk_task() and handle_detection_task()
Added default connect and request timeouts for http & grpc clients
- This overrides the clients' actual defaults, which are too high; configurable service-level timeouts will be added in #17
Added request_id to new requests
Updated "handling task" (info level) log event to include request_id and exclude inputs text (also addresses #18)
Fixed broken test_deserialize_config() unit test (tls is currently a required field, although it should be an Option)
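
To illustrate the first item in the list above, here is a minimal sketch of what helper accessors on GuardrailsConfig could look like. The field layout (input, output, models, masks) and the DetectorParams alias are assumptions made for illustration; only the three method names come from this PR, and the real types in the repository may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-detector parameters (not the real type).
pub type DetectorParams = HashMap<String, String>;

/// Assumed shape of the input half of the config.
#[derive(Debug, Default)]
pub struct GuardrailsConfigInput {
    pub models: HashMap<String, DetectorParams>,
    pub masks: Option<Vec<(usize, usize)>>,
}

/// Assumed shape of the output half of the config.
#[derive(Debug, Default)]
pub struct GuardrailsConfigOutput {
    pub models: HashMap<String, DetectorParams>,
}

/// Assumed shape of the top-level config; both halves are optional.
#[derive(Debug, Default)]
pub struct GuardrailsConfig {
    pub input: Option<GuardrailsConfigInput>,
    pub output: Option<GuardrailsConfigOutput>,
}

impl GuardrailsConfig {
    /// Input masks, if an input config with masks is present.
    pub fn input_masks(&self) -> Option<&[(usize, usize)]> {
        self.input.as_ref().and_then(|input| input.masks.as_deref())
    }

    /// Input detectors, if an input config is present.
    pub fn input_detectors(&self) -> Option<&HashMap<String, DetectorParams>> {
        self.input.as_ref().map(|input| &input.models)
    }

    /// Output detectors, if an output config is present.
    pub fn output_detectors(&self) -> Option<&HashMap<String, DetectorParams>> {
        self.output.as_ref().map(|output| &output.models)
    }
}
```

With accessors like these, both the unary and the streaming handler can call the same methods instead of each unpacking the nested Option fields.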

Signed-off-by: declark1 <[email protected]>

gkumbhat · 2024-05-09T17:37:37Z

src/server.rs

@@ -92,7 +94,8 @@ async fn stream_classification_with_gen(
    State(state): State<Arc<ServerState>>,
    Json(request): Json<models::GuardrailsHttpRequest>,
 ) -> Result<impl IntoResponse, (StatusCode, Json<String>)> {
-    let task = StreamingClassificationWithGenTask::new(request);
+    let request_id = Uuid::new_v4();


Ideally we would get a request id / transaction id from our upstream; it would be good to put it in the logs for tracking purposes and use this generated one as a "default" if none is provided. Would you expect any issues with replacing this request_id later on? Since you have already added it to the downstream functions, the change would essentially be replacing request_id with something from the request (although it may be in a header, so we may need to do some processing).

We haven't instrumented tracing here yet, but I think the trace-id should probably be used for tracking across services? We should probably still have a unique service-level request identifier though, which is what this is for.

If there are additional transaction id(s) provided from upstream services via headers, we can still extract and record them. I haven't seen anything documented on this to know what to expect.
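
One way to combine the two positions in this thread is sketched below: the service always generates its own request_id, and an upstream identifier is recorded alongside it when a header carries one. This is only an illustration. The header name x-request-id, the RequestIds struct, and resolve_request_ids are all hypothetical, and the generator closure stands in for Uuid::new_v4() so the sketch needs nothing beyond the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical header name; the thread does not say which header upstream services use.
const UPSTREAM_ID_HEADER: &str = "x-request-id";

/// Identifiers recorded for one incoming request (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct RequestIds {
    /// Service-level identifier, always generated locally.
    pub request_id: String,
    /// Identifier supplied by an upstream service, if a non-empty one was sent.
    pub upstream_id: Option<String>,
}

/// Generates the service-level ID and extracts the upstream ID, if present.
/// `generate` stands in for the ID generator (Uuid::new_v4() in the PR).
pub fn resolve_request_ids(
    headers: &HashMap<String, String>,
    generate: impl FnOnce() -> String,
) -> RequestIds {
    let upstream_id = headers
        .iter()
        // Header names are case-insensitive, so compare without regard to case.
        .find(|(name, _)| name.eq_ignore_ascii_case(UPSTREAM_ID_HEADER))
        .map(|(_, value)| value.trim().to_string())
        // Treat an empty header value as "not provided".
        .filter(|value| !value.is_empty());

    RequestIds {
        request_id: generate(),
        upstream_id,
    }
}
```

Both identifiers can then be attached to log events, so a request stays traceable within the service even when upstream sends nothing.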

evaline-ju · 2024-05-09T17:37:49Z

src/orchestrator.rs

+            model_id = %task.model_id,
+            config = ?task.guardrails_config,
+            "handling task"
+        );


q - can we still tell from this info log vs. the streaming one that this is the "unary" task vs. the "streaming" task?

Currently, no. I was thinking of adding detail once we implement streaming and/or additional methods.

I could make this simply "handling unary task" and "handling streaming task", but I assumed there could be additional task types added. I was trying to avoid "handling classification with gen task" and "handling streaming classification with gen task" as these are verbose.

Ideally, these API method names (and associated task handlers) can be renamed/simplified (generate and generate_stream would be nice).

I'll change it to "handling unary task" and "handling streaming task" for now.
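
A small sketch of how the two messages could come from one place, so that further task types only need a new enum variant. TaskKind and handling_message are hypothetical names used for illustration, not code from this PR.

```rust
use std::fmt;

/// Hypothetical enum naming the kinds of task the orchestrator handles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskKind {
    Unary,
    Streaming,
}

impl TaskKind {
    /// Short label used in log messages.
    pub fn as_str(self) -> &'static str {
        match self {
            TaskKind::Unary => "unary",
            TaskKind::Streaming => "streaming",
        }
    }
}

impl fmt::Display for TaskKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

/// Builds the log message text for a given task kind.
pub fn handling_message(kind: TaskKind) -> String {
    format!("handling {kind} task")
}
```

In a structured logging setup, the kind could equally be recorded as a field next to request_id and model_id rather than folded into the message text.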

Signed-off-by: declark1 <[email protected]>

evaline-ju

LGTM!

declark1 added 6 commits May 9, 2024 08:45

Add helper methods to GuardrailsConfig, update Orchestrator to use them

9b4f971

Signed-off-by: declark1 <[email protected]>

Add detect function to spawn detection tasks

aa7bef6

Signed-off-by: declark1 <[email protected]>

Update chunk function

fdc6d92

Signed-off-by: declark1 <[email protected]>

Add doc and general comments

16cb91c

Signed-off-by: declark1 <[email protected]>

Drop tokio::spawn from handle_chunk_task and handle_detection_task

d50fb44

Signed-off-by: declark1 <[email protected]>

Add default connect and request timeouts for clients

8aa6448

Signed-off-by: declark1 <[email protected]>

declark1 requested a review from gkumbhat May 9, 2024 16:32

declark1 added 2 commits May 9, 2024 09:41

Add request_id, update handling task log event

2ac1cca

Signed-off-by: declark1 <[email protected]>

Fix broken test_deserialize_config unit test

250e362

Signed-off-by: declark1 <[email protected]>

gkumbhat reviewed May 9, 2024

evaline-ju reviewed May 9, 2024

Update handling task log events

ba222b4

Signed-off-by: declark1 <[email protected]>

evaline-ju approved these changes May 9, 2024

declark1 merged commit b6f5030 into main May 9, 2024
1 check passed

declark1 deleted the dc-tweaks branch May 9, 2024 22:41
