-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/mch rebased otel #25
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't gone through worker.py yet, but here are some initial comments
polytope_server/common/request.py
Outdated
@@ -57,6 +57,7 @@ class Request: | |||
"user_request", | |||
"content_length", | |||
"content_type", | |||
"otel_trace_ctx", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible we will want other otel fields in the future? If so, I recommend calling this otel_trace and storing the context as a field within to avoid polluting the request with many otel_trace_* slots or the expensive refactor in the future to change this slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid point, although the goal is to propagate only the context, keeping the implementation minimally intrusive in the codebase, I agree that makes sense to have a more general naming for this field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Based on the docs I think just the context is reasonable.
# Adding trace_id and span_id to the request data for logging | ||
request.otel_trace_ctx = { | ||
'carrier': carrier # Store the injected carrier | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7-15 are also repeated in update_trace_context. Extract these
@contextmanager | ||
def create_new_span_producer(span_name, request_id=None, parent_context=None, kind=trace.SpanKind.SERVER): | ||
with create_new_span(span_name, request_id, parent_context, kind) as span: | ||
span.set_attribute("role", "producer") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to do this as four separate methods rather than passing the role as an argument? The only one that is odd is create_new_span_internal
since there is only one valid mapping from kind to role.
In the case of passing the role, it could be passed as an enum value similarly to kind
so that we limit what the options are to these three methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are right, during testing, I frequently had to switch between Consumer, Producer, and Server span kinds. To simplify this and avoid modifying each service individually, I created dedicated functions for each type. This way, it's clear that, for example, a Producer span is being forced to a Server kind due to AWS requirements. Following that logic, it felt inconsistent to leave Internal spans handled differently, so I aligned it with the same approach.
return extracted_context | ||
|
||
@contextmanager | ||
def create_new_span_internal(span_name, request_id=None, parent_context=None, kind=trace.SpanKind.INTERNAL): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If keeping this as four methods, make it clear that _internal
is referring to the span and not that this is a way of marking this as an internal method (or perhaps someone more experience with Python would understand this immediately?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great question. From my perspective, when discussing OpenTelemetry and spans, it should be clear that we're referring to the type of the span, as that's its primary characteristic.
Given the context of OpenTelemetry, I believe it would be intuitive for someone familiar with the framework to understand that _internal
refers to the span type rather than indicating a private or internal method.
} | ||
|
||
current_span = trace.get_current_span() | ||
current_span.set_attribute("polytope.request.id", request.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we set this attribute in add_ and restore_? It seems like this should be done only in create_new_span as otherwise it seems that it will be unreliably set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are passing the context between services, therefore, each service starts a new span where we restore the context (so we keep track of the propagation) but the attributes are missing and need to be updated again
polytope_server/worker/worker.py
Outdated
else: | ||
request.user_message += "Success" | ||
# Creating new internal span for the datasource dispatch (calculate roundtrip time) | ||
with create_new_span_internal("Datatsource_{}".format(ds.get_type())) as span_ds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not add the request_id here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context has already been restored in this point. Since the context is tied to a specific request, restoring the context also sets the request ID into the parent span during the restoration process.
Minimum instrumentation for tracking the full request lifecycle