Feature request: Performance metrics per model-version #1970
Labels
stale
stat:awaiting response
type:feature
Feature Request
Describe the problem the feature is intended to solve
We run multiple A/B tests in which the same model_name is served in several versions,
and those versions can perform differently.
For instance, two versions that take the same inputs can differ in number of layers or
architecture, which can make one of them slower and heavier, especially for CPU-only inference.
We need a way to monitor performance metrics such as p95 and average latency
at model_name.version granularity. Currently, metrics are only visible at the model_name level, for example:
:tensorflow:serving:request_latency_bucket{model_name="tf_model_name",API="Predict",entrypoint="GRPC",le="2.52873e+08"} 16237
Describe the solution
One solution is to add another set of performance counters inside the servable that track p95 and average
latency at the more granular model-version level.
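To illustrate the request, here is a minimal sketch of latency histograms keyed by (model_name, version) rather than by model_name alone. It is plain Python, not TensorFlow Serving code: `LatencyHistogram`, `record_latency`, and the bucket bounds are all hypothetical names and values chosen for the example. As with any bucketed histogram, the p95 it reports is only the upper bound of the bucket that contains the 95th percentile.

```python
import bisect
from collections import defaultdict

# Hypothetical bucket upper bounds in microseconds (not TensorFlow Serving's real buckets).
BUCKET_BOUNDS_US = [1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000]


class LatencyHistogram:
    """Bucketed latency counter with "less than or equal" bucket semantics."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        # One extra slot at the end acts as the +Inf overflow bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0
        self.sum_us = 0.0

    def observe(self, latency_us):
        # bisect_left finds the first bound >= latency_us.
        idx = bisect.bisect_left(self.bounds, latency_us)
        self.counts[idx] += 1
        self.total += 1
        self.sum_us += latency_us

    def average(self):
        return self.sum_us / self.total if self.total else 0.0

    def quantile_upper_bound(self, q):
        """Upper bound of the bucket that contains quantile q (0 < q <= 1)."""
        if self.total == 0:
            return 0.0
        target = q * self.total
        cumulative = 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            cumulative += count
            if cumulative >= target:
                return bound
        return float("inf")


# One histogram per (model_name, version) pair instead of one per model_name.
histograms = defaultdict(lambda: LatencyHistogram(BUCKET_BOUNDS_US))


def record_latency(model_name, version, latency_us):
    histograms[(model_name, version)].observe(latency_us)
```

With this layout, `histograms[("tf_model_name", 2)].quantile_upper_bound(0.95)` and `.average()` would report on version 2 only, which is the granularity this request asks for.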
Describe alternatives you've considered
The only option we see right now is to issue calls from a client and measure latency there.
However, that measurement includes round-trip latency, and the client has to reproduce the feature
engineering specific to each model-version. This is operationally challenging at scale and a
maintenance headache, and it still does not give us pure server-side metrics per model-version.
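For reference, the client-side workaround amounts to something like the sketch below. `timed_call` and `send_request` are hypothetical: `send_request` stands in for whatever function actually issues the request (for example a gRPC Predict call) and is not part of any library. The recorded time covers the whole round trip, which is the limitation described above.

```python
import time
from collections import defaultdict

# Client-observed latencies in seconds, keyed by (model_name, version).
client_latencies_s = defaultdict(list)


def timed_call(model_name, version, send_request, *args, **kwargs):
    """Run send_request and record its wall-clock duration.

    The duration includes network round trip and client-side overhead,
    so it is not a pure server-side latency.
    """
    start = time.perf_counter()
    try:
        return send_request(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        client_latencies_s[(model_name, version)].append(elapsed)
```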
System information