Custom APIcast policy to monitor LLM token usage
Everything is included in the policy; no custom image is needed. Simply add the custom policy as described in the README.
Configuration parameters:
- SSE support: check to enable token counts for streaming responses.
- Increment: {{llm_usage.total_tokens}}
- Left: {{status}}
- Left-type: Liquid
- Right: 200
- Right-type: Plain text
- op: ==
- Metric to increment: total_tokens
In the same policy, add the same rules for prompt_tokens and completion_tokens respectively; a sketch of the equivalent JSON configuration with all three rules follows.
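A minimal sketch of what the policy configuration might look like with all three rules in JSON form, assuming the llm_usage context object exposes prompt_tokens and completion_tokens alongside total_tokens (only total_tokens is shown explicitly later in this README, so verify the field names against your policy version):
{
  "name": "apicast.policy.llm",
  "configuration": {
    "rules": [
      {
        "condition": {
          "operations": [
            {"op": "==", "left": "{{status}}", "left_type": "liquid", "right": "200"}
          ],
          "combine_op": "and"
        },
        "metric": "total_tokens",
        "increment": "{{ llm_usage.total_tokens }}"
      },
      {
        "condition": {
          "operations": [
            {"op": "==", "left": "{{status}}", "left_type": "liquid", "right": "200"}
          ],
          "combine_op": "and"
        },
        "metric": "prompt_tokens",
        "increment": "{{ llm_usage.prompt_tokens }}"
      },
      {
        "condition": {
          "operations": [
            {"op": "==", "left": "{{status}}", "left_type": "liquid", "right": "200"}
          ],
          "combine_op": "and"
        },
        "metric": "completion_tokens",
        "increment": "{{ llm_usage.completion_tokens }}"
      }
    ]
  }
}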
This policy does a few things:
- Expose AI model token usage via Prometheus metrics
- Support sending custom metrics to 3scale
- Handle Server-Sent Events (SSE) responses
Not supported:
- Gzip payload
Once the policy is added to the product, it will expose the following three metrics:
- prompt_tokens
- completion_tokens
- total_tokens
Each metric is implemented as a counter and scoped by Product by default. The metrics will have the following format.
name: prompt_tokens labels(service_id, service_system_name, application_id, application_system_name) value=prompt_tokens
name: completion_tokens labels(service_id, service_system_name, application_id, application_system_name) value=completion_tokens
name: total_tokens labels(service_id, service_system_name, application_id, application_system_name) value=total_tokens
For example:
$ curl <APICast_IP>:9421/metrics
...
# HELP llm_completion_tokens_count Token count for a completion
# TYPE llm_completion_tokens_count counter
llm_completion_tokens_count{service_id="1",service_system_name="",application_id="1",application_system_name="test"} 16
# HELP llm_prompt_tokens_count Token count for a prompt
# TYPE llm_prompt_tokens_count counter
llm_prompt_tokens_count{service_id="1",service_system_name="",application_id="1",application_system_name="test"} 24
# HELP llm_total_token_count Total token count
# TYPE llm_total_token_count counter
llm_total_token_count{service_id="1",service_system_name="",application_id="1",application_system_name="test"} 40
...
Note that to extract the usage data, the response is fully buffered in memory and then decoded into a Lua object, so increase the memory request/limit if you start seeing performance issues.
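As a rough illustration only (the exact mechanism depends on how APIcast is deployed, e.g. via the 3scale operator or as a self-managed Deployment), the gateway container resources might be raised along these lines; the values are placeholders, not recommendations:
{
  "resources": {
    "requests": { "memory": "512Mi" },
    "limits": { "memory": "1Gi" }
  }
}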
To avoid patching the existing Custom Metrics policy, this LLM policy is provided as a separate policy. Its functionality and usage are identical to the built-in Custom Metrics policy, but a few additional fields are exposed in the request context:
- Token usage
- Application details
NOTE: Metrics need to be created in the Admin Portal before the policy will push the metric values.
For example, the following configuration sets up a metric called total_tokens with an increment based on the total_tokens used:
{
  "name": "apicast.policy.llm",
  "configuration": {
    "rules": [
      {
        "condition": {
          "operations": [
            {"op": "==", "left": "{{status}}", "left_type": "liquid", "right": "200"}
          ],
          "combine_op": "and"
        },
        "metric": "total_tokens",
        "increment": "{{ llm_usage.total_tokens }}"
      }
    ]
  }
}
Streaming is a mode where a client can specify "stream": true in the request, and the LLM server streams each piece of the response (usually token by token) as server-sent events (SSE).
We need to do three things here:
- Parse the incoming request and check whether "stream": true is set
- Detect a streaming response based on the Content-Type: text/event-stream header
- Parse the events and extract the token usage (see the example below)
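For illustration, with an OpenAI-style chat completions backend the streaming request and the final usage event might look like the following; the exact payload shape, and whether a usage object is emitted at all, depend on the upstream LLM server (some require the client to opt in, e.g. via stream_options), so treat this as an assumption rather than a guaranteed format.
Request body:
{
  "model": "example-model",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}
Final SSE event carrying the usage (the token counts here match the Prometheus example above):
data: {"object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":24,"completion_tokens":16,"total_tokens":40}}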
SSE support is off by default; check the SSE support checkbox to enable it.
NOTE: Streamed responses are not buffered; they are streamed back to the client as they are received.