Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tool query example #189

Merged
merged 6 commits into from
Apr 26, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 92 additions & 0 deletions docs/tracing/faq/querying_traces.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -502,3 +502,95 @@ This type of query is useful if you want to extract a specific run conditional o
]}
groupId="client-language"
/>

## Advanced Examples

#### Exporting flattened trace view with child tool usage

The following Python example demonstrates how to export a flattened view of traces, including information on the tools (from nested runs) used by the agent within each trace.
This can be used to analyze the behavior of your agents across multiple traces.

This example queries all tool runs within a specified number of days and groups them by their parent (root) run ID. It then fetches the relevant information for each root run, such as the run name, inputs, outputs, and combines that information with the child run information.

To optimize the query, the example:

1. Selects only the necessary fields when querying tool runs to reduce query time.
2. Fetches root runs in batches while processing tool runs concurrently.

<CodeTabs
tabs={[
PythonBlock(`from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from datetime import datetime, timedelta\n
import pandas as pd
from langsmith import Client
from tqdm.auto import tqdm\n
client = Client()
project_name = "my-project"
num_days = 30\n
# List all tool runs
tool_runs = client.list_runs(
project_name=project_name,
start_time=datetime.now() - timedelta(days=num_days),
run_type="tool",
# We don't need to fetch inputs, outputs, and other values that # may increase the query time
# highlight-next-line
select=["trace_id", "name", "run_type"],
)\n
data = []
futures: list[Future] = []
trace_cursor = 0
trace_batch_size = 50\n
tool_runs_by_parent = defaultdict(lambda: defaultdict(set))
# Do not exceed rate limit
# highlight-next-line
with ThreadPoolExecutor(max_workers=2) as executor:
# Group tool runs by parent run ID
for run in tqdm(tool_runs):
# Collect all tools invoked within a given trace
tool_runs_by_parent[run.trace_id]["tools_involved"].add(run.name)
# maybe send a batch of parent run IDs to the server
# this lets us query for the root runs in batches
# while still processing the tool runs
if len(tool_runs_by_parent) % trace_batch_size == 0:
if this_batch := list(tool_runs_by_parent.keys())[
trace_cursor : trace_cursor + trace_batch_size
]:
trace_cursor += trace_batch_size
futures.append(
executor.submit(
client.list_runs,
project_name=project_name,
run_ids=this_batch,
select=["name", "inputs", "outputs", "run_type"],
)
)
if this_batch := list(tool_runs_by_parent.keys())[trace_cursor:]:
futures.append(
executor.submit(
client.list_runs,
project_name=project_name,
run_ids=this_batch,
select=["name", "inputs", "outputs", "run_type"],
)
)\n
for future in tqdm(futures):
root_runs = future.result()
for root_run in root_runs:
root_data = tool_runs_by_parent[root_run.id]
data.append(
{
"run_id": root_run.id,
"run_name": root_run.name,
"run_type": root_run.run_type,
"inputs": root_run.inputs,
"outputs": root_run.outputs,
"tools_involved": list(root_data["tools_involved"]),
}
)\n
# Convert to a pandas DataFrame\n
df = pd.DataFrame(data)
df.head()`),
]}
groupId="client-language"
/>
Loading