People Researcher Agent searches the web for information about a user-supplied person and returns it in a structured format defined by a user-supplied JSON schema.
Set API keys for the LLM of choice (Anthropic is set by default in `src/agent/graph.py`) and the Tavily API:

```bash
cp .env.example .env
```
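After copying, fill in the keys. A minimal sketch of the resulting `.env`, assuming the variable names match the providers above (confirm against `.env.example`):

```bash
# Assumed variable names — check .env.example for the exact ones
ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...
```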
Clone the repository and launch the assistant using the LangGraph server:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/langchain-ai/people-researcher.git
cd people-researcher
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev
```
People Researcher Agent follows a multi-step research and extraction workflow that separates web research from schema extraction, allowing for better resource management and comprehensive data collection:
- Research Phase: The system performs intelligent web research on the input person:
  - Uses an LLM to generate targeted search queries based on the schema requirements (up to `max_search_queries`)
  - Executes concurrent web searches via the Tavily API, retrieving up to `max_search_results` results per query
  - Takes structured research notes focused on schema-relevant information
- Extraction Phase: After research is complete, the system:
  - Consolidates all research notes
  - Uses an LLM to extract and format the information according to the user-defined schema
  - Returns the structured data in the exact format requested
- Reflection Phase: The system evaluates the quality of extracted information:
  - Analyzes completeness of required fields
  - Identifies any missing or incomplete information
  - Generates targeted follow-up search queries if needed
  - Continues research until information is satisfactory or max reflection steps reached
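With the dev server from the quickstart running, the whole workflow can be exercised end to end. A minimal sketch using the LangGraph Python SDK, assuming the graph is registered under the name "agent" (check `langgraph.json` for the actual name):

```python
import asyncio

from langgraph_sdk import get_client


async def main():
    # Dev server default address from `langgraph dev`
    client = get_client(url="http://localhost:2024")

    # Threadless run; "agent" is an assumed graph name
    result = await client.runs.wait(
        None,
        "agent",
        input={"person": {"email": "jane.doe@example.com", "name": "Jane Doe"}},
    )
    print(result)


asyncio.run(main())
```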
The configuration for People Researcher Agent is defined in the `src/agent/configuration.py` file:

```python
max_search_queries: int = 3    # Max search queries
max_search_results: int = 3    # Max search results per query
max_reflection_steps: int = 1  # Max reflection steps
```
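These defaults can be overridden per run through LangGraph's standard `configurable` mechanism; a sketch (field names from `configuration.py`, invocation shape assumed):

```python
# Passed alongside the input when invoking the graph,
# e.g. client.runs.wait(..., config=config)
config = {
    "configurable": {
        "max_search_queries": 5,
        "max_search_results": 5,
        "max_reflection_steps": 2,
    }
}
```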
The user inputs are:
* person: a JSON object with the following fields:
- `email` (required): The email of the person
- `name` (optional): The name of the person
- `company` (optional): The current company of the person
  - `linkedin` (optional): The LinkedIn URL of the person
- `role` (optional): The current title of the person
* schema: Optional[dict] - A JSON schema for the output
* user_notes: Optional[str] - Any additional notes about the person from the user
If a schema is not provided, the system will use a default schema (`DEFAULT_EXTRACTION_SCHEMA`) defined in `people_maistro.py`.
⚠️ WARNING: JSON schemas require `title` and `description` fields for extraction.

⚠️ WARNING: Avoid JSON objects with nesting; LLMs have challenges performing structured extraction from nested objects.
Here is an example schema that can be supplied to research a person:
```json
{
  "type": "object",
  "required": [
    "years_experience",
    "current_company",
    "role",
    "prior_companies"
  ],
  "properties": {
    "role": {
      "type": "string",
      "description": "Current role of the person."
    },
    "years_experience": {
      "type": "number",
      "description": "How many years of full time work experience (excluding internships) does this person have."
    },
    "current_company": {
      "type": "string",
      "description": "The name of the current company the person works at."
    },
    "prior_companies": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of previous companies where the person has worked"
    }
  },
  "description": "Person information",
  "title": "Person"
}
```
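For reference, a person to be researched against this schema could be submitted with an input payload like the following (all values illustrative; the schema above would be supplied under the `schema` key, omitted here for brevity):

```json
{
  "person": {
    "email": "jane.doe@example.com",
    "name": "Jane Doe",
    "company": "Acme Corp",
    "linkedin": "https://www.linkedin.com/in/janedoe",
    "role": "Head of Data"
  },
  "user_notes": "Met at a data engineering conference; verify current role."
}
```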
Prior to engaging in any optimization, it is important to establish baseline performance. This repository includes:
- A dataset consisting of a list of people and the expected structured information to be extracted for each person.
- An evaluation script that can be used to evaluate the agent on this dataset.
Make sure you have the LangSmith SDK installed:

```bash
pip install langsmith
```
And set your API keys:

```bash
export LANGSMITH_API_KEY=<your_langsmith_api_key>
export ANTHROPIC_API_KEY=<your_anthropic_api_key>
```
A score between 0 and 1 is assigned to each extraction result by an LLM acting as a judge. The judge assigns the score based on how closely the extracted information matches the expected information.
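To make the grading step concrete, here is a minimal sketch of such a judge (illustrative only; the repository's actual evaluator lives in the `eval` folder, and the model name and prompt here are assumptions):

```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field


class Grade(BaseModel):
    """Judge output: a similarity score between 0 and 1."""
    score: float = Field(description="How closely the extraction matches the expected output, 0 to 1.")


# Assumed judge model; any chat model that supports structured output works
judge = ChatAnthropic(model="claude-3-5-sonnet-latest").with_structured_output(Grade)


def score_extraction(extracted: dict, expected: dict) -> float:
    prompt = (
        "Grade how closely the extracted information matches the expected "
        f"information, from 0 to 1.\n\nExpected:\n{expected}\n\nExtracted:\n{extracted}"
    )
    return judge.invoke(prompt).score
```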
Create a new dataset in LangSmith using the code in the `eval` folder:

```bash
python eval/create_dataset.py
```
To run the evaluation, you can use the `run_eval.py` script in the `eval` folder. This will create a new experiment in LangSmith for the dataset you created in the previous step.

```bash
python eval/run_eval.py --experiment-prefix "My custom prefix" --agent-url http://localhost:2024
```