This README provides a detailed guide to the `api.py` file, which serves as the API interface for the GraphRAG (Graph Retrieval-Augmented Generation) system. GraphRAG combines graph-based knowledge representation with retrieval-augmented generation techniques to provide context-aware responses to queries.
- Overview
- Setup
- API Endpoints
- Data Models
- Core Functionality
- Usage Examples
- Configuration
- Troubleshooting
## Overview

The `api.py` file implements a FastAPI-based server that provides endpoints for interacting with the GraphRAG system. It supports several query types, including direct chat, GraphRAG-specific queries, DuckDuckGo searches, and a combined full-model search.
Key features:
- Multiple query types (local and global searches)
- Context caching for improved performance
- Background tasks for long-running operations
- Customizable settings through environment variables and config files
- Integration with external services (e.g., Ollama for LLM interactions)
## Setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up environment variables. Create a `.env` file in the `indexing` directory with the following variables:

   ```
   LLM_API_BASE=<your_llm_api_base_url>
   LLM_MODEL=<your_llm_model>
   LLM_PROVIDER=<llm_provider>
   EMBEDDINGS_API_BASE=<your_embeddings_api_base_url>
   EMBEDDINGS_MODEL=<your_embeddings_model>
   EMBEDDINGS_PROVIDER=<embeddings_provider>
   INPUT_DIR=./indexing/output
   ROOT_DIR=indexing
   API_PORT=8012
   ```

3. Run the API server:

   ```bash
   python api.py --host 0.0.0.0 --port 8012
   ```
## API Endpoints

### Chat completions (`POST /v1/chat/completions`)

Main endpoint for chat completions. Supports different models:

- `direct-chat`: Direct interaction with the LLM
- `graphrag-local-search:latest`: Local search using GraphRAG
- `graphrag-global-search:latest`: Global search using GraphRAG
- `duckduckgo-search:latest`: Web search using DuckDuckGo
- `full-model:latest`: Combined search using all available models
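For example, a minimal direct-chat request (the request shape matches the usage examples later in this README):

```python
import requests

# Send a plain chat message straight to the LLM via the direct-chat model.
response = requests.post(
    "http://localhost:8012/v1/chat/completions",
    json={
        "model": "direct-chat",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json())
```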
### Other endpoints

- Prompt tuning: initiates the prompt tuning process in the background.
- Prompt tuning status: retrieves the status and logs of the prompt tuning process.
- Indexing (`POST /v1/index`): starts the indexing process for GraphRAG in the background.
- Indexing status: retrieves the status and logs of the indexing process.
- Health check: reports whether the server is running.
- Model listing: lists the available models.
## Data Models

The API uses several Pydantic models for request and response handling:

- `Message`: Represents a chat message with role and content.
- `QueryOptions`: Options for GraphRAG queries, including query type, preset, and community level.
- `ChatCompletionRequest`: Request model for chat completions.
- `ChatCompletionResponse`: Response model for chat completions.
- `PromptTuneRequest`: Request model for prompt tuning.
- `IndexingRequest`: Request model for indexing.
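As a rough illustration, the request-side models might look like the sketch below. Field names are inferred from the usage examples in this README; the defaults and optionality are assumptions, not the actual definitions in `api.py`.

```python
from typing import List, Optional
from pydantic import BaseModel

class Message(BaseModel):
    role: str     # e.g. "user" or "assistant"
    content: str  # the message text

class QueryOptions(BaseModel):
    # Field names follow the usage examples; defaults are illustrative.
    query_type: str                         # e.g. "local-search"
    preset: Optional[str] = None            # optional query preset
    selected_folder: Optional[str] = None   # folder with indexed output
    community_level: int = 2                # depth of community analysis
    response_type: str = "Multiple Paragraphs"

class ChatCompletionRequest(BaseModel):
    model: str                              # one of the model names above
    messages: List[Message]
    query_options: Optional[QueryOptions] = None
```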
## Core Functionality

The `load_context` function loads the data needed for GraphRAG queries, including entities, relationships, reports, text units, and covariates. `setup_search_engines` then initializes both the local and global search engines from the loaded context data.
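Conceptually, context loading amounts to reading the parquet artifacts that the indexer writes to `INPUT_DIR`. The sketch below is illustrative only; the artifact file names are assumptions modeled on typical GraphRAG output and may differ from what `api.py` actually reads.

```python
import os
import pandas as pd

INPUT_DIR = os.getenv("INPUT_DIR", "./indexing/output")

def load_context(input_dir: str = INPUT_DIR) -> dict:
    """Read the indexer's parquet artifacts into DataFrames (illustrative)."""
    # File names are assumptions based on common GraphRAG output layouts.
    artifacts = {
        "entities": "create_final_entities.parquet",
        "relationships": "create_final_relationships.parquet",
        "reports": "create_final_community_reports.parquet",
        "text_units": "create_final_text_units.parquet",
        "covariates": "create_final_covariates.parquet",
    }
    context = {}
    for name, filename in artifacts.items():
        path = os.path.join(input_dir, filename)
        # Some artifacts (e.g. covariates) may be absent; tolerate that.
        context[name] = pd.read_parquet(path) if os.path.exists(path) else None
    return context
```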
Different query types are handled by separate functions (a routing sketch follows the list):

- `run_direct_chat`: Sends queries directly to the LLM.
- `run_graphrag_query`: Executes GraphRAG queries (local or global).
- `run_duckduckgo_search`: Performs web searches using DuckDuckGo.
- `run_full_model_search`: Combines results from all search types.
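A minimal sketch of how the chat endpoint might route a request to these handlers, assuming the model names listed under API Endpoints and the `ChatCompletionRequest` sketch above; the actual control flow in `api.py` may differ:

```python
# Hypothetical routing helper; the run_* handlers are the functions
# described above and are assumed here to be async.
async def route_chat_request(request: ChatCompletionRequest) -> str:
    if request.model == "direct-chat":
        return await run_direct_chat(request.messages)
    if request.model in ("graphrag-local-search:latest",
                         "graphrag-global-search:latest"):
        return await run_graphrag_query(request.messages, request.query_options)
    if request.model == "duckduckgo-search:latest":
        return await run_duckduckgo_search(request.messages)
    if request.model == "full-model:latest":
        return await run_full_model_search(request.messages, request.query_options)
    raise ValueError(f"Unknown model: {request.model}")
```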
Long-running tasks like prompt tuning and indexing are executed as background tasks to prevent blocking the API.
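FastAPI's `BackgroundTasks` is the standard mechanism for this pattern. Below is a minimal sketch; the task body and response shape are illustrative assumptions (only the `/v1/index` path appears in this README):

```python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def run_indexing_job(root: str) -> None:
    ...  # long-running indexing work happens here

@app.post("/v1/index")
async def start_indexing(background_tasks: BackgroundTasks):
    # Schedule the job and return immediately so the request doesn't block.
    background_tasks.add_task(run_indexing_job, root="./indexing")
    return {"status": "indexing started"}
```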
## Usage Examples

Query the local search model through the chat completions endpoint:

```python
import requests

url = "http://localhost:8012/v1/chat/completions"
payload = {
    "model": "graphrag-local-search:latest",
    "messages": [{"role": "user", "content": "What is GraphRAG?"}],
    "query_options": {
        "query_type": "local-search",
        "selected_folder": "your_indexed_folder",
        "community_level": 2,
        "response_type": "Multiple Paragraphs",
    },
}

response = requests.post(url, json=payload)
print(response.json())
```
Start an indexing run:

```python
import requests

url = "http://localhost:8012/v1/index"
payload = {
    "llm_model": "your_llm_model",
    "embed_model": "your_embed_model",
    "root": "./indexing",
    "verbose": True,
    "emit": ["parquet", "csv"],
}

response = requests.post(url, json=payload)
print(response.json())
```
## Configuration

The API can be configured through:

- Environment variables
- A `config.yaml` file (path specified by the `GRAPHRAG_CONFIG` environment variable)
- Command-line arguments when starting the server

Key configuration options:

- `llm_model`: The language model to use
- `embedding_model`: The embedding model for vector representations
- `community_level`: Depth of community analysis in GraphRAG
- `token_limit`: Maximum tokens for context
- `api_key`: API key for the LLM service
- `api_base`: Base URL for the LLM API
- `api_type`: Type of API (e.g., "openai")
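For illustration, a `config.yaml` built from these options might look like the following; the exact keys, nesting, and values accepted by `api.py` are assumptions:

```yaml
# Illustrative config.yaml; key names follow the options listed above,
# and the values are placeholders.
llm_model: <your_llm_model>
embedding_model: <your_embeddings_model>
community_level: 2
token_limit: 4096
api_key: <your_api_key>
api_base: <your_llm_api_base_url>
api_type: openai
```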
## Troubleshooting

- If you encounter connection errors with Ollama, ensure the service is running and accessible.
- For "context loading failed" errors, check that the indexed data is present in the specified output folder.
- If prompt tuning or indexing processes fail, review the logs using the respective status endpoints.
- For performance issues, consider adjusting the `community_level` and `token_limit` settings.
For more detailed information on GraphRAG's indexing and querying processes, refer to the official GraphRAG documentation.