chore(docs): Add an Async Explainer doc #200

Open
wants to merge 8 commits into
base: main
from
38 changes: 38 additions & 0 deletions docs/v2_design_overview.md
@@ -0,0 +1,38 @@
# Design Overview for `PGVectorStore`

This document outlines the design choices behind the `PGVectorStore` integration for LangChain, focusing on how an async PostgreSQL driver can support both synchronous and asynchronous usage.

## Motivation: Performance through Asynchronicity

Database interactions are often I/O-bound, making asynchronous programming crucial for performance.

- **Non-Blocking Operations:** Asynchronous code prevents the application from stalling while waiting for database responses, improving throughput and responsiveness.
- **Asynchronous Foundation (`asyncio` and Drivers):** Built on Python's `asyncio` library, the integration is designed to work with asynchronous PostgreSQL drivers to handle database operations efficiently. Other compatible drivers are supported, but the `asyncpg` driver is specifically recommended for its high performance in concurrent scenarios. For more details, see its [benefits](https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/) and [performance benchmarks](https://fernandoarteaga.dev/blog/psycopg-vs-asyncpg/).

This native async foundation ensures the core database interactions are fast and scalable.
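
The non-blocking behavior described above can be illustrated with a minimal sketch. It does not touch a database: `fake_query` is a hypothetical stand-in that uses `asyncio.sleep` to simulate an I/O wait, and `run_concurrently` is an illustrative helper, not part of the integration.

```python
import asyncio
import time


async def fake_query(delay: float) -> float:
    # Hypothetical stand-in for a database round trip.
    # asyncio.sleep yields control to the event loop while "waiting".
    await asyncio.sleep(delay)
    return delay


async def run_concurrently(n: int, delay: float) -> float:
    # Start n simulated queries at once and return the elapsed wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(fake_query(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(run_concurrently(5, 0.1))
    # Run one after another, five 0.1 s waits would take about 0.5 s;
    # run concurrently, the waits overlap.
    print(f"5 concurrent queries took {elapsed:.2f}s")
```

Because the waits overlap, the total time stays close to that of a single query rather than growing with the number of queries.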

## The Two-Class Approach: Enabling a Mixed Interface

To cater to different application architectures while maintaining performance, we provide two classes:

1. **`AsyncPGVectorStore` (Core Asynchronous Implementation):**
   * This class contains the pure `async/await` logic for all database operations.
   * It's designed for **direct use within asynchronous applications**. Users working in an `asyncio` environment can `await` its methods for maximum efficiency and direct control within the event loop.
   * It represents the fundamental, non-blocking way of interacting with the database.

2. **`PGVectorStore` (Mixed Sync/Async API):**
   * This class provides both asynchronous and synchronous APIs.
   * When one of its methods is called, it internally invokes the corresponding `async` method from `AsyncPGVectorStore`.
   * It **manages the execution of this underlying asynchronous logic**, handling the necessary `asyncio` event loop interactions (e.g., starting/running the coroutine) behind the scenes.
   * This allows synchronous codebases to benefit from the performance of the asynchronous core without restructuring the application.
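
One way such a bridge can be built is sketched below. This is an illustration under stated assumptions, not the actual `PGVectorStore` code: the classes `AsyncStore` and `Store`, the helpers `_run_as_sync` and `_run_as_async`, and the background-thread event loop are all hypothetical choices made for this example.

```python
import asyncio
import threading
from typing import Any, Coroutine, List, TypeVar

T = TypeVar("T")


class AsyncStore:
    """Hypothetical async core: every operation is a coroutine."""

    def __init__(self) -> None:
        self._docs: List[str] = []

    async def aadd_texts(self, texts: List[str]) -> int:
        await asyncio.sleep(0)  # stand-in for an awaited database call
        self._docs.extend(texts)
        return len(self._docs)


class Store:
    """Hypothetical facade exposing both sync and async methods."""

    def __init__(self) -> None:
        self._core = AsyncStore()
        # Assumption: a dedicated event loop on a daemon thread runs all
        # core coroutines, so sync callers never need a loop of their own.
        self._loop = asyncio.new_event_loop()
        threading.Thread(target=self._loop.run_forever, daemon=True).start()

    def _run_as_sync(self, coro: Coroutine[Any, Any, T]) -> T:
        # Submit to the background loop and block until the result is ready.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    async def _run_as_async(self, coro: Coroutine[Any, Any, T]) -> T:
        # Same submission, but awaited so the caller's loop stays responsive.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return await asyncio.wrap_future(future)

    def add_texts(self, texts: List[str]) -> int:
        return self._run_as_sync(self._core.aadd_texts(texts))

    async def aadd_texts(self, texts: List[str]) -> int:
        return await self._run_as_async(self._core.aadd_texts(texts))


if __name__ == "__main__":
    store = Store()
    print(store.add_texts(["a", "b"]))           # called from plain sync code
    print(asyncio.run(store.aadd_texts(["c"])))  # called from async code
```

In this sketch both entry points delegate to the same coroutine on the same loop, so the sync and async methods share one implementation and one set of loop-bound resources.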

## Benefits of this Dual Interface Design

This two-class structure provides significant advantages:

- **Interface Flexibility:** Developers can **choose the interface that best fits their needs**:
  * Use `PGVectorStore` for easy integration into existing synchronous applications.
  * Use `AsyncPGVectorStore` for optimal performance and integration within `asyncio`-based applications.
- **Ease of Use:** `PGVectorStore` offers a familiar synchronous programming model, hiding the complexity of managing async execution from the end-user.
- **Robustness:** The clear separation helps prevent common errors that arise from mixing synchronous and asynchronous code incorrectly, such as blocking the event loop by making synchronous calls inside an async context.
- **Efficiency for Async Users:** `AsyncPGVectorStore` provides a direct path for async applications, avoiding any potential overhead from the sync-to-async bridging layer present in `PGVectorStore`.
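
As an illustration of the kind of mixing error the robustness point refers to, the sketch below shows a naive sync wrapper that works from plain synchronous code but fails when called from inside a running event loop. The names `fetch`, `naive_sync_fetch`, and `handler` are hypothetical and exist only for this example.

```python
import asyncio


async def fetch() -> str:
    await asyncio.sleep(0)  # stand-in for an awaited database call
    return "row"


def naive_sync_fetch() -> str:
    # Naive bridge: start a fresh event loop for every call.
    coro = fetch()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        raise


async def handler() -> str:
    # Inside a running loop, asyncio.run refuses to start a second loop.
    try:
        return naive_sync_fetch()
    except RuntimeError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(naive_sync_fetch())      # works from synchronous code
    print(asyncio.run(handler()))  # reports the RuntimeError
```

Keeping the bridging logic in one class, and offering a separate class with a purely async interface, means application code does not have to write wrappers like `naive_sync_fetch` itself.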