Simple Vector Store (SVS) is a lightweight, simple vector store for your files. Point it at a directory, and it will create a vectorized version of that directory in the store of your choice, which can then be searched semantically. Manage multiple stores, and use the REST API feature to connect your stores with other systems. Perfect for simple RAG chatbots, knowledge base search, and plenty of other use cases.
The synchronization feature allows you to easily maintain a vector representation of your directory over time, as it evolves. The SVS automatically detects differences between your source material and the vector store, and updates itself accordingly so that you never have to rebuild from scratch.
This repository is in its early stages, so don't hesitate to reach out and/or report bugs. I want this repository to be simple but reliable. Therefore, new bug reports are encouraged and will be addressed once reproduced. New features will be considered, but they won't be added unless they fit the scope of the project. If you still think a feature should be developed, feel free to fork the repository and let us know about it here.
If you would like to contribute, go ahead and reach out to me; you can find my contact info on my website.
Check out my little demo video if you're interested in seeing SVS in action.
The SVS provides multiple utilities to suit your vector store needs.
- Vectorize a directory as a store
- Manage multiple stores
- Synchronize stores with their source
- Easily query a store
- REST API script
The following additions are planned:
- Increased store management features
- Additional redundancy and edge-case checks
- Additional indexing features, such as an "ignore" configuration, and easy control over indexable file types
These aren't "planned" per se, but if people show interest then I'd be willing to build them:
- Support for multiple embeddings models
The simplest way to get started is to run the following command:
macOS/Linux
curl -s https://raw.githubusercontent.com/AidanTilgner/Simple-Vector-Store/refs/heads/trunk/scripts/setup.sh | bash
Windows
curl -s https://raw.githubusercontent.com/AidanTilgner/Simple-Vector-Store/refs/heads/trunk/scripts/setup.ps1 | pwsh
Alternatively, you can follow these steps to get set up:
git clone https://github.com/AidanTilgner/Simple-Vector-Store.git
Then, navigate to the new directory:
cd Simple-Vector-Store
Then, install the requirements with pip. I'd also recommend using a virtual environment with at least Python 3.9:
pip install -r requirements.txt
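If you go the virtual-environment route, a minimal setup (assuming python3 is available on your PATH) looks something like this:
python3 -m venv .venv                 # create an isolated environment (Python 3.9+)
source .venv/bin/activate             # on Windows PowerShell: .venv\Scripts\Activate.ps1
pip install -r requirements.txt       # install the requirements inside the environment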
Then, copy the .env.example file into a new file called .env:
macOS/Linux
cp .env.example .env
Windows
copy .env.example .env
And make sure to fill out the fields in the new .env file, especially your OPENAI_API_KEY for embeddings.
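As a rough sketch, the key line in your .env will look something like the following; the value is a placeholder, and .env.example lists the full set of fields:
OPENAI_API_KEY=sk-your-key-here       # used to call the OpenAI Embeddings API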
If you want a file or directory to remain private, and not be indexed, put it inside of a _private directory at the root of your source. Anything under a _private directory will not be indexed. I plan to add support for ignore features in the future as well, which will aid in protecting certain files from being indexed, and allow for easier codebase analysis.
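For example, to keep sensitive notes out of the index (the paths here are purely illustrative):
mkdir -p ~/notes/_private                     # create the private area at the root of your source
mv ~/notes/api-keys.md ~/notes/_private/      # anything under _private is skipped during indexing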
The script is fairly simple, and uses a Click CLI to make things more intuitive. The CLI functionality all happens in the svs.py script, so all of the commands start by running svs.py:
python svs.py <command> <subcommand> [...ARGUMENTS] [...OPTIONS]
You can always alias this yourself as well; for now that's up to you, unless people want it to be a part of the program directly.
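If you do want a shorthand, a plain shell alias works; the path below is illustrative, so point it at wherever you cloned the repository:
alias svs="python /path/to/Simple-Vector-Store/svs.py"
svs stores get                        # equivalent to: python svs.py stores get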
Creating a store is essentially creating a vectorization of a certain location. You can create a store like so:
python svs.py stores add <store_name> <location_of_directory>
Here you'll want to replace <store_name> with the name you want to give your store, and <location_of_directory> with the directory of files that you want to vectorize. Now that you have a store, verify that it's there with the stores get command:
python svs.py stores get
The result should be a list of stores. Now, you can build that store by running the build command:
python svs.py store build <store_name>
You should see a progress bar, and if everything is set up correctly, this should work across the board. Keep in mind that this step uses the OpenAI Embeddings API, so any issues with your API key or rate limits will surface here.
Now that you've built your store, you can search it like so:
python svs.py store search <store_name> <query>
Where your <query> is the text string that you want to find related documents to. The result of this command should be a list of results with the beginnings of each document.
Now, say you've made some changes to your knowledge base or directory which was previously vectorized, and you'd like to see those changes reflected. Simply run the store sync command to update your store without having to rebuild from scratch:
python svs.py store sync <store_name>
This will find all the differences between your current store and the source directory, and update the store accordingly.
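Putting the steps together, a complete session might look like the following; the store name "notes" and its path are just examples:
python svs.py stores add notes ~/Documents/notes        # register the directory as a store named "notes"
python svs.py stores get                                # confirm the store was created
python svs.py store build notes                         # embed the files via the OpenAI Embeddings API
python svs.py store search notes "meeting summaries"    # semantic search over the built store
python svs.py store sync notes                          # pick up later edits without a full rebuild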
There are two base commands as of now, each with a few subcommands:
- stores: manage the different stores
  - add <name> <path_to_directory>: add a new store, of a given name and path
  - get [name]: get all the stores, or if you provide the optional --name flag, you can fetch information about an individual store
  - reset: | DANGEROUS | This will reset your entire datastore to its initial state
- store: work with an individual store
  - build <name>: builds the store based on the files in the given path
  - search <name> <query> [column (title | content)]: performs semantic search on the given store, add a --column flag with either "title" or "content" to search the respective column
  - sync <name>: run synchronization for a given store, any changes made to the source will be reflected after synchronization
  - rename <name> <new_name>: rename a store from one name to another
  - remove <name>: remove a given store from the datastore
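For instance, going by the search description above, limiting a search to document titles would look something like this (the store name and query are illustrative):
python svs.py store search notes "project roadmap" --column title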
Learn about how to use the REST API script here.
There are a few gotchas that you should be aware of.
You can learn more about OpenAI's rate-limiting on their site. These limits will apply since this script uses the OpenAI Embeddings API. The script has ways to account for this, such as the DEFAULT_FILE_LIMIT and DEFAULT_DELAY_PER_REQUEST environment variables, which are set up for the "free" tier of the OpenAI API.
If you're on the free tier, you may see fairly severe rate limits, such as only 200 requests per day and 3 per minute. Fortunately, the bar for reaching tier 1 is fairly low (around 5 dollars paid), and the RPM and RPD limits increase substantially.
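For example, a conservative free-tier configuration might look like this in your .env. The values, and my reading of what each variable controls, are illustrative only; check .env.example for the actual defaults and units:
DEFAULT_FILE_LIMIT=150                 # illustrative: caps how many files are embedded per run
DEFAULT_DELAY_PER_REQUEST=20           # illustrative: spaces out embedding requests to stay under RPM limits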