Arcmind Vector DB is a high-performance, flexible, and ergonomic vector similarity search database for the Internet Computer. It is general-purpose and can back a wide range of AI-powered applications, including recommendation systems, search engines, Retrieval Augmented Generation (RAG), and long-term memory for autonomous AI agents such as ArcMind AI.
- Install Rust Toolchain using Rustup
Follow https://www.rust-lang.org/tools/install
- Install cargo-audit
cargo install cargo-audit
- Install dfx sdk
Follow https://github.com/dfinity/sdk
If you want to test your project locally, you can use the following commands:
# Starts the replica, running in the background
dfx start --background
# Deploys the arcmindvectordb canister to the local replica
# First set the environment variable CONTROLLER_PRINCIPAL to the output of: dfx identity get-principal
./scripts/provision.sh
The provision script deploys an arcmindvectordb canister.
See Candid for the full API.
Sample shell scripts are provided to interact with the canisters in the interact directory. Sample embeddings content and their embedding vectors are provided in the embeddings directory.
Open and Edit:
./interact/add_vector.sh
Try adding multiple vectors of different topics to the VectorStore.
Then search for similar vectors, using one of the vectors you added as the query. The top result should be that same vector, followed by other vectors on the same topic. This shows how high-dimensional embeddings capture semantic meaning.
Open and Edit:
./interact/search_vector.sh
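To make the expected behaviour concrete, here is a minimal Rust sketch of a brute-force similarity search. It is not the canister's implementation: the cosine metric, the function names `cosine_similarity`, `search` and `sample_store`, and the toy 3-dimensional vectors are all assumptions made for illustration. It shows why querying with a stored vector returns that vector first, followed by vectors on the same topic.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
/// (Assumed metric for this sketch; the canister may use another.)
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Scores every stored vector against `query` and returns the `k`
/// best matches as (label, score) pairs, most similar first.
fn search<'a>(store: &[(&'a str, Vec<f32>)], query: &[f32], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = store
        .iter()
        .map(|(label, v)| (*label, cosine_similarity(v, query)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

/// Hypothetical toy data: two "pet" vectors and one "finance" vector.
fn sample_store() -> Vec<(&'static str, Vec<f32>)> {
    vec![
        ("cats", vec![0.9, 0.1, 0.0]),
        ("dogs", vec![0.8, 0.2, 0.1]),
        ("stocks", vec![0.0, 0.1, 0.9]),
    ]
}
```

Calling `search(&sample_store(), &[0.9, 0.1, 0.0], 2)` ranks "cats" first with a score of about 1.0 and "dogs" second, while "stocks" is left out. A linear scan like this is only practical for small stores; an index such as a k-d tree avoids comparing the query against every vector.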
Note that the same embedding model must be used for adding and searching vectors. It is recommended that you use the same embedding model in a single VectorStore for consistent results.
The embeddings in /embeddings/ are generated using the OpenAI text-embedding-ada-002 model through its Embeddings API.
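One way to guard against mixing embedding models is to check vector dimensions on insert. The sketch below is a hypothetical in-memory store, not the canister's API: `ToyVectorStore`, `AddError` and `add` are names invented for illustration. It fixes the dimension from the first vector added and rejects any later vector of a different length.

```rust
/// Error returned when a vector cannot be added (hypothetical type).
#[derive(Debug, PartialEq)]
enum AddError {
    DimensionMismatch { expected: usize, got: usize },
}

/// A toy in-memory store that remembers the dimension of the first
/// vector it receives. Not the arcmindvectordb canister interface.
#[derive(Default)]
struct ToyVectorStore {
    dimension: Option<usize>,
    vectors: Vec<(String, Vec<f32>)>,
}

impl ToyVectorStore {
    /// Adds `embedding` for `content`, rejecting it if its length
    /// differs from the vectors already stored.
    fn add(&mut self, content: &str, embedding: Vec<f32>) -> Result<(), AddError> {
        match self.dimension {
            Some(expected) if expected != embedding.len() => {
                return Err(AddError::DimensionMismatch {
                    expected,
                    got: embedding.len(),
                });
            }
            Some(_) => {}
            // The first vector fixes the dimension for the whole store.
            None => self.dimension = Some(embedding.len()),
        }
        self.vectors.push((content.to_string(), embedding));
        Ok(())
    }
}
```

This check only catches models whose output sizes differ. Two different models that happen to produce vectors of the same length would pass it while still yielding meaningless similarity scores, so the rule of one model per VectorStore still has to be followed by convention.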
Get the string using the commands below, then store it in GitHub Secrets. Note: replace default with the name of the identity you need.
awk 'NF {sub(/\r/, ""); printf "%s\\r\\n",$0;}' ~/.config/dfx/identity/default/identity.pem
cat ~/.config/dfx/identity/default/wallets.json
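The awk one-liner is terse, so here is a Rust sketch of what it appears to do: flatten the PEM file into a single line in which each non-blank line is followed by the four literal characters `\r\n`. The function name `escape_pem` is hypothetical, and the sketch is meant to mirror the command for explanation only, not to replace it.

```rust
/// Flattens a PEM string into one line, ending every non-blank line
/// with the literal characters `\r\n` (backslashes, not control codes).
fn escape_pem(pem: &str) -> String {
    let mut out = String::new();
    for line in pem.split('\n') {
        // Mirrors awk's `NF` guard: skip lines that are empty or
        // contain only spaces and tabs.
        if line.chars().all(|c| c == ' ' || c == '\t') {
            continue;
        }
        // Mirrors `sub(/\r/, "")`: drop the first carriage return, if any.
        let cleaned = line.replacen('\r', "", 1);
        out.push_str(&cleaned);
        // Mirrors `printf "%s\\r\\n"`: append an escaped line ending.
        out.push_str("\\r\\n");
    }
    out
}
```

The result contains no real line breaks, which makes it easy to paste into a single-line secret field.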
- Backend - Research and implement primary canister as long-term VectorStore with Nearest Neighbours distance metric, embedding API and indexing
- Backend - Integrate with ArcMind AI Autonomous Agent for long-term memory
- Doc - Add documentation for the VectorStore API
- Backend - Self-hosted machine learning models for generating text (NLP), image and audio embeddings
- Backend - Scalable storage buckets for large-scale vector data beyond the canister storage limit
See the License file for license rights and limitations (MIT).
See CONTRIBUTING.md for details about how to contribute to this project.
Code & Architecture: Henry Chan, [email protected], Twitter: @kinwo
- Internet Computer
- Cloudflare - What is a Vector Database?
- RAG
- Open-source vector similarity search for Postgres
- Spotify Annoy Library - Approximate Nearest Neighbors in C++/Python
- What is Similarity Search?
- Semantic Search: Measuring Meaning From Jaccard to Bert
- A high-performance, flexible, ergonomic k-d tree Rust library
- K-d tree
- DeepLearning.AI course - Building Applications with Vector Databases