Using the RedisVL library, I'm building out a simple vector search demo. I'll be taking a dataset of anime from Kaggle and vectorizing both the main poster image and the description. I'll then be able to search over the dataset using text and see either the closest image or the closest description to the text I entered. As a nerd who loves anime, I find this a fun project to work on.
If you want to run the search over anime posters, you'll first need to get the anime dataset for yourself from Kaggle - it requires an email to download, but it's free.
You'll also need a Redis connection; the two easiest options are presented below.
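Once you have Redis running, a quick way to confirm the connection works is a one-line ping with redis-py. This is just a sanity-check sketch - the URL is a placeholder for whatever instance you're actually using:

```python
import redis

# Placeholder URL -- swap in your own host/port or Redis Cloud connection string
client = redis.Redis.from_url("redis://localhost:6379")
print(client.ping())  # True means the demo will be able to connect
```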
For me and my environment, running this from scratch looks like:
- git clone https://github.com/sav-norem/redis_vectorsearch.git
- cd redis_vectorsearch
- python3 -m venv .
- source bin/activate
- poetry install
- Put anime-dataset-2023.csv (the file you downloaded from Kaggle) in the same folder as the poetry.lock file
- python3 src/redisvl_demo/redisvl_demo.py
This will bring up a link to the local web app where you can now search using text over the top ~1,000 anime posters.
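Under the hood, a text search like the one the SearchUI runs boils down to embedding your query and issuing a VectorQuery against the index. Here's a minimal sketch of that flow; the index and field names ("anime", "image_vector", "title") are placeholders rather than the demo's actual schema, and the exact SearchIndex constructor can vary between RedisVL versions:

```python
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery
from redisvl.utils.vectorize import HFTextVectorizer

# Embed the search text with the same CLIP checkpoint used for the posters
text_vectorizer = HFTextVectorizer(model="sentence-transformers/clip-ViT-L-14")
query_vector = text_vectorizer.embed("girl with a giant robot", as_buffer=True)

# Point at an existing index (name and URL are placeholders)
index = SearchIndex.from_existing(name="anime", redis_url="redis://localhost:6379")

query = VectorQuery(
    vector=query_vector,
    vector_field_name="image_vector",  # hypothetical field name
    return_fields=["title"],           # hypothetical field name
    num_results=5,
)
for result in index.query(query):
    print(result["title"], result["vector_distance"])
```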
This project has a DataLoader and a SearchUI.
Optional arguments (see the sketch after this list) are:
- -loadfile: a different source for the data to be loaded from.
- -limit: a limit on how many items to load.
- -imagepath: the folder where the poster images will be stored.
- -indexname: the name of the index where data will be loaded and where the SearchUI will look.
- -redisconnection: the host and port for a Redis connection.
- -noload: bypass loading the data and just run the SearchUI.
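For reference, here's roughly how those flags could be declared with argparse. The defaults shown are guesses for illustration, not necessarily what redisvl_demo.py actually uses:

```python
import argparse

parser = argparse.ArgumentParser(description="Load anime data and run the search UI")
parser.add_argument("-loadfile", default="anime-dataset-2023.csv",
                    help="source file for the data to be loaded")
parser.add_argument("-limit", type=int, default=1000,
                    help="how many items to load")
parser.add_argument("-imagepath", default="images/",
                    help="folder where poster images are stored")
parser.add_argument("-indexname", default="anime",
                    help="index the data is loaded into and the SearchUI reads from")
parser.add_argument("-redisconnection", default="localhost:6379",
                    help="host and port for the Redis connection")
parser.add_argument("-noload", action="store_true",
                    help="skip loading and just run the SearchUI")
args = parser.parse_args()
```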
This demo takes a while to load and has a print statement mostly for entertainment / progress purposes. If you'd rather stare at an empty terminal while the data loads, you're more than welcome to take out the print statement. Either way, parsing the data, fetching the images, vectorizing them, and loading everything into Redis takes some time.
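To give a sense of where that time goes, here's a rough, hypothetical sketch of a load loop like the DataLoader's: each row means an HTTP fetch for the poster, two model passes, and a write to Redis. The helper function and CSV column names ("Name", "Synopsis", "Image URL") are illustrative assumptions, not the demo's actual code:

```python
import io

import pandas as pd
import requests
from PIL import Image


def download_poster(url: str) -> Image.Image:
    # Hypothetical helper: fetch a poster and return a PIL image for the CLIP model
    return Image.open(io.BytesIO(requests.get(url, timeout=30).content))


def load_dataset(csv_path, index, image_vectorizer, text_vectorizer, limit=1000):
    # Column names are assumed from the Kaggle CSV and may differ
    df = pd.read_csv(csv_path).head(limit)
    records = []
    for i, row in df.iterrows():
        poster = download_poster(row["Image URL"])  # network round trip per row
        records.append({
            "title": row["Name"],
            "image_vector": image_vectorizer.embed(poster, as_buffer=True),
            "synopsis_vector": text_vectorizer.embed(row["Synopsis"], as_buffer=True),
        })
        print(f"Loaded {i + 1}/{limit}: {row['Name']}")  # the "entertainment" print
    index.load(records)  # write everything into the Redis index
```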
The vector_extend file overrides the HuggingFace embed function from the RedisVL library to allow for images. I'm currently using two different models: one for the images and one for the synopsis. While sentence-transformers/clip-ViT-L-14 is multi-modal and can be used for text, its token limit was too low to vectorize the entire synopsis. I'll definitely be exploring other models for these purposes and seeing how they impact the search results.
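The core trick is that the sentence-transformers CLIP checkpoints can encode PIL images as well as text. The real vector_extend does this by overriding RedisVL's HuggingFace vectorizer; the standalone sketch below just illustrates the idea, with an interface that only loosely mirrors RedisVL's:

```python
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer


class ClipImageVectorizer:
    """Illustrative stand-in for vector_extend: embeds PIL images with CLIP."""

    def __init__(self, model: str = "sentence-transformers/clip-ViT-L-14"):
        self.model = SentenceTransformer(model)

    def embed(self, image: Image.Image, as_buffer: bool = False):
        # sentence-transformers CLIP models accept PIL images directly
        vector = self.model.encode(image)
        if as_buffer:
            return vector.astype(np.float32).tobytes()  # raw bytes for a Redis HASH field
        return vector.tolist()
```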