This project showcases how you can work up from a simple RAG pipeline to a more complex setup that involves finetuning embeddings, reranking retrieved documents, and even finetuning the LLM itself. We'll do this all for a use case relevant to ZenML: a question answering system that can provide answers to common questions about ZenML. This will help you understand how to apply the concepts covered in this guide to your own projects.
Contained within this project is all the code needed to run the full pipelines. You can follow along in our guide to understand the decisions and tradeoffs behind the pipeline and step code contained here. You'll build a solid understanding of how to leverage LLMs in your MLOps workflows using ZenML, enabling you to build powerful, scalable, and maintainable LLM-powered applications.
This project contains all the pipeline and step code necessary to follow along with the guide. You'll need a PostgreSQL database to store the embeddings; full instructions are provided below for how to set that up.
We've recently been holding some webinars about this repository and project. Watche the videos below if you want an introduction and context around the code and ideas covered in this project.
This project showcases production-ready pipelines so we use some cloud infrastructure to manage the assets. You can run the pipelines locally using a local PostgreSQL database, but we encourage you to use a cloud database for production use cases.
Make sure you're running from a Python 3.8+ environment. Setup a virtual environment and install the dependencies using the following command:
pip install -r requirements.txt
In order to use the default LLM for this query, you'll need an account and an API key from OpenAI specified as another environment variable:
export OPENAI_API_KEY=<your-openai-api-key>
Supabase is a cloud provider that provides a PostgreSQL database. It's simple to use and has a free tier that should be sufficient for this project. Once you've created a Supabase account and organisation, you'll need to create a new project.
You'll want to save the Supabase database password as a ZenML secret so that it isn't stored in plaintext. You can do this by running the following command:
zenml secret create supabase_postgres_db --password="YOUR_PASSWORD"
You'll then want to connect to this database instance by getting the connection string from the Supabase dashboard.
You can use these details to populate some environment variables where the pipeline code expects them:
export ZENML_POSTGRES_USER=<your-supabase-user>
export ZENML_POSTGRES_HOST=<your-supabase-host>
export ZENML_POSTGRES_PORT=<your-supabase-port>
To run the pipeline, you can use the run.py
script. This script will allow you
to run the pipelines in the correct order. You can run the script with the
following command:
python run.py --rag
This will run the basic RAG pipeline, which scrapes the ZenML documentation and stores the embeddings in the Supabase database.
Once the pipeline has run successfully, you can query the assets in the Supabase
database using the --query
flag as well as passing in the model you'd like to
use for the LLM.
When you're ready to make the query, run the following command:
python run.py --query "how do I use a custom materializer inside my own zenml steps? i.e. how do I set it? inside the @step decorator?" --model=gpt4
Alternative options for LLMs to use include:
gpt4
gpt35
claude3
claudehaiku
Note that Claude will require a different API key from Anthropic. See the
litellm
docs on how to set
this up.
To run the evaluation pipeline, you can use the following command:
python run.py --evaluation
You'll need to have first run the RAG pipeline to have the necessary assets in the database to evaluate.
For embeddings finetuning we first generate synthetic data and then finetune the embeddings. Both of these pipelines are described in the LLMOps guide and instructions for how to run them are provided below.
To run the distilabel
synthetic data generation pipeline, you can use the following commands:
pip install -r requirements-argilla.txt # special requirements
python run.py --synthetic
You will also need to have set up and connected to an Argilla instance for this to work. Please follow the instructions in the Argilla documentation to set up and connect to an Argilla instance on the Hugging Face Hub. ZenML's Argilla integration documentation will guide you through the process of connecting to your instance as a stack component.
To run the pipeline for finetuning the embeddings, you can use the following commands:
pip install -r requirements-argilla.txt # special requirements
python run.py --embeddings
As with the previous pipeline, you will need to have set up and connected to an Argilla instance for this to work. Please follow the instructions in the Argilla documentation to set up and connect to an Argilla instance on the Hugging Face Hub. ZenML's Argilla integration documentation will guide you through the process of connecting to your instance as a stack component.
Credit to Phil Schmid for his tutorial on embeddings finetuning with Matryoshka loss function which we adapted for this project.
The basic RAG pipeline will run using a local stack, but if you want to improve the speed of the embeddings step you might want to consider using a cloud orchestrator. Please follow the instructions in documentation on popular integrations (currently available for AWS and GCP) to learn how you can run the pipelines on a remote stack.
If you run the pipeline using a cloud artifact store, logs from all the steps as well as assets like the visualizations will all be shown in the ZenML dashboard.
If you run the pipeline using ZenML Pro you'll have access to the managed dashboard which will allow you to get started quickly. We offer a free trial so you can try out the platform without any cost. Visit the ZenML Pro dashboard to get started.
You can also self-host the ZenML dashboard. Instructions are available in our documentation.
The project loosely follows the recommended ZenML project structure:
.
├── LICENSE # License file
├── README.md # Project documentation
├── __init__.py
├── constants.py # Constants used throughout the project
├── materializers
│ ├── __init__.py
│ └── document_materializer.py # Document materialization logic
├── most_basic_eval.py # Basic evaluation script
├── most_basic_rag_pipeline.py # Basic RAG pipeline script
├── notebooks
│ └── visualise_embeddings.ipynb # Notebook to visualize embeddings
├── pipelines
│ ├── __init__.py
│ ├── generate_chunk_questions.py # Pipeline to generate chunk questions
│ ├── llm_basic_rag.py # Basic RAG pipeline using LLM
│ └── llm_eval.py # Pipeline for LLM evaluation
├── requirements.txt # Project dependencies
├── run.py # Main script to run the project
├── steps
│ ├── __init__.py
│ ├── eval_e2e.py # End-to-end evaluation step
│ ├── eval_retrieval.py # Retrieval evaluation step
│ ├── eval_visualisation.py # Evaluation visualization step
│ ├── populate_index.py # Step to populate the index
│ ├── synthetic_data.py # Step to generate synthetic data
│ ├── url_scraper.py # Step to scrape URLs
│ ├── url_scraping_utils.py # Utilities for URL scraping
│ └── web_url_loader.py # Step to load web URLs
├── structures.py # Data structures used in the project
├── tests
│ ├── __init__.py
│ └── test_url_scraping_utils.py # Tests for URL scraping utilities
└── utils
├── __init__.py
└── llm_utils.py # Utilities related to the LLM
The RAG pipeline relies on code from this Timescale blog that showcased using PostgreSQL as a vector database. We adapted it for our use case and adapted it to work with Supabase.