This project focuses on two main objectives:
- Fine-tuning a Large Language Model (LLM) for summarizing legal documents.
- Implementing a Retrieval-Augmented Generation (RAG) pipeline to automatically retrieve and summarize legal documents based on user-provided details.
The project leverages the Llama-2-7b model, fine-tuned using LoRA (Low-Rank Adaptation), and integrates a RAG pipeline for efficient document retrieval and summarization.
## Table of Contents

- Project Overview
- Setup Instructions
- Dataset Preparation
- Fine-Tuning the LLM
- RAG Pipeline
- Running the Project
- Future Work
- References
## Project Overview

- The project fine-tunes the Llama-2-7b model using LoRA to adapt it for summarizing legal documents.
- The model is trained on a dataset of legal judgments and their corresponding summaries.
- The training process uses the `SFTTrainer` from the `trl` library, which simplifies fine-tuning with LoRA.
- The RAG pipeline retrieves relevant legal documents based on user queries (e.g., case names or details).
- It uses FAISS for efficient similarity search and TF-IDF for document vectorization.
- The retrieved documents are then summarized using the fine-tuned LLM.
## Setup Instructions

Create a conda environment and install the base requirements:

```bash
conda create --name legal_assistant python=3.10
conda activate legal_assistant
pip install -r requirements.txt
```
- Visit the Llama 2 Hugging Face page and request access to the model.
- Once approved, log in to Hugging Face:

```bash
huggingface-cli login
```

- Install the remaining dependencies:

```bash
pip install sentencepiece datasets trl bitsandbytes faiss-cpu
```
## Dataset Preparation

- Download the dataset from Zenodo.
- Extract the dataset into the `legal-llm-project/datasets` directory.
Run the preprocessing script to prepare the dataset for training:

```bash
python src/data_preprocessing.py
```
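The script's internals are not shown here; the following is a minimal sketch of what the preprocessing step might look like, assuming the extracted dataset provides judgment/summary pairs as JSON files. The field names (`judgement`, `summary`) and paths are hypothetical, not the project's actual schema:

```python
# Hypothetical preprocessing sketch: pair each judgment with its summary and
# emit the instruction/input/response prompt format used for fine-tuning.
# Field names and paths below are assumptions, not the project's real schema.
import json
from pathlib import Path

PROMPT_TEMPLATE = (
    "### Instruction: Summarize the following legal text.\n"
    "### Input: {legal_text}\n"
    "### Response: {summary}"
)

def build_training_records(raw_dir: Path):
    """Yield formatted prompt records from judgment/summary JSON pairs."""
    for path in sorted(raw_dir.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        yield {"text": PROMPT_TEMPLATE.format(
            legal_text=record["judgement"], summary=record["summary"])}

if __name__ == "__main__":
    records = list(build_training_records(Path("datasets/raw")))
    Path("datasets/train.jsonl").write_text(
        "\n".join(json.dumps(r) for r in records), encoding="utf-8")
```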
## Fine-Tuning the LLM

- LoRA is used to fine-tune the Llama-2-7b model with a low-rank adaptation approach.
- The configuration includes parameters such as `lora_alpha`, `lora_dropout`, and `r` (rank).
- The `SFTTrainer` from the `trl` library is used for fine-tuning (a full sketch follows the prompt template below).
- The dataset is formatted with clear distinctions between instruction, input, and response:

```
### Instruction: Summarize the following legal text.
### Input: {legal_text}
### Response: {summary}
```
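The repository's actual hyperparameters live in `src/fine_tune.py`; the following is an illustrative sketch only. The `r`, `lora_alpha`, `lora_dropout`, and training values are assumptions, and the `SFTTrainer` keyword arguments (`dataset_text_field`, `max_seq_length`, `tokenizer`) follow older `trl` releases, so adjust them to match your installed version:

```python
# Illustrative fine-tuning sketch: 4-bit base model + LoRA + SFTTrainer.
# Hyperparameter values are assumptions, not the project's exact settings.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

base_model = "meta-llama/Llama-2-7b-hf"

# Load the base model in 4-bit (bitsandbytes) so it fits on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration: r (rank), lora_alpha, and lora_dropout as listed above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Assumes the preprocessed JSONL file with a "text" column holding the prompt.
dataset = load_dataset("json", data_files="datasets/train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=2048,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
```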
After training, the fine-tuned model is saved for inference:

```python
model.save_pretrained("../fine_tuned_lora_model")
tokenizer.save_pretrained("../fine_tuned_lora_model")
```
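Because LoRA saves only the adapter weights, inference reloads the base model and attaches the adapter with `peft`. A minimal sketch, assuming the save path above (the prompt placeholder `{legal_text}` must be replaced with the document to summarize):

```python
# Minimal inference sketch (an assumed workflow, not the project's script):
# reload the base model, attach the saved LoRA adapter, and generate a summary.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, "../fine_tuned_lora_model")
tokenizer = AutoTokenizer.from_pretrained("../fine_tuned_lora_model")

prompt = ("### Instruction: Summarize the following legal text.\n"
          "### Input: {legal_text}\n"   # placeholder: insert the document here
          "### Response:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```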
## RAG Pipeline

- The pipeline uses FAISS for efficient similarity search.
- Documents are vectorized using TF-IDF for retrieval (a retrieval sketch follows the prompt template below).
- Retrieved documents are summarized using the fine-tuned LLM.
- The prompt format ensures the model knows where to start the response:
```
### Instruction: Summarize the following legal text.
### Input: {retrieved_document}
### Response: {generated_summary}
```
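A minimal sketch of the TF-IDF + FAISS retrieval step, assuming documents are held in memory as plain strings; the project's actual implementation lives in `src/rag_pipeline.py` and may differ:

```python
# Sketch of TF-IDF vectorization + FAISS similarity search (assumed approach).
import faiss
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "Judgment text of case A ...",
    "Judgment text of case B ...",
]

# TF-IDF vectors; FAISS requires dense float32 arrays.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents).toarray().astype("float32")
faiss.normalize_L2(doc_vectors)  # cosine similarity via inner product

index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(doc_vectors)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k documents most similar to the query."""
    k = min(k, index.ntotal)
    q = vectorizer.transform([query]).toarray().astype("float32")
    faiss.normalize_L2(q)
    _, idx = index.search(q, k)
    return [documents[i] for i in idx[0]]

# Each retrieved document is then passed to the fine-tuned LLM using the
# prompt template shown above.
top_docs = retrieve("case A details")
```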
## Running the Project

Run the fine-tuning script:

```bash
python src/fine_tune.py
```

Then run the RAG pipeline for document retrieval and summarization:

```bash
python src/rag_pipeline.py
```
## Future Work

- Increase the token limit: The current model supports a context window of up to 4096 tokens. Future work could explore extending this limit to handle longer documents.
- Expand to a UK dataset: Adapt the model for summarizing UK legal documents, which are typically longer and more complex.
- Optimize retrieval: Improve the RAG pipeline for faster and more accurate document retrieval.
## References

- Llama 2 Documentation
- LoRA Fine-Tuning with AMD ROCm
- SFTTrainer Documentation
- 4-bit Quantization with Bitsandbytes
- Fine-Tuning LLMs with Domain Knowledge
This project provides a robust framework for fine-tuning LLMs for legal document summarization and integrating them into a RAG pipeline for efficient retrieval and generation.