# CodeBaseRag

CodeBaseRag is a local Retrieval-Augmented Generation (RAG) system designed to work with codebases written in Python, R, and JavaScript. It integrates with Qdrant for vector search and supports a variety of document formats, including PDF, HTML, Markdown, and JSON.

## Features
- Multi-Language Support: Works with Python, R, and JavaScript codebases.
- Document Processing: Supports formats such as PDF, HTML, Markdown, JSON, and more.
- Qdrant Integration: Utilizes Qdrant for efficient vector search and similarity-based retrieval.
- Local LLMs: Leverages pre-trained language models through LangChain and Transformers.
- User Interfaces: Choose between an interactive CLI and a Gradio-based GUI.
## Prerequisites

- Python 3.8 or higher
- Docker installed on your system
## Installation

1. **Clone the Repository**

   ```bash
   git clone https://github.com/jefftam1234/CodeBaseRag.git
   cd CodeBaseRag
   ```

2. **Install the Package**

   To install locally:

   ```bash
   pip install .
   ```
3. **Create and Edit Configuration Files**

   Copy the template configuration file:

   ```bash
   cp config.template.ini config.ini
   ```

   Edit `config.ini` with your settings and place it in the active directory.
4. **Configure Settings**

   - Qdrant Settings: In `config.ini`, specify your Qdrant host, port, collection names, and vector dimensions.
   - Local Paths: Adjust paths such as `DEFAULT_CODEBASE_PATH` and `DEFAULT_QDRANT_STORAGE_FOLDER` to match your environment.
## Usage

### Run the Main Menu

```bash
codebaserag-menu
```

The main menu provides five options:
- Convert files to text and perform chunking/splitting.
- Ensure Docker is installed to run Qdrant (the vectorized database). Press “l” to launch and “k” to kill.
- Push the chunked files to Qdrant (see the sketch after this list).
- Choose between a command-line and a graphical interface (GUI). The GUI lets you select your installed LLM and the collection (the pushed code base).
- Load any available configuration files; launching the main code again will use the selected configuration.
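
For orientation, pushing chunks to Qdrant generally means creating a collection sized to the embedding dimension and upserting one point per chunk. The sketch below uses `qdrant_client` with an illustrative collection name, a toy vector size, and a placeholder `embed()` helper; it is not the project's actual push code.

```python
# Minimal sketch of pushing chunked text to Qdrant (illustrative, not the project's code).
import hashlib

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

COLLECTION = "your_code_base"  # matches DEFAULT_COLLECTION_NAME in config.ini
VECTOR_SIZE = 32               # toy value; use your embedding model's dimension

def embed(text: str) -> list[float]:
    # Placeholder embedding so the sketch runs end to end; swap in a real model.
    digest = hashlib.sha256(text.encode()).digest()
    return [byte / 255.0 for byte in digest[:VECTOR_SIZE]]

client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
)

chunks = ["def foo(): ...", "class Bar: ..."]  # chunks produced by the splitting step
client.upsert(
    collection_name=COLLECTION,
    points=[
        PointStruct(id=i, vector=embed(chunk), payload={"text": chunk})
        for i, chunk in enumerate(chunks)
    ],
)
```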
### Using the CLI

- Launch the CLI by running:

  ```bash
  codebaserag
  ```

- Enter your queries one at a time. Use `/exit` to return to the main menu.
- Each query triggers a vector search in Qdrant and interacts with the configured local LLM (see the sketch below).
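
As a rough illustration of what a single query does, the sketch below embeds the question, retrieves the top `RETRIEVER_K` chunks from Qdrant, and sends them with the question to a local LLM. It assumes an Ollama-served model (the model tags in the config template suggest Ollama) and uses the same placeholder `embed()` idea as the push sketch; the real query pipeline may differ.

```python
# Rough sketch of one query: Qdrant vector search, then a local LLM call (not the project's code).
import hashlib

from langchain_community.llms import Ollama
from qdrant_client import QdrantClient

def embed(text: str) -> list[float]:
    # Placeholder embedding; must match the dimension used when pushing chunks.
    return [byte / 255.0 for byte in hashlib.sha256(text.encode()).digest()]

client = QdrantClient(host="localhost", port=6333)
llm = Ollama(model="codellama:7b")  # assumed Ollama model tag

def answer(question: str, collection: str = "your_code_base", k: int = 10) -> str:
    hits = client.search(
        collection_name=collection,
        query_vector=embed(question),
        limit=k,  # corresponds to RETRIEVER_K
    )
    context = "\n\n".join(hit.payload["text"] for hit in hits)
    prompt = f"Use this code context to answer.\n\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt)

print(answer("What does the main entry point do?"))
```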
### Using the GUI

- Launch the Gradio-based GUI according to your configuration (a minimal launch sketch follows).
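
For reference, the `DEFAULT_GRADIO_*` settings in the config map onto Gradio's `launch()` arguments roughly as shown below; the actual CodeBaseRag GUI is more involved, and the `answer()` function here is just a stand-in.

```python
# Sketch: where the DEFAULT_GRADIO_* settings plug into a Gradio launch (stand-in GUI).
import gradio as gr

def answer(question: str) -> str:
    # Stand-in; the real GUI would run the RAG pipeline (Qdrant search + local LLM).
    return f"(answer to: {question})"

demo = gr.Interface(fn=answer, inputs="text", outputs="text", title="CodeBaseRag")
demo.launch(
    share=False,            # DEFAULT_GRADIO_SHARE
    server_name="0.0.0.0",  # DEFAULT_GRADIO_SERVER_NAME
    server_port=7860,       # DEFAULT_GRADIO_SERVER_PORT
)
```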
## Configuration Example

Below is an excerpt from `config.template.ini` to help you get started (a sketch for reading these settings in Python follows the excerpt):
```ini
[DEFAULT]
# Qdrant settings
DEFAULT_QDRANT_HOST = localhost
DEFAULT_QDRANT_PORT = 6333
# Collection name for Qdrant (required)
DEFAULT_COLLECTION_NAME = your_code_base
# Paths (required)
DEFAULT_CODEBASE_PATH = /home/your_code_base
# These paths are computed at runtime:
# DEFAULT_CONVERTED_PATH = <computed at runtime>
# DEFAULT_DOCS_PICKLE = <computed at runtime>
# DEFAULT_CHUNKS_PICKLE = <computed at runtime>
# Qdrant storage folder (required)
DEFAULT_QDRANT_STORAGE_FOLDER = /home/qdrant_storage
# DEFAULT_CONTAINER_ID_FILE is computed at runtime.
# LLM model: Set your default LLM model here.
DEFAULT_LLM_MODEL = your_llm:latest
# Alternative models:
# DEFAULT_LLM_MODEL = deepseek-r1:latest
# DEFAULT_LLM_MODEL = codellama:7b
# New parameters for splitting and retrieval:
LANGUAGE_AWARE_SPLITTING = True
CODEBASE_LANGUAGES = cpp # Specify languages in a comma-separated list
CHUNK_SIZE = 1500
CHUNK_OVERLAP = 150
RETRIEVER_K = 10
# Gradio settings
DEFAULT_GRADIO_SHARE = False
DEFAULT_GRADIO_SERVER_NAME = 0.0.0.0
DEFAULT_GRADIO_SERVER_PORT = 7860
```
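
If you need to read these settings programmatically (for example when extending the tool), Python's standard-library `configparser` handles the INI format above; note that the inline `#` comment on `CODEBASE_LANGUAGES` is only stripped if you enable inline comment prefixes. This is a sketch, not CodeBaseRag's own loading code.

```python
# Sketch: reading config.ini with the standard library (not the project's loader).
import configparser

# inline_comment_prefixes strips trailing "# ..." comments such as the one on CODEBASE_LANGUAGES.
config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read("config.ini")
settings = config["DEFAULT"]

qdrant_host = settings.get("DEFAULT_QDRANT_HOST", "localhost")
qdrant_port = settings.getint("DEFAULT_QDRANT_PORT", fallback=6333)
collection = settings["DEFAULT_COLLECTION_NAME"]
chunk_size = settings.getint("CHUNK_SIZE", fallback=1500)
chunk_overlap = settings.getint("CHUNK_OVERLAP", fallback=150)
retriever_k = settings.getint("RETRIEVER_K", fallback=10)
language_aware = settings.getboolean("LANGUAGE_AWARE_SPLITTING", fallback=True)
languages = [lang.strip() for lang in settings.get("CODEBASE_LANGUAGES", "").split(",") if lang.strip()]

print(qdrant_host, qdrant_port, collection, chunk_size, chunk_overlap, retriever_k, language_aware, languages)
```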
## Tuning Recommendations

- If your code base consists of many small functions/classes, you may want:
  - CHUNK_SIZE: ~1200
  - CHUNK_OVERLAP: ~200
  - RETRIEVER_K: 8
- If your code base has large, complex functions with interdependencies (see the splitter sketch after this list):
  - CHUNK_SIZE: ~1800
  - CHUNK_OVERLAP: ~250
  - RETRIEVER_K: 12-15
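
As a concrete illustration of how these parameters map onto a splitter, here is a sketch using LangChain's `RecursiveCharacterTextSplitter` with the "large, complex functions" profile; whether CodeBaseRag uses exactly this class, and how it handles each language, is an assumption based on its LangChain dependency.

```python
# Sketch: language-aware splitting with LangChain, using the "large functions" profile.
from pathlib import Path

from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
# (newer releases: from langchain_text_splitters import Language, RecursiveCharacterTextSplitter)

source = Path("example.py").read_text()  # hypothetical file from your code base

splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,  # pick per CODEBASE_LANGUAGES
    chunk_size=1800,           # CHUNK_SIZE
    chunk_overlap=250,         # CHUNK_OVERLAP
)
chunks = splitter.split_text(source)
print(f"Produced {len(chunks)} chunks")
# RETRIEVER_K (12-15 in this profile) controls how many chunks are retrieved per query.
```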
## Generating Requirements

Install and run pipreqs to generate or update your requirements file:

```bash
pip install pipreqs
pipreqs . --force
```