RAG Guide
From Files
> Add documents: /tmp/dir1/file1;/tmp/dir1/file2
Loading /tmp/dir1/file1 [1/2]
Loading /tmp/dir1/file2 [2/2]
From Directory
> Add documents: /tmp/dir1/
Load /tmp/dir1/ [1/1]
Loading file /tmp/dir1/file1
Loading file /tmp/dir1/file2
Loading file /tmp/dir1/file3
✨ Load directory completed
From Directory (with file extensions filter)
> Add documents: /tmp/dir2/**/*.{md,txt}
Load /tmp/dir2/**/*.{md,txt} [1/1]
Loading file /tmp/dir2/file2.md
Loading file /tmp/dir2/file1.txt
✨ Load directory completed
AIChat RAG doesn't support general glob patterns; the glob syntax is only used to filter files by extension.
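To illustrate the note above, here is a minimal sketch of how a path such as `/tmp/dir2/**/*.{md,txt}` could be split into a directory and an extension list. The helper names `split_extension_filter` and `matches` are hypothetical, and this is not AIChat's actual implementation; it only handles the `{...}` form shown in this guide.

```python
import re
from pathlib import Path

# Hypothetical pattern: '<dir>/**/*.{ext1,ext2}' as shown in the example above.
_FILTER_RE = re.compile(r"^(?P<dir>.*?)/\*\*/\*\.\{(?P<exts>[^}]+)\}$")


def split_extension_filter(doc_path):
    """Return (directory, [extensions]), or (doc_path, None) if no filter is present."""
    m = _FILTER_RE.match(doc_path)
    if m is None:
        return doc_path, None
    exts = [e.strip() for e in m.group("exts").split(",") if e.strip()]
    return m.group("dir"), exts


def matches(file_path, extensions):
    """True if the file's extension is in the filter, or if there is no filter."""
    if extensions is None:
        return True
    return Path(file_path).suffix.lstrip(".") in extensions


directory, extensions = split_extension_filter("/tmp/dir2/**/*.{md,txt}")
print(directory, extensions)
print(matches("/tmp/dir2/file2.md", extensions))
```

Anything more elaborate than the trailing extension list is treated as a plain path by this sketch, which mirrors the point that the syntax is a filter, not a general glob.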
From Url
> Add documents: https://sigoden.github.io/mynotes/tools/linux.html
Load https://sigoden.github.io/mynotes/tools/linux.html [1/1]
From RecursiveUrl (websites)
> Add documents: https://sigoden.github.io/mynotes/tools/**
Load https://sigoden.github.io/mynotes/tools/** [1/1]
⚙️ maxConnections=5 exclude='' extract='' toMarkdown=true
Crawling https://sigoden.github.io/mynotes/tools/
Crawling https://sigoden.github.io/mynotes/tools/docker.html
Crawling https://sigoden.github.io/mynotes/tools/git.html
Crawling https://sigoden.github.io/mynotes/tools/github-ci.html
Crawling https://sigoden.github.io/mynotes/tools/linux.html
Crawling https://sigoden.github.io/mynotes/tools/redis.html
✨ Crawl completed
The `**` suffix is used to distinguish between Url and RecursiveUrl.
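The input forms above can be told apart with a few string checks. The following is a rough sketch under that assumption; `split_documents` and `classify_document_path` are hypothetical names, and AIChat's real logic may differ (for example, it must consult the filesystem to tell a file from a directory, which this sketch does not do).

```python
def split_documents(user_input):
    """Split the 'Add documents' input on ';' as in the From Files example."""
    return [part.strip() for part in user_input.split(";") if part.strip()]


def classify_document_path(doc_path):
    """Classify one entry as 'RecursiveUrl', 'Url', or 'Local' (file or directory)."""
    if doc_path.startswith(("http://", "https://")):
        # A trailing '**' marks a website to crawl rather than a single page.
        return "RecursiveUrl" if doc_path.endswith("**") else "Url"
    return "Local"


for entry in split_documents(
    "/tmp/dir1/file1;https://sigoden.github.io/mynotes/tools/linux.html;"
    "https://sigoden.github.io/mynotes/tools/**"
):
    print(entry, "->", classify_document_path(entry))
```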
By default, AIChat can only process text files. To handle binary files such as PDF and DOCX, configure `document_loaders`.
# Define document loaders to control how RAG and `.file`/`--file` load files of specific formats.
document_loaders:
  # You can add custom loaders using the following syntax:
  #   <file-extension>: <command-to-load-the-file>
  # Note: Use `$1` for input file and `$2` for output file. If `$2` is omitted, use stdout as output.
  pdf: 'pdftotext $1 -'                   # Load .pdf file, see https://poppler.freedesktop.org
  docx: 'pandoc --to plain $1'            # Load .docx file
  # xlsx: 'ssconvert $1 $2'               # Load .xlsx file
  # html: 'pandoc --to plain $1'          # Load .html file
  recursive_url: 'rag-crawler $1 $2'      # Load websites, see https://github.com/sigoden/rag-crawler
The `document_loaders` configuration item is a map where the key is the file extension and the value is the corresponding loader command.
AIChat provides default loaders for `pdf`, `docx` and `recursive_url`.
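As a rough illustration of the `$1`/`$2` convention, the sketch below substitutes the placeholders and reads the result either from stdout or from an output file. It is an assumption about how such a command could be run, not AIChat's implementation; `build_loader_argv` and `run_loader` are hypothetical helpers, and the sketch assumes the `$2` output is a single UTF-8 text file.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def build_loader_argv(command, input_path, output_path=""):
    """Tokenize the loader command, then substitute $1 and $2 in each token."""
    return [
        token.replace("$1", input_path).replace("$2", output_path)
        for token in shlex.split(command)
    ]


def run_loader(command, input_path):
    """Run a loader command and return the extracted text."""
    if "$2" in command:
        # The command writes to an output file; read it back afterwards.
        with tempfile.TemporaryDirectory() as tmp:
            output_path = str(Path(tmp) / "output.txt")
            subprocess.run(build_loader_argv(command, input_path, output_path), check=True)
            return Path(output_path).read_text(encoding="utf-8", errors="replace")
    # No $2: the command prints the extracted text to stdout.
    result = subprocess.run(
        build_loader_argv(command, input_path), check=True, capture_output=True
    )
    return result.stdout.decode("utf-8", errors="replace")


print(build_loader_argv("pdftotext $1 -", "/tmp/dir1/report.pdf"))
```

Substituting after tokenizing keeps an input path that contains spaces as a single argument and avoids passing the path through a shell.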
To ensure the loaders function correctly, please verify that the required tools are installed.
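One way to check this up front is to look up each loader's executable on `PATH`. The sketch below assumes the first word of each loader command is the tool name; `missing_loader_tools` is a hypothetical helper, not part of AIChat.

```python
import shlex
import shutil


def missing_loader_tools(document_loaders):
    """Map each loader key to its executable when that executable is not on PATH."""
    missing = {}
    for name, command in document_loaders.items():
        parts = shlex.split(command)
        if parts and shutil.which(parts[0]) is None:
            missing[name] = parts[0]
    return missing


loaders = {
    "pdf": "pdftotext $1 -",
    "docx": "pandoc --to plain $1",
    "recursive_url": "rag-crawler $1 $2",
}
for name, tool in missing_loader_tools(loaders).items():
    print(f"loader '{name}' needs '{tool}', which was not found on PATH")
```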
AIChat RAG defaults to the `reciprocal_rank_fusion` algorithm for merging vector and keyword search results.
However, using a reranker to combine these results is a more established method that can yield greater relevance and accuracy.
You can add the following configuration to specify the default reranker.
rag_reranker_model: null # Specifies the rerank model to use
You can also dynamically adjust the reranker using the `.set` command.
.set rag_reranker_model <tab>
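For reference, the reciprocal rank fusion default mentioned above can be sketched in a few lines. This is the generic form of the algorithm, where each document scores the sum of `1 / (k + rank)` over the lists it appears in. The constant `k = 60` is the commonly used value and an assumption here, as are the document IDs; AIChat's own parameters may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector search results
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical keyword search results
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists rise to the top, which is why `doc_b` beats `doc_a` even though `doc_a` is first in the vector results.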