psi-king

A framework for building multi-modal-first document retrievers.

PSI King - King of the Senses from Psychonauts 2

Overview

Document / Node (TextNode, ImageNode, TableNode)


  • a Document contains a list of nodes (document.nodes)
  • each node can be one of the following types
    • TextNode
    • ImageNode
    • TableNode
  • schemas are defined here
    • detailed descriptions are available here
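As a rough illustration of the structure described above, here is a minimal sketch of a Document holding a mixed list of nodes. Only the names `Document`, `document.nodes`, `TextNode`, `ImageNode`, and `TableNode` come from this README; every field (`text`, `image_path`, `caption`, `rows`) and the helper `count_by_type` are hypothetical and may differ from the actual schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class TextNode:
    text: str  # hypothetical field name


@dataclass
class ImageNode:
    image_path: str  # hypothetical field name
    caption: str = ""  # hypothetical field name


@dataclass
class TableNode:
    rows: List[List[str]]  # hypothetical field name
    caption: str = ""  # hypothetical field name


# A node is exactly one of the three types listed in the README.
Node = Union[TextNode, ImageNode, TableNode]


@dataclass
class Document:
    # The README only states that a Document exposes `nodes`.
    nodes: List[Node] = field(default_factory=list)


def count_by_type(document: Document) -> Dict[str, int]:
    """Count how many nodes of each type a document holds (illustrative helper)."""
    counts: Dict[str, int] = {}
    for node in document.nodes:
        name = type(node).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts


doc = Document(
    nodes=[
        TextNode("Hello"),
        ImageNode("figure1.png"),
        TableNode([["a", "b"], ["1", "2"]]),
        TextNode("World"),
    ]
)
print(count_by_type(doc))
```

Keeping all node types in one ordered list preserves the reading order of the source file, which later merging or chunking steps can rely on.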

Pipeline Flow

Document Ingestion Flow example:

  • (Doc) Collection -> Extraction -> Transformation -> Index(?)
    • Extraction: read a file into a Document instance
    • Transformation: merge, chunk, and filter
    • Index: embed and insert into a searchable DB
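The flow above can be sketched as three plain functions chained together. This is a toy under stated assumptions, not the framework's API: the function names `extract`, `transform`, and `index`, the `max_chars` parameter, and the `text` field are all hypothetical, and an in-memory inverted index stands in for the embedding and database insertion step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TextNode:
    text: str  # hypothetical field name


@dataclass
class Document:
    nodes: List[TextNode] = field(default_factory=list)


def extract(raw: str) -> Document:
    """Extraction: turn raw file content into a Document, one node per line."""
    return Document(nodes=[TextNode(line) for line in raw.split("\n")])


def transform(document: Document, max_chars: int = 40) -> Document:
    """Transformation: drop blank nodes, then split long nodes into fixed-size chunks."""
    out: List[TextNode] = []
    for node in document.nodes:
        text = node.text.strip()
        if not text:
            continue  # filtering step
        # Naive character chunking; it may cut words in half.
        for start in range(0, len(text), max_chars):
            out.append(TextNode(text[start:start + max_chars]))
    return Document(nodes=out)


def index(document: Document) -> Dict[str, List[int]]:
    """Index: toy inverted index (token -> node positions).

    Stand-in for embedding the nodes and inserting them into a searchable DB.
    """
    inverted: Dict[str, List[int]] = {}
    for pos, node in enumerate(document.nodes):
        for token in set(node.text.lower().split()):
            inverted.setdefault(token, []).append(pos)
    return inverted


raw = "psi-king builds retrievers\n\n   \nnodes can be text, images, or tables"
doc = transform(extract(raw))
inverted = index(doc)
print(len(doc.nodes), inverted["nodes"])
```

Because each stage takes and returns plain values, stages can be swapped independently, for example replacing the toy `index` with a real vector store client.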


Example

allganize-RAG-Evaluation-Dataset-KO PDF dataset

Pipeline Overview: 3_3_overview

Experiments

Korean sparse search with vectordb

  • pgvector docs experiments
    • use mecab-ko + textsearch_ko to enable Korean tsvector computation
  • qdrant docs experiments
    • build qdrant with CJK language support for Korean tokenization

Acknowledgements

The history of this framework's development is recorded below.