Framework for building Multimodal Document Retrievers

id4thomas/psi-king

psi-king

A framework for building multimodal-first document retrievers.

PSI King - King of the Senses from Psychonauts 2

Overview

Document / Node (TextNode, ImageNode, TableNode)

(Figure: document)

  • A Document contains a list of nodes (document.nodes).
  • Each node is one of the following types:
    • TextNode
    • ImageNode
    • TableNode
  • Schemas are defined here.
    • Detailed descriptions are available here.
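As a rough illustration of the Document / Node relationship described above, here is a minimal sketch using plain dataclasses. The field names (`text`, `image_path`, `caption`, `rows`, `id`) are hypothetical and are not taken from the actual psi-king schemas; only `Document.nodes` and the three node type names come from this README.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextNode:
    text: str  # hypothetical field name


@dataclass
class ImageNode:
    image_path: str  # hypothetical field name
    caption: str = ""  # hypothetical field name


@dataclass
class TableNode:
    rows: List[List[str]]  # hypothetical: table cells as rows of strings


# A node is exactly one of the three types listed above.
Node = Union[TextNode, ImageNode, TableNode]


@dataclass
class Document:
    id: str  # hypothetical field name
    nodes: List[Node] = field(default_factory=list)


doc = Document(
    id="doc-1",
    nodes=[
        TextNode(text="Introduction"),
        ImageNode(image_path="figure1.png"),
        TableNode(rows=[["name", "value"], ["a", "1"]]),
    ],
)

# Select nodes by type, e.g. to route them to different embedders.
text_nodes = [n for n in doc.nodes if isinstance(n, TextNode)]
```

Keeping the node types separate makes it straightforward to handle each modality differently later in the pipeline.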

Pipeline Flow

Example document ingestion flow:

  • (Doc) Collection -> Extraction -> Transformation -> Index(?)
    • Extraction: read a file into a Document instance
    • Transformation: merging, chunking, and filtering
    • Index: embedding and inserting into a searchable DB
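The stages above can be sketched end to end as follows. This is a toy illustration under stated assumptions, not psi-king's API: the function names (`extract`, `transform`, `embed`, `index`), the paragraph-based extraction, the character-based chunking, and the in-memory dict standing in for a searchable DB are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TextNode:
    text: str


@dataclass
class Document:
    id: str
    nodes: List[TextNode] = field(default_factory=list)


def extract(doc_id: str, raw_text: str) -> Document:
    """Extraction: turn raw content into a Document, one node per paragraph."""
    paragraphs = raw_text.split("\n\n")
    return Document(id=doc_id, nodes=[TextNode(p.strip()) for p in paragraphs])


def transform(doc: Document, max_chars: int = 40) -> Document:
    """Transformation: filter out empty nodes, then chunk long ones."""
    chunks: List[TextNode] = []
    for node in doc.nodes:
        if not node.text:
            continue  # filtering step: drop empty nodes
        for start in range(0, len(node.text), max_chars):
            chunks.append(TextNode(node.text[start:start + max_chars]))
    return Document(id=doc.id, nodes=chunks)


def embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: [character count, word count]."""
    return [float(len(text)), float(len(text.split()))]


def index(doc: Document, store: Dict[str, List[float]]) -> None:
    """Index: embed each node and insert it into the (in-memory) store."""
    for i, node in enumerate(doc.nodes):
        store[f"{doc.id}:{i}"] = embed(node.text)


store: Dict[str, List[float]] = {}
raw = (
    "First paragraph.\n\n\n\n"
    "Second paragraph that is long enough to be split into two chunks."
)
index(transform(extract("doc-1", raw)), store)
```

Each stage takes and returns plain data, so stages can be swapped or reordered without touching the others.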

(Figure: nodes)

Example

allganize-RAG-Evaluation-Dataset-KO PDF dataset

Pipeline Overview: 3_3_overview

Experiments

Korean sparse search with vector DBs

  • pgvector docs experiments
    • Use mecab-ko + textsearch_ko to enable Korean tsvector calculation
  • qdrant docs experiments
    • Build qdrant with CJK language support for Korean tokenization
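For the pgvector case, a sparse (full-text) query might look roughly like the sketch below, which only builds the SQL string. It relies on the standard PostgreSQL functions `to_tsvector`, `plainto_tsquery`, and `ts_rank`. The table name `nodes`, the column `content`, and the text search configuration name `korean` are assumptions for illustration; check which configuration name your textsearch_ko installation actually registers. The `%(q)s` and `%(k)s` placeholders assume a psycopg-style driver.

```python
import re


def build_korean_search_sql(
    table: str = "nodes",       # hypothetical table name
    column: str = "content",    # hypothetical text column
    ts_config: str = "korean",  # assumed text search configuration name
) -> str:
    """Build a parameterized full-text search query ranked by ts_rank."""
    # Identifiers are interpolated into the SQL, so restrict them strictly.
    for name in (table, column, ts_config):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT id, ts_rank(to_tsvector('{ts_config}', {column}), query) AS score\n"
        f"FROM {table}, plainto_tsquery('{ts_config}', %(q)s) AS query\n"
        f"WHERE to_tsvector('{ts_config}', {column}) @@ query\n"
        f"ORDER BY score DESC\n"
        f"LIMIT %(k)s"
    )


sql = build_korean_search_sql()
```

In practice the tsvector would usually be stored in its own indexed column rather than recomputed per query; the inline `to_tsvector` call here just keeps the sketch short.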

Acknowledgements

History of this framework's development is recorded below
