Folder Structure:
├── base/
│   ├── component.py
│   ├── constants.py
│   └── schema.py
├── embedder/
│   ├── colpali/
│   ├── fastembed/
│   └── openai/
├── formatter/
│   ├── document
│   └── node
├── processor/
│   ├── document
│   └── node
├── reader/
│   └── pdf/
├── splitter/
│   ├── image/
│   └── text/
└── storage/
    ├── collectionstore/
    ├── docstore/
    └── vectorstore/

- Read a file and create a `Document` instance.
  - file path / file -> `Document`
- Process documents (e.g. merge nodes, preprocess).
  - `Document` -> `List[Document]`
- Split a document into chunks.
  - `Document` -> `List[Document]`
- Format chunks into contextual (metadata-enriched) chunks.
  - `Document` (chunk) -> `str` / `Image.Image`
- Embed the formatted contents.
  - `str` / `Image.Image` -> embedding
  - Embedders intentionally do not receive `Document`/Node as input; an embedder should work on its own, without a dependency on psiking schemas.
- Store documents / chunks (`Document`s).
- Store chunks and embeddings (`Document` & embeddings).
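
The steps above can be sketched end to end as follows. This is only an illustration of the data flow, not the psiking API: the `Document` dataclass, every function name, the fixed-size character splitter, the toy two-number embedding, and the in-memory stores are all assumptions made for the example. The reader step takes already-loaded text instead of a file path so the sketch stays self-contained, and only the `str` branch of the formatter is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# NOTE: all names below are hypothetical stand-ins, not the real psiking classes.


@dataclass
class Document:
    id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def read_text(doc_id: str, raw: str) -> Document:
    """reader: file content -> Document."""
    return Document(id=doc_id, text=raw, metadata={"source": doc_id})


def process(doc: Document) -> List[Document]:
    """processor: Document -> List[Document] (here: whitespace normalization only)."""
    return [Document(doc.id, " ".join(doc.text.split()), dict(doc.metadata))]


def split(doc: Document, chunk_size: int = 20) -> List[Document]:
    """splitter: Document -> List[Document], using fixed-size character chunks."""
    return [
        Document(f"{doc.id}-{i}", doc.text[start:start + chunk_size], dict(doc.metadata))
        for i, start in enumerate(range(0, len(doc.text), chunk_size))
    ]


def format_chunk(chunk: Document) -> str:
    """formatter: Document (chunk) -> str, with metadata prepended."""
    return f"[source: {chunk.metadata.get('source', '')}] {chunk.text}"


def embed(texts: List[str]) -> List[List[float]]:
    """embedder: plain strings in, vectors out; it never sees a Document.

    Toy embedding (an assumption): [character count, vowel count].
    """
    return [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))] for t in texts]


def run_pipeline(
    doc_id: str, raw: str
) -> Tuple[Dict[str, Document], List[Tuple[Document, List[float]]]]:
    docstore: Dict[str, Document] = {}
    vectorstore: List[Tuple[Document, List[float]]] = []

    document = read_text(doc_id, raw)
    docstore[document.id] = document  # store document
    for processed in process(document):
        chunks = split(processed)
        vectors = embed([format_chunk(c) for c in chunks])
        vectorstore.extend(zip(chunks, vectors))  # store chunk & embedding
    return docstore, vectorstore
```

Because `embed` only accepts plain strings, it could be swapped for any other embedding function without touching the `Document` type, which is the decoupling the embedder note describes.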