Indexing overview #1738
Hi @collimarco, good points.

To understand full-text search at the low level: Quickwit uses tantivy to build its indexes. The tantivy repository has all the information about the index files and how they work. This video explains a lot as well.

Since Quickwit is built on top of tantivy, a Quickwit index is made of what we call splits (see format). A split is just tantivy files plus a cache. In addition, Quickwit stores split metadata in the metastore: the start/end timestamps of each split, and its tags, so that we can do split pruning.

For the Quickwit architecture, you can have a look at this video: https://www.youtube.com/watch?v=3Y1RX6c0McU

For aggregations, if you want to learn the internals, you need to look at the tantivy code. We don't yet have in-depth documentation or a video about how they work. The Rust docs are good, though: https://docs.rs/tantivy/latest/x86_64-pc-windows-msvc/tantivy/aggregation/index.html
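To make the split pruning idea more concrete, here is a minimal Python sketch of how a metastore lookup could discard splits using only their metadata, before any tantivy file is opened. It is an illustration under assumptions, not Quickwit's actual code: the names `SplitMetadata`, `prune_splits`, `SPLITS` and the tag strings are all hypothetical, and the real metastore schema and tag matching rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitMetadata:
    """Hypothetical stand-in for the per-split metadata kept in the metastore."""
    split_id: str
    start_ts: int          # earliest timestamp contained in the split
    end_ts: int            # latest timestamp contained in the split
    tags: frozenset = frozenset()


def prune_splits(splits, query_start, query_end, required_tag=None):
    """Return only the splits that could contain matching documents.

    A split is kept when its [start_ts, end_ts] range overlaps the query
    range and, if a tag is required, when the split carries that tag.
    """
    kept = []
    for split in splits:
        overlaps = split.start_ts <= query_end and split.end_ts >= query_start
        if not overlaps:
            continue
        if required_tag is not None and required_tag not in split.tags:
            continue
        kept.append(split)
    return kept


# Illustrative metadata only; the values are made up for this example.
SPLITS = [
    SplitMetadata("split-1", 0, 100, frozenset({"tenant:a"})),
    SplitMetadata("split-2", 101, 200, frozenset({"tenant:a", "tenant:b"})),
    SplitMetadata("split-3", 201, 300, frozenset({"tenant:b"})),
]

if __name__ == "__main__":
    candidates = prune_splits(SPLITS, 150, 250, required_tag="tenant:a")
    print([s.split_id for s in candidates])
```

Only the splits that survive this step would then need to be searched with tantivy, which is why storing timestamps and tags in the metadata pays off.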
I am looking at "Concepts" in the Documentation.
There is a part that I cannot find: what are the files stored in S3? What is the file structure of indexes and reverse indexes on S3? What is the content / structure of each file?
It would be really useful in order to better understand this project.
Currently the documentation suggests that there is split pruning based on timestamp and tags, but then it doesn't explain anything about full text search, aggregations, etc. It would be interesting to see the steps involved or some diagrams.