Indexing overview #1738

collimarco · 2022-07-07T10:06:38Z

I am looking at "Concepts" in the Documentation.

There is a part that I cannot find: what are the files stored in S3? What is the file structure of indexes and reverse indexes on S3? What is the content / structure of each file?

It would be really useful in order to better understand this project.

Currently the documentation suggests that there is split pruning based on timestamp and tags, but then it doesn't explain anything about full text search, aggregations, etc. It would be interesting to see the steps involved or some diagrams.

fmassot (Maintainer) · 2022-07-12T00:25:35Z

Hi @collimarco, good points.

To understand full-text search at a low level: Quickwit uses tantivy to build its indexes. In the repository you will find all the information about the index files and how they work. This video explains a lot too.

Quickwit is built on top of tantivy. A Quickwit index is made of what we call splits (see format); a split is just tantivy files plus a cache. Quickwit also stores split metadata in the metastore: the start/end timestamps of each split, and its tags. That metadata is what lets us do split pruning, i.e. skip splits that cannot match a query without opening them.
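To make the pruning idea concrete, here is a minimal sketch in Python. It is not Quickwit's actual code: `SplitMetadata`, `prune_splits`, the field names, and the inclusive treatment of both time bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SplitMetadata:
    """Hypothetical per-split record, as it might be kept in a metastore."""
    split_id: str
    start_timestamp: int  # earliest document timestamp in the split
    end_timestamp: int    # latest document timestamp in the split (inclusive here)
    tags: Set[str] = field(default_factory=set)


def prune_splits(
    splits: List[SplitMetadata],
    query_start: Optional[int] = None,
    query_end: Optional[int] = None,
    required_tag: Optional[str] = None,
) -> List[SplitMetadata]:
    """Keep only the splits that could contain matches for the query."""
    kept = []
    for split in splits:
        # The split ends before the queried time range begins.
        if query_start is not None and split.end_timestamp < query_start:
            continue
        # The split starts after the queried time range ends.
        if query_end is not None and split.start_timestamp > query_end:
            continue
        # The split does not carry the tag the query requires.
        if required_tag is not None and required_tag not in split.tags:
            continue
        kept.append(split)
    return kept


splits = [
    SplitMetadata("split-a", 0, 99, {"tenant:1"}),
    SplitMetadata("split-b", 100, 199, {"tenant:1", "tenant:2"}),
    SplitMetadata("split-c", 200, 299, {"tenant:2"}),
]
candidates = prune_splits(
    splits, query_start=150, query_end=250, required_tag="tenant:2"
)
print([s.split_id for s in candidates])  # ['split-b', 'split-c']
```

Only the surviving splits would then need to be fetched from storage and searched with tantivy; the pruning step itself reads nothing but metadata.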

For Quickwit's architecture, you can have a look at this video: https://www.youtube.com/watch?v=3Y1RX6c0McU

For aggregations, if you want to learn the internals, you need to look at the tantivy code; we don't yet have in-depth documentation or a video about how they work. The Rust docs are nice though: https://docs.rs/tantivy/latest/x86_64-pc-windows-msvc/tantivy/aggregation/index.html
