chore: adjust image size #1412

Merged 1 commit on Dec 26, 2024
@@ -17,7 +17,7 @@ First, clustering data by column makes file scanning more efficient, especially

Second, data of the same column tends to be homogeneous, which helps with compression when applying techniques like dictionary and Run-Length Encoding (RLE).

- ![Parquet file format](/parquet-file-format.png)
+ <img src="/parquet-file-format.png" alt="Parquet file format" width="500"/>

## Data Persistence

@@ -29,7 +29,7 @@ When the size of data buffered in MemTables reaches that threshold, GreptimeDB w
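The flush threshold referenced in this hunk is set in GreptimeDB's TOML configuration; a minimal sketch, assuming the option lives under a `[storage.flush]` table as its dotted name suggests (the value shown is illustrative, not a recommended default):

```toml
[storage.flush]
# Flush MemTables to SST files once buffered write data reaches this size
# (illustrative value).
global_write_buffer_size = "1GB"
```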

The Apache Parquet file format provides built-in statistics in the headers of column chunks and data pages, which are used for pruning and skipping.

- ![Column chunk header](/column-chunk-header.png)
+ <img src="/column-chunk-header.png" alt="Column chunk header" width="350"/>

For example, in the above Parquet file, if you want to filter rows where `name` = `Emily`, you can easily skip row group 0 because the maximum value of the `name` field is `Charlie`. This statistical information reduces IO operations.

@@ -17,7 +17,7 @@ Parquet has a hierarchical structure, along the lines of "row group - column - data page". Parquet

Second, data of the same column tends to be homogeneous (for example, having similar values), which helps with compression when applying techniques like dictionary and Run-Length Encoding (RLE).

- ![Parquet file format](/parquet-file-format.png)
+ <img src="/parquet-file-format.png" alt="Parquet file format" width="500"/>

## Data Persistence

@@ -28,7 +28,7 @@ GreptimeDB provides the `storage.flush.global_write_buffer_size` configuration option to set

The Apache Parquet file format provides built-in statistics in the headers of column chunks and data pages, which are used for pruning and skipping.

- ![Column chunk header](/column-chunk-header.png)
+ <img src="/column-chunk-header.png" alt="Column chunk header" width="350"/>

For example, in the above Parquet file, if you want to filter rows where `name` equals `Emily`, you can easily skip row group 0 because the maximum value of the `name` field is `Charlie`. This statistical information reduces IO operations.

@@ -17,7 +17,7 @@ Parquet has a hierarchical structure, along the lines of "row group - column - data page". Parquet

Second, data of the same column tends to be homogeneous (for example, having similar values), which helps with compression when applying techniques like dictionary and Run-Length Encoding (RLE).

- ![Parquet file format](/parquet-file-format.png)
+ <img src="/parquet-file-format.png" alt="Parquet file format" width="500"/>

## Data Persistence

@@ -28,7 +28,7 @@ GreptimeDB provides the `storage.flush.global_write_buffer_size` configuration option to set

The Apache Parquet file format provides built-in statistics in the headers of column chunks and data pages, which are used for pruning and skipping.

- ![Column chunk header](/column-chunk-header.png)
+ <img src="/column-chunk-header.png" alt="Column chunk header" width="350"/>

For example, in the above Parquet file, if you want to filter rows where `name` equals `Emily`, you can easily skip row group 0 because the maximum value of the `name` field is `Charlie`. This statistical information reduces IO operations.

@@ -17,7 +17,7 @@ First, clustering data by column makes file scanning more efficient, especially

Second, data of the same column tends to be homogeneous, which helps with compression when applying techniques like dictionary and Run-Length Encoding (RLE).

- ![Parquet file format](/parquet-file-format.png)
+ <img src="/parquet-file-format.png" alt="Parquet file format" width="500"/>

## Data Persistence

@@ -29,7 +29,7 @@ When the size of data buffered in MemTables reaches that threshold, GreptimeDB w

The Apache Parquet file format provides built-in statistics in the headers of column chunks and data pages, which are used for pruning and skipping.

- ![Column chunk header](/column-chunk-header.png)
+ <img src="/column-chunk-header.png" alt="Column chunk header" width="350"/>

For example, in the above Parquet file, if you want to filter rows where `name` = `Emily`, you can easily skip row group 0 because the maximum value of the `name` field is `Charlie`. This statistical information reduces IO operations.
