diff --git a/docs/contributor-guide/datanode/data-persistence-indexing.md b/docs/contributor-guide/datanode/data-persistence-indexing.md
index e830dce90..3d28c553b 100644
--- a/docs/contributor-guide/datanode/data-persistence-indexing.md
+++ b/docs/contributor-guide/datanode/data-persistence-indexing.md
@@ -17,7 +17,7 @@ First, clustering data by column makes file scanning more efficient, especially
 
 Second, data of the same column tends to be homogeneous which helps with compression when apply techniques like dictionary and Run-Length Encoding (RLE).
 
-![Parquet file format](/parquet-file-format.png)
+Parquet file format
 
 ## Data Persistence
 
@@ -29,7 +29,7 @@ When the size of data buffered in MemTables reaches that threshold, GreptimeDB w
 
 Apache Parquet file format provides inherent statistics in headers of column chunks and data pages, which are used for pruning and skipping.
 
-![Column chunk header](/column-chunk-header.png)
+Column chunk header
 
 For example, in the above Parquet file, if you want to filter rows where `name` = `Emily`, you can easily skip row group 0 because the max value for `name` field is `Charlie`. This statistical information reduces IO operations.
 
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/contributor-guide/datanode/data-persistence-indexing.md b/i18n/zh/docusaurus-plugin-content-docs/current/contributor-guide/datanode/data-persistence-indexing.md
index 698b32add..7b4dcf2df 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/contributor-guide/datanode/data-persistence-indexing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/contributor-guide/datanode/data-persistence-indexing.md
@@ -17,7 +17,7 @@ Parquet 具有层次结构,类似于“行组-列-数据页”。Parquet 文
 
 其次,相同列的数据往往是同质的(比如具备近似的值),这有助于在采用字典和 Run-Length Encoding(RLE)等技术进行压缩。
 
-![Parquet file format](/parquet-file-format.png)
+Parquet file format
 
 ## 数据持久化
 
@@ -28,7 +28,7 @@ GreptimeDB 提供了 `storage.flush.global_write_buffer_size` 的配置项来设
 
 Apache Parquet 文件格式在列块和数据页的头部提供了内置的统计信息,用于剪枝和跳过。
 
-![Column chunk header](/column-chunk-header.png)
+Column chunk header
 
 例如,在上述 Parquet 文件中,如果你想要过滤 `name` 等于 `Emily` 的行,你可以轻松跳过行组 0,因为 `name` 字段的最大值是 `Charlie`。这些统计信息减少了 IO 操作。
 
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md b/i18n/zh/docusaurus-plugin-content-docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md
index 698b32add..7b4dcf2df 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md
@@ -17,7 +17,7 @@ Parquet 具有层次结构,类似于“行组-列-数据页”。Parquet 文
 
 其次,相同列的数据往往是同质的(比如具备近似的值),这有助于在采用字典和 Run-Length Encoding(RLE)等技术进行压缩。
 
-![Parquet file format](/parquet-file-format.png)
+Parquet file format
 
 ## 数据持久化
 
@@ -28,7 +28,7 @@ GreptimeDB 提供了 `storage.flush.global_write_buffer_size` 的配置项来设
 
 Apache Parquet 文件格式在列块和数据页的头部提供了内置的统计信息,用于剪枝和跳过。
 
-![Column chunk header](/column-chunk-header.png)
+Column chunk header
 
 例如,在上述 Parquet 文件中,如果你想要过滤 `name` 等于 `Emily` 的行,你可以轻松跳过行组 0,因为 `name` 字段的最大值是 `Charlie`。这些统计信息减少了 IO 操作。
 
diff --git a/versioned_docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md b/versioned_docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md
index e830dce90..3d28c553b 100644
--- a/versioned_docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md
+++ b/versioned_docs/version-0.11/contributor-guide/datanode/data-persistence-indexing.md
@@ -17,7 +17,7 @@ First, clustering data by column makes file scanning more efficient, especially
 
 Second, data of the same column tends to be homogeneous which helps with compression when apply techniques like dictionary and Run-Length Encoding (RLE).
 
-![Parquet file format](/parquet-file-format.png)
+Parquet file format
 
 ## Data Persistence
 
@@ -29,7 +29,7 @@ When the size of data buffered in MemTables reaches that threshold, GreptimeDB w
 
 Apache Parquet file format provides inherent statistics in headers of column chunks and data pages, which are used for pruning and skipping.
 
-![Column chunk header](/column-chunk-header.png)
+Column chunk header
 
 For example, in the above Parquet file, if you want to filter rows where `name` = `Emily`, you can easily skip row group 0 because the max value for `name` field is `Charlie`. This statistical information reduces IO operations.