chore: update introduction to structure and unstructure model
shuiyisong committed Aug 21, 2024
1 parent 9dbfbbd commit 34f20b2
Showing 1 changed file with 19 additions and 0 deletions: log-benchmark/README.md
@@ -9,6 +9,25 @@ Here are the versions of databases we used in the benchmark
| Clickhouse | 24.9.1.219 |
| Elasticsearch | 8.15.0 |

## Structured model vs Unstructured model
We divide the test into two parts: one uses the structured model and the other uses the unstructured model. The difference between the two is also visible in the `CREATE TABLE` clauses.

__Structured model__

The log data is pre-processed into columns by Vector. For example, an insert request looks like the following:
```SQL
-- Values omitted: one value per column listed below.
INSERT INTO test_table (bytes, http_version, ip, method, path, status, user, timestamp) VALUES (...)
```
The goal is to test each database's support for string/text columns. In real-world scenarios, this corresponds to data sources (or log producers) that already emit separate fields, or that have already processed the raw input.
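To make the pre-processing step concrete, here is a minimal Python sketch of how a raw line could be split into the structured-model columns. It assumes the raw logs follow the Apache/Nginx common log format. The regex, the `to_row` helper, and the sample line are hypothetical. This is not Vector's implementation or the benchmark's real data.

```python
import re
from datetime import datetime

# Hypothetical input format (common log format), e.g.:
# 127.0.0.1 - alice [21/Aug/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
# The benchmark's actual raw logs and Vector pipeline may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<http_version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)


def to_row(line: str) -> dict:
    """Split one raw log line into the structured-model columns, in table order."""
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError(f"unparseable log line: {line!r}")
    return {
        "bytes": int(m["bytes"]),
        "http_version": m["http_version"],
        "ip": m["ip"],
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        "user": m["user"],
        # Parse the bracketed timestamp, keeping its UTC offset.
        "timestamp": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
    }


if __name__ == "__main__":
    raw = '127.0.0.1 - alice [21/Aug/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512'
    print(to_row(raw))
```

Doing this parsing once at ingestion lets the database store typed columns such as `status` and `bytes` rather than one opaque string.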

__Unstructured model__

The log data is inserted as a single long string, and a full-text index is built on these strings. For example, an insert request looks like the following:
```SQL
-- Values omitted: the raw log line and its timestamp.
INSERT INTO test_table (message, timestamp) VALUES (...)
```
The goal is to test each database's fuzzy (full-text) search performance. In real-world scenarios, this corresponds to logs that are produced by some kind of middleware and inserted directly into the database.
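The full-text index is what makes searching these long strings practical. The following sketch illustrates the idea only. It is not how GreptimeDB, Clickhouse, or Elasticsearch actually implement their indexes. It builds a minimal inverted index in Python that maps each token to the set of messages containing it. Several parts are assumptions made for this sketch: the naive tokenizer, the AND query semantics, and the names `InvertedIndex` and `tokenize`. It also does exact token matching only, not typo-tolerant matching.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lower-case and split on non-word characters (a deliberately naive analyzer)."""
    return [t.lower() for t in TOKEN.findall(text)]


class InvertedIndex:
    """Toy full-text index: token -> ids of the messages that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.messages: list[str] = []

    def add(self, message: str) -> int:
        """Store a raw message and index each of its tokens; return its id."""
        doc_id = len(self.messages)
        self.messages.append(message)
        for token in tokenize(message):
            self.postings[token].add(doc_id)
        return doc_id

    def search(self, query: str) -> list[str]:
        """Return messages containing every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return []
        # .get() avoids inserting empty posting lists for unknown tokens.
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.messages[i] for i in sorted(ids)]


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add('127.0.0.1 - alice "GET /index.html HTTP/1.1" 200 512')
    idx.add('10.0.0.2 - bob "POST /login HTTP/1.1" 500 0')
    print(idx.search("post login"))
    # prints ['10.0.0.2 - bob "POST /login HTTP/1.1" 500 0']
```

A query only touches the posting lists of its tokens instead of scanning every stored string. That lookup is the main benefit any full-text index provides.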

## Creating tables
See [create_table.sql](./create_table.sql) for the GreptimeDB and Clickhouse `CREATE TABLE` statements.
The Elasticsearch mapping is created automatically.
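To compare the two table shapes side by side, here is a minimal sketch using Python's built-in `sqlite3`. The table names, column types, and sample row are hypothetical. The real GreptimeDB and Clickhouse definitions are in `create_table.sql`. SQLite here has no full-text index on `message`, so the `LIKE` query is a plain scan.

```python
import sqlite3

# Generic SQLite schemas that only mirror the *shape* of the two models.
# They are NOT the benchmark's DDL; types and the lack of indexes are illustrative.
STRUCTURED_DDL = """
CREATE TABLE structured_logs (
    bytes INTEGER, http_version TEXT, ip TEXT, method TEXT,
    path TEXT, status INTEGER, user TEXT, timestamp TEXT
)"""
UNSTRUCTURED_DDL = "CREATE TABLE unstructured_logs (message TEXT, timestamp TEXT)"


def load(conn: sqlite3.Connection) -> None:
    """Create both tables and insert the same (hypothetical) log event into each."""
    conn.execute(STRUCTURED_DDL)
    conn.execute(UNSTRUCTURED_DDL)
    ts = "2024-08-21T10:00:00Z"
    # Structured model: one value per pre-extracted field.
    conn.execute(
        "INSERT INTO structured_logs "
        "(bytes, http_version, ip, method, path, status, user, timestamp) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (512, "HTTP/1.1", "127.0.0.1", "GET", "/index.html", 200, "alice", ts),
    )
    # Unstructured model: the whole raw line goes into one string column.
    conn.execute(
        "INSERT INTO unstructured_logs (message, timestamp) VALUES (?, ?)",
        ('127.0.0.1 - alice "GET /index.html HTTP/1.1" 200 512', ts),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn)
    # Structured: filter directly on a typed column.
    print(conn.execute("SELECT path FROM structured_logs WHERE status = 200").fetchall())
    # Unstructured: substring match over the raw text.
    print(conn.execute(
        "SELECT message FROM unstructured_logs WHERE message LIKE '%index.html%'"
    ).fetchall())
```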