-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add log-benchmark stuff for blog #36
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
# Log benchmark comparison for GreptimeDB | ||
This repo holds the configuration we used to benchmark GreptimeDB, Clickhouse and Elastic Search. | ||
|
||
Here are the versions of databases we used in the benchmark | ||
|
||
| name | version | | ||
| :------------ | :--------- | | ||
| GreptimeDB | v0.9.2 | | ||
| Clickhouse | 24.9.1.219 | | ||
| Elasticsearch | 8.15.0 | | ||
|
||
## Structured model vs Unstructured model | ||
We divide test into two parts, using structured model and unstructured model accordingly. You can also see the difference in create table clause. | ||
|
||
__Structured model__ | ||
|
||
The log data is pre-processed into columns by vector. For example an insert request looks like following | ||
```SQL | ||
INSERT INTO test_table (bytes, http_version, ip, method, path, status, user, timestamp) VALUES () | ||
``` | ||
The goal is to test string/text support for each database. In real scenarios it means the datasource(or log data producers) have separate fields defined, or have already processed the raw input. | ||
|
||
__Unstructured model__ | ||
|
||
The log data is inserted as a long string, and then we build fulltext index upon these strings. For example an insert request looks like following | ||
```SQL | ||
INSERT INTO test_table (message, timestamp) VALUES () | ||
``` | ||
The goal is to test fuzzy search performance for each database. In real scenarios it means the log is produced by some kind of middleware and inserted directly into the database. | ||
|
||
## Creating tables | ||
See [here](./create_table.sql) for GreptimeDB and Clickhouse's create table clause. | ||
The mapping of Elastic search is created automatically. | ||
|
||
## Vector Configuration | ||
We use vector to generate random log data and send inserts to databases. | ||
Please refer to [structured config](./structured_vector.toml) and [unstructured config](./unstructured_vector.toml) for detailed configuration. | ||
shuiyisong marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## SQLs and payloads | ||
Please refer to [SQL query](./query.sql) for GreptimeDB and Clickhouse, and [query payload](./query.md) for Elastic search. | ||
|
||
## Steps to reproduce | ||
0. Decide whether to run structured model test or unstructured mode test. | ||
1. Build vector binary(see vector's config file for specific branch) and databases binaries accordingly. | ||
2. Create table in GreptimeDB and Clickhouse in advance. | ||
3. Run vector to insert data. | ||
4. When data insertion is finished, run queries against each database. Note: you'll need to update timerange value after data insertion. | ||
|
||
## Addition | ||
- You can tune GreptimeDB's configuration to get better performance. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TODO: add link for performance tuning blog |
||
- You can setup GreptimeDB to use S3 as storage, see [here](https://docs.greptime.com/user-guide/operations/configuration/#storage-options). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
-- GreptimeDB create table clause | ||
-- structured test, use vector to pre-process log data into fields | ||
CREATE TABLE IF NOT EXISTS `test_table` ( | ||
`bytes` Int64 NULL, | ||
`http_version` STRING NULL, | ||
`ip` STRING NULL, | ||
`method` STRING NULL, | ||
`path` STRING NULL, | ||
`status` SMALLINT UNSIGNED NULL, | ||
`user` STRING NULL, | ||
`timestamp` TIMESTAMP(3) NOT NULL, | ||
PRIMARY KEY (`user`, `path`, `status`), | ||
TIME INDEX (`timestamp`) | ||
) | ||
ENGINE=mito | ||
WITH( | ||
append_mode = 'true' | ||
); | ||
|
||
-- unstructured test, build fulltext index on message column | ||
CREATE TABLE IF NOT EXISTS `test_table` ( | ||
`message` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'), | ||
`timestamp` TIMESTAMP(3) NOT NULL, | ||
TIME INDEX (`timestamp`) | ||
) | ||
ENGINE=mito | ||
WITH( | ||
append_mode = 'true' | ||
); | ||
|
||
-- Clickhouse create table clause | ||
-- structured test | ||
CREATE TABLE IF NOT EXISTS test_table | ||
( | ||
bytes UInt64 NOT NULL, | ||
http_version String NOT NULL, | ||
ip String NOT NULL, | ||
method String NOT NULL, | ||
path String NOT NULL, | ||
status UInt8 NOT NULL, | ||
user String NOT NULL, | ||
timestamp String NOT NULL, | ||
) | ||
ENGINE = MergeTree() | ||
ORDER BY (user, path, status); | ||
|
||
-- unstructured test | ||
SET allow_experimental_full_text_index = true; | ||
CREATE TABLE IF NOT EXISTS test_table | ||
( | ||
message String, | ||
timestamp String, | ||
INDEX inv_idx(message) TYPE full_text(0) GRANULARITY 1 | ||
) | ||
ENGINE = MergeTree() | ||
ORDER BY tuple(); |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
# Query URL and payload for Elastic Search | ||
## Count | ||
URL: `http://127.0.0.1:9200/_count` | ||
|
||
## Query by timerange | ||
URL: `http://127.0.0.1:9200/_search` | ||
|
||
You can use the following payload to get the full timerange first. | ||
```JSON | ||
{"size":0,"aggs":{"max_timestamp":{"max":{"field":"timestamp"}},"min_timestamp":{"min":{"field":"timestamp"}}}} | ||
``` | ||
|
||
And then use this payload to query by timerange. | ||
```JSON | ||
{ | ||
"from": 0, | ||
"size": 1000, | ||
"query": { | ||
"range": { | ||
"timestamp": { | ||
"gte": "2024-08-16T04:30:44.000Z", | ||
"lte": "2024-08-16T04:51:52.000Z" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
## Query by condition | ||
URL: `http://127.0.0.1:9200/_search` | ||
### Structured payload | ||
```JSON | ||
{ | ||
"from": 0, | ||
"size": 10000, | ||
"query": { | ||
"bool": { | ||
"must": [ | ||
{ | ||
"term": { | ||
"user.keyword": "CrucifiX" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"method.keyword": "OPTION" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"path.keyword": "/user/booperbot124" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"http_version.keyword": "HTTP/1.1" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"status": "401" | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
``` | ||
### Unstructured payload | ||
```JSON | ||
{ | ||
"from": 0, | ||
"size": 10000, | ||
"query": { | ||
"bool": { | ||
"must": [ | ||
{ | ||
"match_phrase": { | ||
"message": "CrucifiX" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "OPTION" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "/user/booperbot124" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "HTTP/1.1" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "401" | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
``` | ||
|
||
## Query by condition and timerange | ||
URL: `http://127.0.0.1:9200/_search` | ||
### Structured payload | ||
```JSON | ||
{ | ||
"size": 10000, | ||
"query": { | ||
"bool": { | ||
"must": [ | ||
{ | ||
"term": { | ||
"user.keyword": "CrucifiX" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"method.keyword": "OPTION" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"path.keyword": "/user/booperbot124" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"http_version.keyword": "HTTP/1.1" | ||
} | ||
}, | ||
{ | ||
"term": { | ||
"status": "401" | ||
} | ||
}, | ||
{ | ||
"range": { | ||
"timestamp": { | ||
"gte": "2024-08-19T07:03:37.383Z", | ||
"lte": "2024-08-19T07:24:58.883Z" | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
``` | ||
### Unstructured payload | ||
```JSON | ||
{ | ||
"size": 10000, | ||
"query": { | ||
"bool": { | ||
"must": [ | ||
{ | ||
"match_phrase": { | ||
"message": "CrucifiX" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "OPTION" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "/user/booperbot124" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "HTTP/1.1" | ||
} | ||
}, | ||
{ | ||
"match_phrase": { | ||
"message": "401" | ||
} | ||
}, | ||
{ | ||
"range": { | ||
"timestamp": { | ||
"gte": "2024-08-19T05:16:17.099Z", | ||
"lte": "2024-08-19T05:46:02.722Z" | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
-- Structured query for GreptimeDB and Clickhouse | ||
|
||
-- query count | ||
select count(*) from test_table; | ||
|
||
-- query by timerange. Note: place the timestamp range in the where clause | ||
-- GreptimeDB | ||
-- you can use `select max(timestamp)::bigint from test_table;` and `select min(timestamp)::bigint from test_table;` | ||
-- to get the full timestamp range | ||
select * from test_table where timestamp between 1723710843619 and 1723711367588; | ||
-- Clickhouse | ||
-- you can use `select max(timestamp) from test_table;` and `select min(timestamp) from test_table;` | ||
-- to get the full timestamp range | ||
select * from test_table where timestamp between '2024-08-16T03:58:46Z' and '2024-08-16T04:03:50Z'; | ||
|
||
-- query by condition | ||
SELECT * FROM test_table WHERE user = 'CrucifiX' and method = 'OPTION' and path = '/user/booperbot124' and http_version = 'HTTP/1.1' and status = 401; | ||
|
||
-- query by condition and timerange | ||
-- GreptimeDB | ||
SELECT * FROM test_table WHERE user = "CrucifiX" and method = "OPTION" and path = "/user/booperbot124" and http_version = "HTTP/1.1" and status = 401 | ||
and timestamp between 1723774396760 and 1723774788760; | ||
-- Clickhouse | ||
SELECT * FROM test_table WHERE user = 'CrucifiX' and method = 'OPTION' and path = '/user/booperbot124' and http_version = 'HTTP/1.1' and status = 401 | ||
and timestamp between '2024-08-16T03:58:46Z' and '2024-08-16T04:03:50Z'; | ||
|
||
-- Unstructured query for GreptimeDB and Clickhouse | ||
|
||
|
||
-- query by condition | ||
-- GreptimeDB | ||
SELECT * FROM test_table WHERE MATCHES(message, "+CrucifiX +OPTION +/user/booperbot124 +HTTP/1.1 +401"); | ||
-- Clickhouse | ||
SELECT * FROM test_table WHERE (message LIKE '%CrucifiX%') | ||
AND (message LIKE '%OPTION%') | ||
AND (message LIKE '%/user/booperbot124%') | ||
AND (message LIKE '%HTTP/1.1%') | ||
AND (message LIKE '%401%'); | ||
|
||
-- query by condition and timerange | ||
-- GreptimeDB | ||
SELECT * FROM test_table WHERE MATCHES(message, "+CrucifiX +OPTION +/user/booperbot124 +HTTP/1.1 +401") | ||
and timestamp between 1723710843619 and 1723711367588; | ||
-- Clickhouse | ||
SELECT * FROM test_table WHERE (message LIKE '%CrucifiX%') | ||
AND (message LIKE '%OPTION%') | ||
AND (message LIKE '%/user/booperbot124%') | ||
AND (message LIKE '%HTTP/1.1%') | ||
AND (message LIKE '%401%') | ||
AND timestamp between '2024-08-15T10:25:26.524000000Z' AND '2024-08-15T10:31:31.746000000Z'; |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: add link for en and ch blogs