Merge pull request #16 from GreptimeTeam/feat/tsbs-manual
docs: add greptimedb_vs_influxdb_manual.md
# GreptimeDB vs. InfluxDB TSBS Testing Manual

This manual helps developers reproduce the test results from the "GreptimeDB vs. InfluxDB Performance Test Report". For detailed results, please read the report.

## Testing Tools

The tests use the GreptimeTeam fork of TSBS, which adds support for GreptimeDB and InfluxDB v2 on top of the official version:
[https://github.com/GreptimeTeam/tsbs](https://github.com/GreptimeTeam/tsbs)

## Test Environment

**Hardware Environment:**
- **Instance Type:** c5d.2xlarge
- **Processor Specifications:** 8 cores
- **Memory:** 16 GB
- **Disk:** 100 GB (gp3)
- **Operating System:** Ubuntu Server 24.04 LTS

**Software Version:**
- **Database:**
  - **GreptimeDB:** v0.9.1
  - **InfluxDB:** v2.7.7

All parameters are left at their defaults, with one exception: GreptimeDB is configured with a local cache for the S3 tests.

## Software Installation

### Installing Go

Download link:
[https://go.dev/dl/go1.22.5.linux-amd64.tar.gz](https://go.dev/dl/go1.22.5.linux-amd64.tar.gz)

Commands:
```sh
wget https://go.dev/dl/go1.22.5.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.22.5.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go version
```

### Installing InfluxDB

Create an `influxdb2` directory:
```sh
mkdir influxdb2
cd influxdb2
```

Download and extract InfluxDB:
```sh
wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.7.7_linux_amd64.tar.gz
tar xvfz influxdb2-2.7.7_linux_amd64.tar.gz
```

Additionally, download and extract the `influx` CLI, which is used later to initialize InfluxDB:
```sh
wget https://download.influxdata.com/influxdb/releases/influxdb2-client-2.7.5-linux-amd64.tar.gz
tar xvfz influxdb2-client-2.7.5-linux-amd64.tar.gz
```

Start the server:
```sh
./influxdb2-2.7.7/usr/bin/influxd
```

Output:
```
2024-07-17T09:13:21.202512Z info Welcome to InfluxDB {"log_id": "0qR~KSCW000", "version": "v2.7.7", "commit": "e9e0f744fa", "build_date": "2024-07-11T18:45:02Z", "log_level": "info"}
...(trimmed for brevity)...
```

### Installing GreptimeDB

Refer to the official GreptimeDB documentation for installation details. The steps used for this test are as follows.

Create a `greptime` directory in your home directory:
```sh
mkdir greptime
cd greptime
```

Download and extract GreptimeDB:
```sh
wget https://github.com/GreptimeTeam/greptimedb/releases/download/v0.9.1/greptime-linux-amd64-v0.9.1.tar.gz
tar xvfz greptime-linux-amd64-v0.9.1.tar.gz
```

Run GreptimeDB with the following command, specifying the data directory via a command-line parameter:
```sh
./greptime-linux-amd64-v0.9.1/greptime standalone start --data-home /path/to/greptime/data
```

Alternatively, you can set additional parameters in a configuration file. Assuming the configuration file is named `config.toml`, start GreptimeDB with:
```sh
./greptime-linux-amd64-v0.9.1/greptime standalone start --config-file /path/to/config.toml
```

## Test Execution

### Test Data Generation

**TSBS** (Time Series Benchmark Suite) does not provide precompiled binaries, so it must be compiled manually. Compiling **TSBS** requires **Go** to be installed beforehand; see https://go.dev/doc/install for details. This test uses Go **1.22.5**.

After installing **Go**, clone TSBS into the current directory:
```bash
git clone https://github.com/GreptimeTeam/tsbs.git
```

If **make** is not available, install it first:
```bash
sudo apt install make
```

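Before building, it can help to confirm that the tools the build relies on are actually on `PATH`. The following is a minimal sketch, not part of TSBS: `missing_tools` is a hypothetical helper name, and the list `git make go` is an assumption based on the steps in this manual.

```sh
# Print the name of every tool from the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not something TSBS provides.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Tools used by the clone and build steps in this manual (assumed list).
missing="$(missing_tools git make go)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Install these before running make:" $missing
else
  echo "git, make and go are all available"
fi
```

If `go` is reported as missing right after installation, the `export PATH=...` line from the Go section has most likely not been applied in the current shell.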
Navigate to the **tsbs** directory and run **make** to compile TSBS. The first build may take a while because dependencies are downloaded:
```bash
cd tsbs
make
```

Once compilation succeeds, many binaries are generated under the **bin** directory, only some of which are used in this manual:
```bash
$ ls ./bin/
tsbs_generate_data tsbs_load_influx2 tsbs_run_queries_clickhouse
tsbs_generate_queries tsbs_load_mongo tsbs_run_queries_cratedb
tsbs_load tsbs_load_prometheus tsbs_run_queries_influx
tsbs_load_akumuli tsbs_load_questdb tsbs_run_queries_mongo
tsbs_load_cassandra tsbs_load_siridb tsbs_run_queries_questdb
tsbs_load_clickhouse tsbs_load_timescaledb tsbs_run_queries_siridb
tsbs_load_cratedb tsbs_load_victoriametrics tsbs_run_queries_timescaledb
tsbs_load_greptime tsbs_run_queries_akumuli tsbs_run_queries_timestream
tsbs_load_influx tsbs_run_queries_cassandra tsbs_run_queries_victoriametrics
```

Create a directory inside the **tsbs** directory to store the generated data:
```bash
mkdir bench-data
```

Run the data generation command. The output file `influx-data.lp` holds the generated test data and is loaded into both databases in the single-machine tests below:
```bash
./bin/tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:00Z" --log-interval="10s" --format="influx" > ./bench-data/influx-data.lp
```

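As a rough sanity check on the size of the generated data set, the row count can be estimated from the flags above as hosts × (time range ÷ log interval). The sketch below rests on two assumptions about TSBS behavior that are not stated in this manual: one row is emitted per host per interval over the three-day range, and each `cpu-only` row carries 10 metric fields.

```sh
# Values taken from the tsbs_generate_data flags above.
scale=4000            # --scale: number of simulated hosts
days=3                # 2023-06-11 to 2023-06-14
interval_seconds=10   # --log-interval

# Assumption: each cpu-only row carries 10 metric fields.
metrics_per_row=10

rows_per_host=$(( days * 86400 / interval_seconds ))
total_rows=$(( scale * rows_per_host ))
total_metrics=$(( total_rows * metrics_per_row ))

echo "rows per host: $rows_per_host"
echo "total rows:    $total_rows"
echo "total metrics: $total_metrics"
```

Under these assumptions the estimate is 25,920 rows per host and 103,680,000 rows in total. If the generated file holds one row per line, `wc -l ./bench-data/influx-data.lp` should land close to that total.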
Execute the following commands to generate queries for **InfluxDB**:
```bash
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type cpu-max-all-1 --format="influx" > ./bench-data/influx-queries-cpu-max-all-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type cpu-max-all-8 --format="influx" > ./bench-data/influx-queries-cpu-max-all-8.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type double-groupby-1 --format="influx" > ./bench-data/influx-queries-double-groupby-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type double-groupby-5 --format="influx" > ./bench-data/influx-queries-double-groupby-5.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type double-groupby-all --format="influx" > ./bench-data/influx-queries-double-groupby-all.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type groupby-orderby-limit --format="influx" > ./bench-data/influx-queries-groupby-orderby-limit.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type high-cpu-1 --format="influx" > ./bench-data/influx-queries-high-cpu-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type high-cpu-all --format="influx" > ./bench-data/influx-queries-high-cpu-all.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=10 --query-type lastpoint --format="influx" > ./bench-data/influx-queries-lastpoint.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-1-1-1 --format="influx" > ./bench-data/influx-queries-single-groupby-1-1-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-1-1-12 --format="influx" > ./bench-data/influx-queries-single-groupby-1-1-12.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-1-8-1 --format="influx" > ./bench-data/influx-queries-single-groupby-1-8-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-5-1-1 --format="influx" > ./bench-data/influx-queries-single-groupby-5-1-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-5-1-12 --format="influx" > ./bench-data/influx-queries-single-groupby-5-1-12.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-5-8-1 --format="influx" > ./bench-data/influx-queries-single-groupby-5-8-1.dat
```

Execute the following commands to generate queries for **GreptimeDB**:
```bash
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type cpu-max-all-1 --format="greptime" > ./bench-data/greptime-queries-cpu-max-all-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type cpu-max-all-8 --format="greptime" > ./bench-data/greptime-queries-cpu-max-all-8.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type double-groupby-1 --format="greptime" > ./bench-data/greptime-queries-double-groupby-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type double-groupby-5 --format="greptime" > ./bench-data/greptime-queries-double-groupby-5.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type double-groupby-all --format="greptime" > ./bench-data/greptime-queries-double-groupby-all.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type groupby-orderby-limit --format="greptime" > ./bench-data/greptime-queries-groupby-orderby-limit.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type high-cpu-1 --format="greptime" > ./bench-data/greptime-queries-high-cpu-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=50 --query-type high-cpu-all --format="greptime" > ./bench-data/greptime-queries-high-cpu-all.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=10 --query-type lastpoint --format="greptime" > ./bench-data/greptime-queries-lastpoint.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-1-1-1 --format="greptime" > ./bench-data/greptime-queries-single-groupby-1-1-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-1-1-12 --format="greptime" > ./bench-data/greptime-queries-single-groupby-1-1-12.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-1-8-1 --format="greptime" > ./bench-data/greptime-queries-single-groupby-1-8-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-5-1-1 --format="greptime" > ./bench-data/greptime-queries-single-groupby-5-1-1.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-5-1-12 --format="greptime" > ./bench-data/greptime-queries-single-groupby-5-1-12.dat
./bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=100 --query-type single-groupby-5-8-1 --format="greptime" > ./bench-data/greptime-queries-single-groupby-5-8-1.dat
```

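The two command lists above differ only in the query type, the query count, and the output format, so they can also be produced by a loop. The sketch below is one way to do that and is not part of TSBS; `QUERY_SPECS`, `query_cmd`, and `print_all_query_cmds` are hypothetical names. It only prints the commands, so nothing is generated until the output is piped to a shell.

```sh
# Query types paired with the query counts used in this manual ("type:count").
QUERY_SPECS="cpu-max-all-1:100 cpu-max-all-8:100 double-groupby-1:50 double-groupby-5:50 double-groupby-all:50 groupby-orderby-limit:50 high-cpu-1:100 high-cpu-all:50 lastpoint:10 single-groupby-1-1-1:100 single-groupby-1-1-12:100 single-groupby-1-8-1:100 single-groupby-5-1-1:100 single-groupby-5-1-12:100 single-groupby-5-8-1:100"

# Print one tsbs_generate_queries command line.
# $1 = format ("influx" or "greptime"), $2 = "type:count".
query_cmd() {
  printf './bin/tsbs_generate_queries --use-case="devops" --seed=123 --scale=4000 --timestamp-start="2023-06-11T00:00:00Z" --timestamp-end="2023-06-14T00:00:01Z" --queries=%s --query-type %s --format="%s" > ./bench-data/%s-queries-%s.dat\n' \
    "${2##*:}" "${2%%:*}" "$1" "$1" "${2%%:*}"
}

# Print the commands for both databases; nothing is executed here.
print_all_query_cmds() {
  for format in influx greptime; do
    for spec in $QUERY_SPECS; do
      query_cmd "$format" "$spec"
    done
  done
}
```

To actually generate the files, run `print_all_query_cmds | sh` from the `tsbs` directory after creating `bench-data`. Printing first makes it easy to compare the output against the explicit commands before running anything.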
### InfluxDB

#### Initialize
After the first installation, InfluxDB needs to be initialized to obtain a token for requests. If you already have a token, you can skip this part.

Initialize InfluxDB:
```sh
./influx setup \
--username test \
--password 12345678 \
--token test-token \
--org test-org \
--bucket test-bucket \
--force
```

Output:
```
User Organization Bucket
test test-org test-bucket
```

Create token:
```sh
./influx auth create \
--org test-org \
--all-access
```

Output:
```
ID Description Token User Name User ID Permissions
0d5c027a74259000 VTw2gBvdrgqQLpR2drSfiVgaVy-W4LLnJ1JiyLIAHgYhKYJdj9eW3Z26pnjelCiC7Q-dBGHvpZpGykjE_WqIgQ== test 0d5c0275c5e59000 ...
```

Export the token for subsequent requests:
```sh
export INFLUX2_TOKEN="VTw2gBvdrgqQLpR2drSfiVgaVy-W4LLnJ1JiyLIAHgYhKYJdj9eW3Z26pnjelCiC7Q-dBGHvpZpGykjE_WqIgQ=="
```

#### Data Import
From the `tsbs` directory, import the data with the following command:
```sh
./bin/tsbs_load_influx2 \
--urls=http://localhost:8086 \
--file=./bench-data/influx-data.lp \
--do-create-db=false \
--org-id=test-org \
--db-name=test-bucket \
--batch-size=3000 \
--workers=8 \
--auth-token=$INFLUX2_TOKEN
```

#### Queries
From the `tsbs` directory, run the queries. The command below runs one query file; repeat it for each generated `influx-queries-*.dat` file:
```sh
./bin/tsbs_run_queries_influx --file=./bench-data/influx-queries-cpu-max-all-1.dat --db-name=test-bucket --is-v2=true --auth-token=$INFLUX2_TOKEN --urls="http://localhost:8086"
```

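Repeating the InfluxDB query command by hand for every query file is error-prone, so a small loop can help. The following is a sketch rather than anything provided by TSBS: `list_query_files` and `run_influx_queries` are hypothetical helper names, and the flags are copied from the single command in this section. It assumes the query files are in `./bench-data` and that `INFLUX2_TOKEN` was exported during initialization.

```sh
# Print the files in directory $1 that match "<prefix>-queries-*.dat" ($2 = prefix).
list_query_files() {
  for f in "$1"/"$2"-queries-*.dat; do
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Run every InfluxDB query file with the same flags as the single command.
run_influx_queries() {
  # Stop early if the token exported during initialization is missing.
  if [ -z "${INFLUX2_TOKEN:-}" ]; then
    echo "INFLUX2_TOKEN is not set" >&2
    return 1
  fi
  list_query_files ./bench-data influx | while IFS= read -r f; do
    echo "== $f"
    ./bin/tsbs_run_queries_influx --file="$f" --db-name=test-bucket --is-v2=true \
      --auth-token="$INFLUX2_TOKEN" --urls="http://localhost:8086"
  done
}
```

Call `run_influx_queries` from the `tsbs` directory. The files run in the shell's glob order (alphabetical), which may differ from the order in which they were generated.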
### GreptimeDB

GreptimeDB was tested with both local disk and S3-based object storage. The configurations for both are as follows:

#### Local Disk (EBS)
Configuration file used for testing:
```toml
[http]
addr = "0.0.0.0:4000"

[logging]
dir = "/home/ubuntu/greptime/logs"

[storage]
data_home = "/home/ubuntu/greptime/data-local"
```

Startup command, assuming the GreptimeDB binary path is `./greptime` and the config file path is `/home/ubuntu/greptime/config-local.toml`:
```sh
./greptime standalone start --config-file /home/ubuntu/greptime/config-local.toml
```

#### S3 Object Storage
Using S3 requires additional storage settings. The configuration used for testing, with the local disk cache enabled:
```toml
[http]
addr = "0.0.0.0:4000"

[logging]
dir = "/home/ubuntu/greptime/logs"

[storage]
data_home = "/home/ubuntu/greptime/data-s3"
type = "S3"
bucket = "your-bucket"
root = "tsbs-data-s3"
access_key_id = "****"
secret_access_key = "****"
endpoint = "https://s3-endpoint/"
region = "your-region"
cache_path = "/home/ubuntu/greptime/s3cache"
cache_capacity = "20G"

[[region_engine]]
[region_engine.mito]
enable_experimental_write_cache = true
experimental_write_cache_size = "20G"
```

Startup command, assuming the GreptimeDB binary path is `./greptime` and the config file path is `/home/ubuntu/greptime/config-s3.toml`:
```sh
./greptime standalone start --config-file /home/ubuntu/greptime/config-s3.toml
```

#### Data Import
From the `tsbs` directory, import the data with the following command. GreptimeDB loads the same `influx-data.lp` file that was generated earlier:
```sh
./bin/tsbs_load_greptime \
--urls=http://localhost:4000 \
--file=./bench-data/influx-data.lp \
--batch-size=3000 \
--gzip=false \
--workers=6
```

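#### Queries
From the `tsbs` directory, run the queries. The GreptimeDB queries are executed with the same `tsbs_run_queries_influx` binary, pointed at GreptimeDB's HTTP address. The command below runs one query file; repeat it for each generated `greptime-queries-*.dat` file:
```sh
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-cpu-max-all-1.dat --db-name=benchmark --urls="http://localhost:4000"
```
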
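When comparing results across databases, it is convenient to keep the output of each query type in its own file. The sketch below runs every GreptimeDB query type from this manual and writes one log per type. It is an illustration only: `QUERY_TYPES`, `query_file`, `log_file`, `run_greptime_queries`, and the `./bench-results` directory are hypothetical names, while the flags are copied from the single command in this section.

```sh
# Query types generated earlier in this manual.
QUERY_TYPES="cpu-max-all-1 cpu-max-all-8 double-groupby-1 double-groupby-5 double-groupby-all groupby-orderby-limit high-cpu-1 high-cpu-all lastpoint single-groupby-1-1-1 single-groupby-1-1-12 single-groupby-1-8-1 single-groupby-5-1-1 single-groupby-5-1-12 single-groupby-5-8-1"

# Map a query type to its query file and to a log file.
# (./bench-results is a hypothetical directory chosen for this sketch.)
query_file() { printf './bench-data/greptime-queries-%s.dat\n' "$1"; }
log_file()   { printf './bench-results/greptime-%s.log\n' "$1"; }

# Run every query type against GreptimeDB, keeping one log per type.
# A failing run does not stop the loop; its error output ends up in the log.
run_greptime_queries() {
  mkdir -p ./bench-results
  for qt in $QUERY_TYPES; do
    echo "== $qt"
    ./bin/tsbs_run_queries_influx --file="$(query_file "$qt")" \
      --db-name=benchmark --urls="http://localhost:4000" \
      > "$(log_file "$qt")" 2>&1
  done
}
```

Call `run_greptime_queries` from the `tsbs` directory once GreptimeDB is running and the data has been imported.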
## References

- [InfluxData Downloads](https://www.influxdata.com/downloads/)
- [InfluxDB Installation](https://docs.influxdata.com/influxdb/v2/install/#start-influxdb)
- [InfluxDB CLI Tools](https://docs.influxdata.com/influxdb/v2/tools/influx-cli/?t=Linux)
- [InfluxDB Setup](https://docs.influxdata.com/influxdb/v2/get-started/setup/?t=Set+up+with+the+CLI)
- [Greptime Team TSBS](https://github.com/GreptimeTeam/tsbs)
- TSBS v0.9 vs InfluxDB