docs: update the newest and full faq after v0.7 released
beryl678 committed Jun 12, 2024
1 parent d6971bf commit 15896d9
1 changed file with 110 additions and 11 deletions: docs/v0.6/en/faq-and-others/faq.md
@@ -38,9 +38,9 @@ plugin](https://grafana.com/docs/grafana/latest/datasources/mysql/) to config Gr

We are also implementing native support for PromQL, which is frequently used with Grafana.

## How does this compare to Loki? Is there a crate with Rust bindings available, preferably as a tracing or logging subscriber?

GreptimeDB has primarily focused on metrics, but will soon offer log storage and full-text search capabilities for logs. These features are expected to be available in version 0.9, which is anticipated to be released in early July.

## When will GreptimeDB release its first GA version?

@@ -81,29 +81,27 @@ Please refer to [features that you concern](/user-guide/concepts/features-that-y

In theory, GreptimeDB could replace VictoriaMetrics now, since most of its protocols are supported, but we haven't actually tested this yet.

## Should I use the `DROP DATABASE` command to delete a database?

Yes. The `DROP DATABASE` command was implemented in version 0.8. See the official documentation for usage: [Drop Database](https://docs.greptime.com/reference/sql/drop#drop).

## Is there any retention policy?

We have implemented table-level Time-To-Live (TTL) in [this PR](https://github.com/GreptimeTeam/greptimedb/pull/1052). You can refer to the TTL option of the `CREATE TABLE` statement [here](/user-guide/concepts/features-that-you-concern#can-i-set-ttl-or-retention-policy-for-different-tables-or-measurements).

## What are the main differences between GreptimeDB and another time-series database built on DataFusion like InfluxDB?

GreptimeDB shares some technical similarities with InfluxDB: both use DataFusion, Arrow, and Parquet, and both are built on object storage. However, we differ in several key aspects:

- Open-Source Strategy: Unlike InfluxDB, which only open-sources its standalone version, our entire distributed cluster version is open-source. Our architecture can even run on edge Android systems.

- Distributed Architecture: Our architecture is more aligned with HBase's Region/RegionServer design. Our Write-Ahead Log (WAL) uses Kafka, and we're exploring a quorum-based implementation in the future.

- Workload and Services: We focus on a hybrid workload combining time series and analytics. This integration aims to enhance resource efficiency and real-time performance for users. We also offer [GreptimeCloud](https://greptime.com/product/cloud), a commercial cloud service.

- Storage Engine Design: Our storage engine is pluggable. For scenarios with many small tables, as in Prometheus, we have a dedicated Metrics storage engine.

- Query Language Support: We support PromQL for observability and SQL for data analysis, and incorporate Python for complex data processing. InfluxDB, on the other hand, uses InfluxQL and SQL.

We're a young, rapidly evolving project and always looking to improve. For more details, visit our [Blog](https://greptime.com/blogs/) and [Contributor Guide](https://docs.greptime.com/contributor-guide/overview). We welcome your interest and contributions!

## As a first-timer looking to contribute to GreptimeDB, where can I find a comprehensive guide to get started?

@@ -112,3 +110,104 @@ Welcome! Please refer to our [contribution guide](https://github.com/GreptimeTea
## Does GreptimeDB have a way to handle absolute counters that can reset, like InfluxDB's non-negative differential? How do aggregations work with these counters, and is PromQL preferred over SQL for them? Also, is there a plan to integrate PromQL functions into SQL, similar to InfluxDB v3?

Like Prometheus, GreptimeDB handles counters effectively. Functions such as `rate()` automatically detect and adjust for counter resets, and `resets()` reports how many resets occurred in a range. Using `delta()` or `deriv()` directly on a counter is not recommended, since both are meant for gauges. Instead, apply `rate()` to your counter first and then use `deriv()` on the result. PromQL is indeed more suitable for operations involving counters, given its origin in Prometheus. However, we are exploring the integration of PromQL functions into SQL for greater flexibility. If you're interested in implementing functions in GreptimeDB, see our guide: [How to write aggregate functions](https://github.com/GreptimeTeam/greptimedb/blob/main/docs/how-to/how-to-write-aggregate-function.md).
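To make the reset handling concrete, here is a minimal Python sketch of the idea behind Prometheus-style `rate()`: if a sample is lower than the one before it, the sketch assumes the counter restarted from zero. The function names are hypothetical. The sketch also leaves out the window-boundary extrapolation that real PromQL engines apply, so it is not GreptimeDB's implementation.

```python
from typing import List


def counter_increase(values: List[float]) -> float:
    """Sum the growth of a counter over a window, treating any drop as a reset.

    Simplified: no extrapolation to the edges of the range window.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            # The counter restarted from zero, so its current value is
            # everything it has counted since the restart.
            total += cur
        else:
            total += cur - prev
    return total


def naive_delta(values: List[float]) -> float:
    """Last minus first sample: what you get if resets are ignored."""
    return values[-1] - values[0]


def simple_rate(values: List[float], step_seconds: float) -> float:
    """Per-second rate for samples taken every `step_seconds`."""
    if len(values) < 2:
        return 0.0
    return counter_increase(values) / (step_seconds * (len(values) - 1))


samples = [10.0, 15.0, 20.0, 3.0, 8.0]  # the counter reset between 20 and 3
print(counter_increase(samples))  # 18.0
print(naive_delta(samples))       # -2.0
print(simple_rate(samples, 15))   # 0.3
```

The naive difference comes out negative. Reset detection exists to prevent exactly this error.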

## Can GreptimeDB be used for a large-scale internal metrics collection system similar to Facebook's Gorilla or Google's Monarch, with a preference for in-memory data and high availability? Are there plans for asynchronous WAL or optional disk storage, and how is data replication handled without WAL?

GreptimeDB supports asynchronous WAL and is developing a per-table WAL toggle for more control. A tiered storage approach, starting with in-memory caching, is also in development. For data replication, data flushed to remote stores like S3 is replicated independently of WAL. The details for tiered storage are tracked in issue [db#2516](https://github.com/GreptimeTeam/greptimedb/issues/2516). A remote WAL implementation based on Apache Kafka ensures the durability of unflushed data in cluster mode.

## What are the feature differences between the open-source version and the cloud version of GreptimeDB?

Thank you for asking. Here are some key points:

- **Foundational Features**: The foundational features, including the ingestion protocol, SQL capabilities, and storage functions, are largely identical between the two versions. However, GreptimeCloud offers advanced SQL functions and additional features.

- **Fully Managed Service**: GreptimeCloud is a fully managed service that supports multi-tenancy, data encryption, and security audits for compliance, which are not available in the open-source version.

- **Enhanced Dashboard**: Another significant advantage of GreptimeCloud is its superior dashboard, which is more user-friendly and includes a unique Prometheus workbench. This workbench facilitates online editing of Prometheus dashboards and alert rules, as well as GitOps integration.

- **Specialized Solutions**: GreptimeCloud introduces specialized solutions like GreptimeAI, which leverages DBaaS technology. We are also expanding our offerings to include more innovative solutions, such as those for IoT.

As mentioned, the cloud version offers more ready-to-use features to help you get started quickly. The core features are almost identical, especially on our dedicated plan.

## Where can I find documentation related to on-premises deployment and performance benchmark reports?

You can find the public TSBS benchmark results [here](https://github.com/GreptimeTeam/greptimedb/blob/main/docs/benchmarks/tsbs/v0.7.0.md) and the deployment documentation [here](https://docs.greptime.com/getting-started/installation/overview).

## What should I do if the region becomes `DOWNGRADED` and the tables on that node become read-only after the datanode restarts? Is there a way to automatically reactivate it?

Based on your configuration, metasrv failover, which may mark a region as `DOWNGRADED`, is disabled. The other procedure that may mark a region as `DOWNGRADED` is region migration. Please try running the region migration procedure and provide feedback for further assistance.

## Is there a guide or suggestions for compiling GreptimeDB into a standalone binary with only the necessary modules and features for an embedded environment?

We have prebuilt binaries for Android ARM64 platforms, which have been successfully used in some enterprise projects. However, these binaries are not available for bare metal devices, as some fundamental components of GreptimeDB require a standard library.

## Why is there a performance drop in query response times after upgrading from GreptimeDB 0.7.2 to 0.8.0 on an Odroid N2+ server?

After upgrading from GreptimeDB 0.7.2 to 0.8.0 on an Odroid N2+ server, there might be a noticeable performance drop when querying data with SQL from Grafana. Before the update, the query response time was around 70ms, but it increased to approximately 460ms post-upgrade. This performance issue generally improves over time as the system stabilizes.

Currently, GreptimeDB only builds indexes for persistent data. Therefore, query performance might improve after flushing buffered input data. The in-memory page cache for persistent files also needs to be warmed up by queries after restarting the instance.
- Persistence Mechanism: Data is flushed periodically or when the buffered data size reaches a threshold.

- Cache Warm-up: Query performance improves as the in-memory page cache warms up.

These mechanisms help stabilize and improve query performance after an upgrade.
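The flush rule described above, flushing periodically or as soon as the buffer grows past a threshold, can be sketched as a toy Python model. `WriteBuffer`, `max_bytes`, and `flush_interval_s` are hypothetical names, not GreptimeDB configuration keys. The point is that newly written data stays in the buffer, and therefore unindexed, until one of the two conditions triggers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WriteBuffer:
    """Toy in-memory buffer with a size-or-time flush policy (hypothetical)."""

    max_bytes: int
    flush_interval_s: float
    buffered_bytes: int = 0
    last_flush_s: float = 0.0
    flushed_batches: List[int] = field(default_factory=list)

    def should_flush(self, now_s: float) -> bool:
        # Flush when the buffer is large enough OR enough time has passed.
        return (
            self.buffered_bytes >= self.max_bytes
            or now_s - self.last_flush_s >= self.flush_interval_s
        )

    def write(self, nbytes: int, now_s: float) -> bool:
        """Buffer a write; return True if it triggered a flush."""
        self.buffered_bytes += nbytes
        if self.should_flush(now_s):
            self.flush(now_s)
            return True
        return False

    def flush(self, now_s: float) -> None:
        # This toy only records the batch size. Per the answer above, flushing
        # is when GreptimeDB's buffered data becomes persistent and indexable.
        self.flushed_batches.append(self.buffered_bytes)
        self.buffered_bytes = 0
        self.last_flush_s = now_s


buf = WriteBuffer(max_bytes=100, flush_interval_s=60)
print(buf.write(40, now_s=1))   # False: 40 bytes, only 1 s elapsed
print(buf.write(70, now_s=2))   # True: size threshold reached (110 >= 100)
print(buf.write(10, now_s=30))  # False
print(buf.write(5, now_s=63))   # True: 61 s since the last flush
print(buf.flushed_batches)      # [110, 15]
```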

## Is there a built-in SQL command like 'compaction table t1' that can be used for manual compaction?

Please follow the instructions provided in pull request [db#3363](https://github.com/GreptimeTeam/greptimedb/pull/3363).

## Why isn't the data older than 7 days being cleaned up when I set `global_ttl = "7d"` in the configuration file? What is the TTL mechanism?

The `global_ttl` option was removed in versions after 0.4, and the documentation was recently updated to reflect this change. If you need TTL functionality, you can specify a table-level TTL when creating the table (see [Table Options](https://docs.greptime.com/reference/sql/create#table-options)). We plan to support database-level TTL in future updates.
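The Python sketch below illustrates what a table-level TTL means by checking whether a row is older than its table's TTL. Per the Table Options page linked above, the option is set when the table is created, with something like `WITH (ttl = '7d')`. The parser here accepts only a single number followed by one of `s`, `m`, `h`, or `d`. That restriction is an assumption for this sketch and does not cover every format GreptimeDB accepts. The sketch also does not model when the storage engine physically removes expired data.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed subset of duration units for this sketch only.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(ttl: str) -> timedelta:
    """Parse durations like '7d' or '12h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported TTL format: {ttl!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_expired(row_ts: datetime, ttl: str, now: datetime) -> bool:
    """A row is expired once it is older than now - ttl."""
    return row_ts < now - parse_ttl(ttl)


now = datetime(2024, 6, 12, tzinfo=timezone.utc)
print(is_expired(datetime(2024, 6, 4, tzinfo=timezone.utc), "7d", now))  # True
print(is_expired(datetime(2024, 6, 6, tzinfo=timezone.utc), "7d", now))  # False
```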

## Can GreptimeDB be used to store logs?

- The current columnar storage structure can be used to store logs. For example, logs can be stored in a string-typed column that is not part of the primary key. Logs can be written and queried using the supported protocols, and the data can be kept in object storage (OSS/S3) with distributed scalability.

- If logs can be parsed into structured dimensions, they can also be stored as tags (primary key). These tags can then be used for dimensional queries.

- However, a few key features are still missing: full-text indexing (for now, `LIKE` queries can be used as a substitute), dedicated syntax or SQL functions for log queries, and support for some log-specific ingestion protocols. These features are under active development and are expected in version 0.9, anticipated for release in early July. Even then, GreptimeDB may not be a drop-in replacement for Elasticsearch (ES), since a query syntax for logs still needs further exploration. Currently, SQL is the primary query language.
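The second point above, storing parsed dimensions as tags, could look like the Python sketch below. It splits one log line into candidate tag (primary key) columns, a time index, and a free-text string field. The log format, the column names `level`, `service`, and `message`, and the `to_row` helper are all hypothetical. How the resulting row gets written depends on which ingestion protocol you use.

```python
import re
from typing import Optional

# Hypothetical format: "<timestamp> <LEVEL> <service> <free text>"
_LOG_LINE = re.compile(
    r"(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)"
)


def to_row(line: str) -> Optional[dict]:
    """Split a log line into tag columns, a time index, and a string field."""
    m = _LOG_LINE.fullmatch(line.strip())
    if m is None:
        return None  # the line does not follow the expected format
    return {
        "ts": m["ts"],  # time index
        "tags": {"level": m["level"], "service": m["service"]},  # dimensions for filtering
        "fields": {"message": m["message"]},  # non-primary-key string column
    }


print(to_row("2024-06-12T10:00:00Z ERROR api-gw upstream timeout after 3 retries"))
```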

## How is the query performance for non-primary key fields? Can inverted indexes be set? Will the storage cost be lower compared to Elasticsearch?

Currently, non-primary key fields (or non-tag fields) do not have default inverted indexes, and we have not yet provided a `CREATE INDEX` syntax. Inverted index support will be released in an upcoming iteration along with full-text indexing. Without indexes, queries rely on MPP brute-force scanning. Although there is some parallel processing, the efficiency may not be optimal.
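A toy Python model shows the gap between brute-force scanning and an inverted index. A scan compares every row. An index maps each distinct value to the ids of the rows that contain it, so a lookup touches only the matches. This is a generic illustration with hypothetical helper names. Production inverted indexes, including whatever GreptimeDB ships, use far more compact on-disk structures.

```python
from collections import defaultdict
from typing import Dict, List


def scan(rows: List[dict], column: str, value: str) -> List[int]:
    """Brute force: compare every row, like a query on an unindexed column."""
    return [i for i, row in enumerate(rows) if row[column] == value]


def build_inverted_index(rows: List[dict], column: str) -> Dict[str, List[int]]:
    """Map each distinct value of `column` to the ids of rows holding it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)  # ids are appended in ascending order
    return dict(index)


def lookup(index: Dict[str, List[int]], value: str) -> List[int]:
    """Indexed: a single dictionary probe returns the matching row ids."""
    return index.get(value, [])


rows = [{"status": "ok"}, {"status": "error"}, {"status": "ok"}, {"status": "timeout"}]
idx = build_inverted_index(rows, "status")
print(scan(rows, "status", "ok"))  # [0, 2]
print(lookup(idx, "ok"))           # [0, 2]
```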

As for storage costs, they will certainly be lower. You can use containers and object storage directly without relying on disks, using small local disks for buffering/caching to speed up performance. GreptimeDB employs a tiered storage architecture. For more details, please refer to our documentation on architecture and storage location.

## Is the Log-Structured Merge-Tree engine similar to Kafka's engine model?

From a technical storage perspective, they are similar. However, the actual data formats differ: GreptimeDB reads and writes Parquet format, while Kafka uses its own RecordBatch format. To analyze time-series data temporarily stored in Kafka, you need to write it into GreptimeDB first.

You can replace Kafka with EMQX, which is also a message queue. Here is a reference example: [EMQX Data Integration with GreptimeDB](https://www.emqx.com/en). The process of writing data from Kafka to GreptimeDB is quite similar.

In short, the data must be written into GreptimeDB before it can be analyzed there: consume the Kafka messages and write them into GreptimeDB using one of the supported protocols. If analyzing data directly in Kafka is necessary, you might consider the KSQL project: [KSQL GitHub Repository](https://github.com/confluentinc/ksql). However, our past attempts with KSQL ran into several issues.

We are also working on releasing a Kafka consumer component that will automate the consumption and writing process.

## Are there limitations on the number of tables or columns in GreptimeDB? Does having many columns affect read and write performance?

Generally, there are no strict limitations. With a few hundred tables, as long as there aren't many primary key columns, the impact on write performance is minimal (measured by the number of points written per second, not rows).

Similarly, for reads, if queries only involve a subset of columns, the memory and computational load will not be significantly high.

## Can tables be dynamically partitioned by day based on timestamps, or is this unnecessary because the timestamp field already has an index?

GreptimeDB's data is distributed in timestamp order, so there is no need to additionally shard/partition by timestamp. It is recommended to shard by primary key instead.
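To illustrate sharding by primary key rather than by time, here is a minimal Python sketch of range-based routing. Each row goes to a partition chosen only by its key, and every partition still receives rows from the whole time range. The `route` and `partition_rows` helpers and the boundary values are hypothetical. GreptimeDB declares its own partition rules when a table is created, so consult its `CREATE TABLE` reference for the actual syntax.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, int]  # (host, unix_ts)


def route(key: str, boundaries: Sequence[str]) -> int:
    """Pick a partition from sorted, upper-exclusive key boundaries.

    With boundaries ["h", "p"]: key < "h" -> 0, "h" <= key < "p" -> 1, key >= "p" -> 2.
    """
    return bisect_right(boundaries, key)


def partition_rows(rows: List[Row], boundaries: Sequence[str]) -> Dict[int, List[Row]]:
    """Group rows by partition; the timestamp plays no part in routing."""
    parts: Dict[int, List[Row]] = defaultdict(list)
    for host, ts in rows:
        parts[route(host, boundaries)].append((host, ts))
    return dict(parts)


rows = [("app-1", 100), ("host-7", 100), ("web-3", 100), ("app-1", 200)]
print(partition_rows(rows, ["h", "p"]))
```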

## How many servers are generally needed to set up a reliable GreptimeDB cluster, and how should Frontend, Datanode, and Metasrv be deployed? Should each node run all three services regardless of the number of nodes?

A minimum of 3 nodes is required, and each node can run all three services: metasrv, frontend, and datanode. However, the exact number of nodes depends on the scale of data being handled.

It is not necessary to deploy all three services on each node. A small cluster can be set up with 3 nodes dedicated to metasrv, while frontend and datanode are deployed together on the same nodes, with one container running both processes.

## Several questions about GreptimeDB: Does GreptimeDB v0.7 support inverted indexes, and does it use Tantivy? In v0.8, does the Flow Engine (pre-computation) feature support PromQL syntax for calculations? Will Metasrv support storage backends like MySQL or PostgreSQL?

Yes, we have tested Tantivy, and it meets our current requirements very well.

This is a good suggestion. Currently, the Flow Engine does not support PromQL syntax for calculations. We will evaluate this, as it seems theoretically feasible.

We have developed an abstraction layer for Metasrv, but it does not yet support RDBMS backends. Support for MySQL and PostgreSQL is planned.

For further suggestions, please open an issue on our GitHub repository.

## What is the best way to downsample interface traffic rates, sampled every 30 seconds from multiple NICs across thousands of computers, to the maximum rate within each hour, so that the data can be kept for many years?

A flow task is the appropriate tool for this, and a simple one should suffice. The output of a flow task is stored in a normal table, so it can be kept indefinitely.
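Conceptually, the flow task computes an hourly maximum for each host and NIC. The batch Python sketch below applies that aggregation to raw 30-second samples. The `(host, nic, unix_seconds, rate)` tuple layout and the function name are assumptions. A real flow task would express the same grouping in SQL and keep its sink table updated as new data arrives.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, str, int, float]  # (host, nic, unix_seconds, rate)


def hourly_max(samples: Iterable[Sample]) -> Dict[Tuple[str, str, int], float]:
    """Keep only the maximum rate per (host, nic, hour)."""
    out: Dict[Tuple[str, str, int], float] = {}
    for host, nic, ts, rate in samples:
        hour_start = ts - ts % 3600  # align to the start of the hour
        key = (host, nic, hour_start)
        if key not in out or rate > out[key]:
            out[key] = rate
    return out


samples = [
    ("h1", "eth0", 0, 10.0),
    ("h1", "eth0", 30, 25.0),
    ("h1", "eth0", 3570, 5.0),
    ("h1", "eth0", 3600, 7.0),
    ("h1", "eth1", 60, 3.0),
]
print(hourly_max(samples))
```

At 30-second sampling, each series shrinks from 120 points per hour to one.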
