Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add StatementStatistics for slow query logging implementation #4719

Merged
merged 8 commits into from
Sep 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 20 additions & 0 deletions config/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -163,6 +163,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
zyy17 marked this conversation as resolved.
Show resolved Hide resolved
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -249,6 +253,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -314,6 +322,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -466,6 +478,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -512,5 +528,9 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
13 changes: 13 additions & 0 deletions config/datanode.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -586,6 +586,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
13 changes: 13 additions & 0 deletions config/flownode.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The tracing options. Only effect when compiled with `tokio-console` feature.
[tracing]
## The tokio console address.
Expand Down
13 changes: 13 additions & 0 deletions config/frontend.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
13 changes: 13 additions & 0 deletions config/metasrv.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
13 changes: 13 additions & 0 deletions config/standalone.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -630,6 +630,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
2 changes: 2 additions & 0 deletions src/cmd/src/frontend.rs
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ use frontend::instance::builder::FrontendBuilder;
use frontend::instance::{FrontendInstance, Instance as FeInstance};
use frontend::server::Services;
use meta_client::{MetaClientOptions, MetaClientType};
use query::stats::StatementStatistics;
use servers::tls::{TlsMode, TlsOption};
use servers::Mode;
use snafu::{OptionExt, ResultExt};
Expand Down Expand Up @@ -352,6 +353,7 @@ impl StartCommand {
catalog_manager,
Arc::new(client),
meta_client,
StatementStatistics::new(opts.logging.slow_query.clone()),
)
.with_plugin(plugins.clone())
.with_local_cache_invalidator(layered_cache_registry)
Expand Down
2 changes: 2 additions & 0 deletions src/cmd/src/standalone.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ use frontend::service_config::{
};
use meta_srv::metasrv::{FLOW_ID_SEQ, TABLE_ID_SEQ};
use mito2::config::MitoConfig;
use query::stats::StatementStatistics;
use serde::{Deserialize, Serialize};
use servers::export_metrics::ExportMetricsOption;
use servers::grpc::GrpcOptions;
Expand Down Expand Up @@ -557,6 +558,7 @@ impl StartCommand {
catalog_manager.clone(),
node_manager.clone(),
ddl_task_executor.clone(),
StatementStatistics::new(opts.logging.slow_query.clone()),
)
.with_plugin(plugins.clone())
.try_build()
Expand Down
1 change: 1 addition & 0 deletions src/common/telemetry/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ backtrace = "0.3"
common-error.workspace = true
console-subscriber = { version = "0.1", optional = true }
greptime-proto.workspace = true
humantime-serde.workspace = true
lazy_static.workspace = true
once_cell.workspace = true
opentelemetry = { version = "0.21.0", default-features = false, features = [
Expand Down
62 changes: 60 additions & 2 deletions src/common/telemetry/src/logging.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
//! logging stuffs, inspired by databend
use std::env;
use std::sync::{Arc, Mutex, Once};
use std::time::Duration;

use once_cell::sync::{Lazy, OnceCell};
use opentelemetry::{global, KeyValue};
Expand All @@ -26,7 +27,7 @@ use serde::{Deserialize, Serialize};
use tracing_appender::non_blocking::WorkerGuard;
use tracing_appender::rolling::{RollingFileAppender, Rotation};
use tracing_log::LogTracer;
use tracing_subscriber::filter::Targets;
use tracing_subscriber::filter::{FilterFn, Targets};
use tracing_subscriber::fmt::Layer;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::prelude::*;
Expand Down Expand Up @@ -64,6 +65,24 @@ pub struct LoggingOptions {

/// The tracing sample ratio.
pub tracing_sample_ratio: Option<TracingSampleOptions>,

/// The logging options of slow query.
pub slow_query: SlowQueryOptions,
}

/// The options of slow query.
#[derive(Clone, Debug, Serialize, Deserialize, Default)]
#[serde(default)]
pub struct SlowQueryOptions {
/// Whether to enable slow query log.
pub enable: bool,

/// The threshold of slow queries.
#[serde(with = "humantime_serde")]
pub threshold: Option<Duration>,

/// The sample ratio of slow queries.
pub sample_ratio: Option<f64>,
}

#[derive(Clone, Debug, Copy, PartialEq, Eq, Serialize, Deserialize)]
Expand Down Expand Up @@ -96,6 +115,7 @@ impl Default for LoggingOptions {
otlp_endpoint: None,
tracing_sample_ratio: None,
append_stdout: true,
slow_query: SlowQueryOptions::default(),
}
}
}
Expand Down Expand Up @@ -235,6 +255,42 @@ pub fn init_global_logging(
None
};

let slow_query_logging_layer = if !opts.dir.is_empty() && opts.slow_query.enable {
let rolling_appender =
RollingFileAppender::new(Rotation::HOURLY, &opts.dir, "greptimedb-slow-queries");
let (writer, guard) = tracing_appender::non_blocking(rolling_appender);
guards.push(guard);

// Only logs if the field contains "slow".
let slow_query_filter = FilterFn::new(|metadata| {
metadata
.fields()
.iter()
.any(|field| field.name().contains("slow"))
});

if opts.log_format == LogFormat::Json {
Some(
Layer::new()
.json()
.with_writer(writer)
.with_ansi(false)
.with_filter(slow_query_filter)
.boxed(),
)
} else {
Some(
Layer::new()
.with_writer(writer)
.with_ansi(false)
.with_filter(slow_query_filter)
.boxed(),
)
}
} else {
None
};

// resolve log level settings from:
// - options from command line or config files
// - environment variable: RUST_LOG
Expand Down Expand Up @@ -279,6 +335,7 @@ pub fn init_global_logging(
.with(stdout_logging_layer)
.with(file_logging_layer)
.with(err_file_logging_layer)
.with(slow_query_logging_layer)
};

// consume the `tracing_opts` to avoid "unused" warnings.
Expand All @@ -289,7 +346,8 @@ pub fn init_global_logging(
.with(dyn_filter)
.with(stdout_logging_layer)
.with(file_logging_layer)
.with(err_file_logging_layer);
.with(err_file_logging_layer)
.with(slow_query_logging_layer);

if opts.enable_otlp_tracing {
global::set_text_map_propagator(TraceContextPropagator::new());
Expand Down
11 changes: 11 additions & 0 deletions src/common/telemetry/src/macros.rs
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,17 @@ macro_rules! trace {
};
}

#[macro_export]
macro_rules! slow {
(target: $target:expr, $($arg:tt)+) => {
$crate::log!(target: $target, slow = true, $crate::tracing::Level::INFO, $($arg)+)
};

($($arg:tt)+) => {
$crate::log!($crate::tracing::Level::INFO, slow = true, $($arg)+)
};
}

#[cfg(test)]
mod tests {
use common_error::mock::MockError;
Expand Down
2 changes: 2 additions & 0 deletions src/flow/src/server.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ use operator::delete::Deleter;
use operator::insert::Inserter;
use operator::statement::StatementExecutor;
use partition::manager::PartitionRuleManager;
use query::stats::StatementStatistics;
use query::{QueryEngine, QueryEngineFactory};
use servers::error::{AlreadyStartedSnafu, StartGrpcSnafu, TcpBindSnafu, TcpIncomingSnafu};
use servers::server::Server;
Expand Down Expand Up @@ -475,6 +476,7 @@ impl FrontendInvoker {
layered_cache_registry.clone(),
inserter.clone(),
table_route_cache,
StatementStatistics::default(),
));

let invoker = FrontendInvoker::new(inserter, deleter, statement_executor);
Expand Down
5 changes: 5 additions & 0 deletions src/frontend/src/instance/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ use operator::statement::{StatementExecutor, StatementExecutorRef};
use operator::table::TableMutationOperator;
use partition::manager::PartitionRuleManager;
use pipeline::pipeline_operator::PipelineOperator;
use query::stats::StatementStatistics;
use query::QueryEngineFactory;
use servers::server::ServerHandlers;
use snafu::OptionExt;
Expand All @@ -55,6 +56,7 @@ pub struct FrontendBuilder {
plugins: Option<Plugins>,
procedure_executor: ProcedureExecutorRef,
heartbeat_task: Option<HeartbeatTask>,
stats: StatementStatistics,
}

impl FrontendBuilder {
Expand All @@ -65,6 +67,7 @@ impl FrontendBuilder {
catalog_manager: CatalogManagerRef,
node_manager: NodeManagerRef,
procedure_executor: ProcedureExecutorRef,
stats: StatementStatistics,
) -> Self {
Self {
options,
Expand All @@ -76,6 +79,7 @@ impl FrontendBuilder {
plugins: None,
procedure_executor,
heartbeat_task: None,
stats,
}
}

Expand Down Expand Up @@ -181,6 +185,7 @@ impl FrontendBuilder {
local_cache_invalidator,
inserter.clone(),
table_route_cache,
self.stats,
));

let pipeline_operator = Arc::new(PipelineOperator::new(
Expand Down
Loading