The super large grace period of 1 day has proved to be harmful on Cicada.

This PR lowers it to 2h.
As a reminder, once a node is detected as dead,
it enters a zombie state for 1h.

During that hour, we still share its KVs.

From timeofdeath+1h to timeofdeath+2h, we no longer share the node.

After 2h, we delete the node from the state.
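
The timeline above can be sketched as a small classification function. This is only an illustration, not chitchat's actual implementation: `DeadNodePhase` and `dead_node_phase` are hypothetical names, and the assumption that the zombie phase lasts half of the grace period is inferred from the 1h/2h figures in this message.

```rust
use std::time::Duration;

/// Hypothetical phases of a dead node, named for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum DeadNodePhase {
    /// Zombie state: the node's KVs are still shared with peers.
    Zombie,
    /// Still present in the local state, but no longer shared.
    NotShared,
    /// Grace period elapsed: the node is removed from the state.
    Deleted,
}

/// Classifies a dead node by the time elapsed since it was detected as dead.
/// Assumption: the zombie phase covers the first half of the grace period
/// (1h out of 2h). How the exact boundaries are treated is also an assumption.
fn dead_node_phase(since_death: Duration, grace_period: Duration) -> DeadNodePhase {
    if since_death >= grace_period {
        DeadNodePhase::Deleted
    } else if since_death >= grace_period / 2 {
        DeadNodePhase::NotShared
    } else {
        DeadNodePhase::Zombie
    }
}

fn main() {
    // 2 hours, the value this commit sets.
    let grace_period = Duration::from_secs(2 * 60 * 60);
    for minutes in [30u64, 90, 150] {
        let phase = dead_node_phase(Duration::from_secs(minutes * 60), grace_period);
        println!("{minutes} min after death: {phase:?}");
    }
}
```

With the previous 1 day grace period, the same split would presumably keep a dead node shared for 12h and in the state for 24h, which is the behavior this commit shortens.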
fulmicoton committed May 16, 2024
1 parent a0c1760 commit 2143a76
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion quickwit/quickwit-cluster/src/lib.rs
@@ -28,6 +28,7 @@ mod metrics;
 mod node;

 use std::net::SocketAddr;
+use std::time::Duration;

 use async_trait::async_trait;
 pub use chitchat::transport::ChannelTransport;
@@ -147,13 +148,17 @@ pub async fn start_cluster_service(node_config: &NodeConfig) -> anyhow::Result<C
         indexing_tasks,
         indexing_cpu_capacity,
     };
+    let failure_detector_config = FailureDetectorConfig {
+        dead_node_grace_period: Duration::from_secs(2 * 60 * 60), // 2 hours
+        ..Default::default()
+    };
     let cluster = Cluster::join(
         cluster_id,
         self_node,
         gossip_listen_addr,
         peer_seed_addrs,
         node_config.gossip_interval,
-        FailureDetectorConfig::default(),
+        failure_detector_config,
         &CountingUdpTransport,
     )
     .await?;