So I've been extensively investigating the following horizontally scalable databases:
Citus
Vitess
CockroachDB
Yugabyte
Citus wins in a lot of ways: performance, ease of rebalancing, licensing, and the overall architectural simplicity of how tables are sharded and how you define sharding keys. It's elegant! (In Vitess land, maintaining the VSchema and specifying shard boundaries yourself is a pain in the butt. Bravo to Citus.)
The biggest downside I've hit regarding Citus is the poor HA story in 2024. Please correct me if I'm missing something (please share your thoughts, solutions!).
Replicas seem to be "stand by" only and not usable for reads? This means adding 2x hardware that sits idle, just to achieve HA.
If you want synchronous writes, you're gonna have 3x the hardware sitting idle because the first replica needs its own replica to not hang.
The ideal place to have HA logic is in Citus itself. Patroni + etcd adds moving parts when Citus could just talk to itself. It creates coordination issues like the one above, and blows up Ops complexity.
Patroni also seems to require Ops intervention to "reset" it (or a risky "fail back")
As a newbie, citus.shard_replication_factor = 2 looked like a low-friction path forward for HA (no manual Ops intervention is wonderful!), but it breaks in my testing for this purpose and apparently can't be used for HA at all, even if you're willing to give up FKs and sacrifice a bit of consistency. HA should ideally be this easy, even if the usable feature set is a bit limited, so we all don't have to wait years for a better HA solution.
Vitess HA is built in with VTOrc: just put more replicas online whenever, and they will be promoted as needed. Queries to both primary and replicas are re-routed by VTGate.
CockroachDB and Yugabyte just get HA for free with the replication model.
Citus really needs a simple, baked-in answer to HA. In the other databases listed, the Citus default would count as a dangerous, under-replicated state.
Side note, example of the community getting confused by this:
These look great at first, but I'm not sure Percona is aware that these tutorials only work once. shard_replication_factor = 2 breaks the 2nd time you fail / recover a node (ex: citus_disable_node(), then citus_activate_node(), then rebalance_table_shards()). You can witness the replicas disappear using: SELECT * FROM citus_shards;
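For anyone who wants to try reproducing this, here is a rough sketch of the sequence described above, run from the coordinator. The table name ha_test and the worker address 'worker-1' / 5432 are placeholders I made up, and the per-shard count query is just one way to watch the placements; this is not taken from the Percona tutorials, and exact behavior may differ between Citus versions.

```sql
-- Sketch only: ha_test, 'worker-1' and 5432 are hypothetical placeholders.
-- Run on the coordinator of a cluster with at least two worker nodes.

-- Must be set before the table is distributed so each shard gets 2 placements.
SET citus.shard_replication_factor = 2;

CREATE TABLE ha_test (id bigint, payload text);
SELECT create_distributed_table('ha_test', 'id');

-- citus_shards lists one row per shard placement, so count(*) per shardid
-- shows how many copies of each shard currently exist.
SELECT shardid, count(*) AS placements
FROM citus_shards
WHERE table_name = 'ha_test'::regclass
GROUP BY shardid
ORDER BY shardid;

-- Simulate a failure and recovery of one worker, then rebalance.
SELECT citus_disable_node('worker-1', 5432);
SELECT citus_activate_node('worker-1', 5432);
SELECT rebalance_table_shards();

-- Re-run the placement count and compare with the first result.
SELECT shardid, count(*) AS placements
FROM citus_shards
WHERE table_name = 'ha_test'::regclass
GROUP BY shardid
ORDER BY shardid;
```

Per the report above, the problem shows up the second time you go through the disable / activate / rebalance cycle, so repeat those three calls and compare the placement counts after each round.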
Citus wins in a lot of ways... performance, ease of re-balancing, licensing, overall architectural simplicity of how tables are sharded and how you define sharding keys.. it's elegant!
Just want to add my 2 cents to this. Not sure how you compared it, but there is no way to compete with CockroachDB. I wish Citus could be similar; it would improve HA to a huge extent.
Full disclosure: I ran the CockroachDB core (free) version for some years in production and never ran Citus. I am not affiliated with either vendor.
For CockroachDB, you basically spin up nodes and... that’s it. Everything else, like rebalancing, sharding, and self-healing (if a host comes back), is done automatically. There’s no need for (explicit) sharding keys or any additional (non-SQL) definitions. Every node is the same from a user perspective - no coordinator, no workers, no tablets, no PgBouncer, no Patroni, no etcd. Just uniform nodes. Want to scale? Just add nodes. Want to upgrade hardware? Just roll out node-by-node without downtime. Want to upgrade the CRDB version? Just upgrade node-by-node without downtime. (However, there is an option to stay storage compatible if something goes wrong, and one time we did have issues with a new version.) Want to migrate tables? Just do it - without downtime, if your application can handle it.
As a user, this is what I expect from a distributed database and in terms of HA. However, please don't get me wrong - this isn't a rant. I think Citus is a great piece of software too, and I’m very grateful to every contributor (and Microsoft) for it. I'm considering using it later in my current project. So, I would love to see improvements in this area! 💖