From 2001e26bcb87e42ab47c6dc4ba01e56b5a2bfe41 Mon Sep 17 00:00:00 2001
From: Sayan Nandan
Date: Sun, 10 Dec 2023 12:00:18 +0000
Subject: [PATCH] Add architecture notes

---
 docs/4.architecture.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/docs/4.architecture.md b/docs/4.architecture.md
index 24abdce06..3e34c182a 100644
--- a/docs/4.architecture.md
+++ b/docs/4.architecture.md
@@ -11,6 +11,34 @@ That's why, every component in Skytable has been engineered from the ground up,
 
 And all of that, without you having to be an expert, and with the least maintenance that you can expect.
 
+## Fundamental differences from SQL
+
+BlueQL looks and feels a lot like SQL by design, so that developers from the SQL world find it easier to migrate. But the way Skytable evaluates and executes queries is fundamentally different from SQL engines and even from other NoSQL engines. Here are some key differences:
+
+- All DML queries are point queries and **not** range queries:
+  - This means that they either return at least one row or an error
+  - A multi-row query won't work unless you add `ALL`. `ALL` itself signals that you're selecting (or applying a change to) a large range, which can be inefficient
+- Multi-row DML queries are slow and inefficient, and are discouraged
+- You can **only** query on the primary index, with a fixed set of operators. Once again, this is because of speed (and the problem of scaling multiple indexes)
+- **Remember, in NoSQL systems we denormalize.** Hence, there are no `JOIN`s or foreign keys, as with many other NoSQL systems
+- A different transactional model:
+  - All DDL and DCL queries are ACID transactions
+  - However, DML queries are not fully ACID: they are efficiently batched and automatically made part of a batch
+    transaction. The engine decides *when* to execute them, for example based on cache pressure, because our
+    focus is to maximize throughput
+  - All these differences mean that **DDL and DCL transactions are ACID transactions** while **DML queries are ACI and *eventually* D** (we call it a *delayed durability transaction*). This delay can be adjusted so that DML
+    queries *emulate* ACID transactions, but that defeats the point of an eventually durable system, which aims to heavily increase throughput.
+  - The idea of eventually durable transactions rests on the observation that hardware failure, though a prominent concern, is still rare,
+    thanks to the hard work that cloud vendors put into reliability engineering. If you plan to run on unreliable hardware, then the delay setting (reliability service) is what you need to change.
+  - For extremely unreliable hardware, on the other hand, we're working on a new storage driver, `rtsyncblock`, that is expected to be released in Q1'24
+- The transactional engine powering DDL and DCL queries may choose to demote a transaction to a virtual transaction: the transaction is made durable but is not necessarily executed right away, and is executed eventually. If the system crashes, the engine will still be able to execute the transaction (even if it crashed halfway)
+
+:::tip
+We hope you can now see how Skytable's workings are fundamentally different from traditional engines. And we know
+that you might have a lot of questions! If you do, please reach out. We're here to help.
+:::
+
+
 ## Data model
 
 Just like SQL has `DATABASE`s, Skytable has `SPACE`s which are collections of what we call data containers like tables.
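
The "ACI and eventually D" behaviour that the added section describes for DML queries can be pictured with a toy model. This is only a sketch under stated assumptions, not Skytable's implementation: the class `DelayedDurabilityStore`, the `flush_threshold` parameter (a stand-in for the cache-pressure trigger and the delay setting), and the in-memory `durable_log` (a stand-in for disk) are all hypothetical names made up for illustration.

```python
class DelayedDurabilityStore:
    """Toy model (hypothetical): writes are visible at once, persisted in batches."""

    def __init__(self, flush_threshold=3):
        # Hypothetical stand-in for "pressure on cache" / the delay setting.
        self.flush_threshold = flush_threshold
        self.memory = {}       # in-memory state, visible immediately
        self.pending = []      # writes that are not yet durable
        self.durable_log = []  # stand-in for on-disk storage

    def write(self, key, value):
        # The write is applied in memory right away...
        self.memory[key] = value
        # ...but only queued for persistence.
        self.pending.append((key, value))
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the whole batch in one step.
        self.durable_log.extend(self.pending)
        self.pending.clear()

    def recover_after_crash(self):
        # After a crash only the durable log survives; pending writes are lost.
        recovered = {}
        for key, value in self.durable_log:
            recovered[key] = value
        return recovered


store = DelayedDurabilityStore(flush_threshold=3)
for i in range(4):
    store.write(f"user{i}", i)

visible = dict(store.memory)             # all four writes are visible
survivors = store.recover_after_crash()  # only the flushed batch survives
```

In this model, setting `flush_threshold=1` persists every write immediately, which mirrors the trade-off described above: durability per write is emulated, but the batching that gives the throughput benefit is gone.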