add the get-started documents
This commit adds documents that cover
basic concepts in ScyllaDB and help
developers get started with ScyllaDB.

In addition, the left navigation bar
is enabled in this project so that
the new documents are shown in
the page tree.
annastuchlik committed Dec 8, 2023
1 parent d601338 commit 3cec8e4
Showing 19 changed files with 1,140 additions and 1 deletion.
100 changes: 100 additions & 0 deletions docs/data-modeling/best-practices.rst
@@ -0,0 +1,100 @@
====================================
Data Modeling Best Practices
====================================

These additional topics provide a broader perspective on data modeling, query
design, schema design, and best practices when working with ScyllaDB or similar
distributed NoSQL databases.

**Partition Key Selection**

Choose your partition keys to avoid imbalances in your clusters. Imbalanced
partitions can lead to performance bottlenecks, which impact overall cluster
performance. Balancing the distribution of data across partitions is crucial
to ensure all nodes are effectively utilized in your cluster.

Let's consider a scenario with poor partition key selection:

.. code::

   CREATE TABLE my_keyspace.messages_bad (
       message_id uuid PRIMARY KEY,
       user_id uuid,
       message_text text,
       created_at timestamp
   );

In this model, the partition key is ``message_id``, a globally unique
identifier for each message. Every message therefore lands in its own
single-row partition. Data is spread evenly across the cluster, but the
messages of a given user are scattered over many partitions, so a common query
such as "fetch all messages for a user" cannot be served from a single
partition and requires a full table scan or an additional index.

A better solution for partition key selection would look like:

.. code::

   CREATE TABLE my_keyspace.messages_good (
       user_id uuid,
       message_id uuid,
       message_text text,
       created_at timestamp,
       PRIMARY KEY (user_id, message_id)
   );

In this improved model, the partition key is ``user_id``, the unique
identifier for each user, and ``message_id`` is a clustering column. All
messages of a user are stored together in one partition, so they can be read
with a single-partition query, while hashing ``user_id`` spreads the users'
partitions across all nodes in the cluster. Keep in mind that a user with
a very large number of messages still produces a large, potentially hot
partition. In that case, consider a composite partition key, for example
``user_id`` combined with a time bucket, to split that user's data across
several partitions.
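
The effect of a partition key on partition sizes can be estimated before any table is created. The following is a minimal Python sketch, not ScyllaDB code: it counts rows per partition key value for a made-up workload. The workload (one prolific user and 99 occasional users) and the ``rows_per_partition`` helper are assumptions chosen for illustration.

```python
import uuid
from collections import Counter


def rows_per_partition(rows, key):
    """Count how many rows share each value of the chosen partition key."""
    return Counter(row[key] for row in rows)


# Hypothetical workload: one prolific user and 99 occasional users.
users = [uuid.uuid4() for _ in range(100)]
rows = []
for i, user_id in enumerate(users):
    n_messages = 1000 if i == 0 else 10
    for _ in range(n_messages):
        rows.append({"user_id": user_id, "message_id": uuid.uuid4()})

by_message = rows_per_partition(rows, "message_id")
by_user = rows_per_partition(rows, "user_id")

# Partition count and size of the largest partition for each key choice.
print(len(by_message), max(by_message.values()))
print(len(by_user), max(by_user.values()))
```

Running the same count against a sample of real data is one way to spot oversized partitions early.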

**Tombstones and Delete Workloads**

If your workload involves frequent deletes, it’s crucial that you understand
the implications of tombstones on your read path. Tombstones are markers for
deleted data and can negatively affect query performance if not managed
effectively.

Let's consider a data model for storing user messages:

.. code::

   CREATE TABLE my_keyspace.user_messages (
       user_id uuid,
       message_id uuid,
       message_text text,
       is_deleted boolean,
       PRIMARY KEY (user_id, message_id)
   );

In this table, each user can have multiple messages, identified by
``user_id`` and ``message_id``.
The ``is_deleted`` column is used to mark messages as deleted (true) or not
deleted (false). When a user deletes a message, a tombstone is created to mark
the message as deleted. Tombstones are necessary for data consistency, but can
negatively affect query performance, especially when there are frequent delete
operations.
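
Why tombstones slow down reads can be shown with a toy model. The sketch below is an in-memory simplification, not ScyllaDB's storage engine: a partition is a plain list, a deleted row is replaced by a hypothetical ``TOMBSTONE`` marker, and a read has to examine every entry, live or deleted, to build its result.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a deleted row


def read_live_rows(partition):
    """Return the live rows and the number of entries the read examined."""
    scanned = 0
    live = []
    for message_id, value in partition:
        scanned += 1
        if value is not TOMBSTONE:
            live.append((message_id, value))
    return live, scanned


# A partition with 1000 messages, 900 of which were later deleted.
partition = [(i, f"message {i}") for i in range(1000)]
partition = [(i, TOMBSTONE if i < 900 else value) for i, value in partition]

live, scanned = read_live_rows(partition)
print(len(live), scanned)
```

In this model the read returns 100 rows but examines 1000 entries. The ratio between the two is the overhead that compaction is meant to remove.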

To optimize query performance in scenarios with heavy delete operations, you
can `adjust the compaction strategy and use TTL <https://opensource.docs.scylladb.com/stable/kb/ttl-facts.html>`_
(Time-to-Live) to handle tombstones more efficiently. ScyllaDB allows you to
choose between several compaction strategies. For workloads in which data
expires over time, consider a strategy that handles expired data and
tombstones efficiently, such as ``TimeWindowCompactionStrategy``.

.. code::

   ALTER TABLE my_keyspace.user_messages
   WITH default_time_to_live = 2592000
   AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': 1};

This setup, with a 30-day TTL (``default_time_to_live = 2592000``) and
one-day compaction windows (``'compaction_window_unit': 'DAYS'``,
``'compaction_window_size': 1``), is suited for time-sensitive data scenarios
where keeping data beyond a month is unnecessary. Because SSTables are grouped
into daily windows, an SSTable in which all data has expired can be dropped as
a whole.
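
``default_time_to_live`` is expressed in seconds, so it helps to derive the value rather than type it by hand. The following Python sketch does the arithmetic; the ``ttl_seconds`` helper is a hypothetical name used only for this illustration.

```python
from datetime import timedelta


def ttl_seconds(days):
    """Convert a retention period in days to a TTL in seconds."""
    return int(timedelta(days=days).total_seconds())


thirty_days = ttl_seconds(30)
one_day = ttl_seconds(1)

print(thirty_days)             # value for default_time_to_live
print(one_day)                 # length of one day in seconds
print(thirty_days // one_day)  # number of one-day intervals within the TTL
```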
34 changes: 34 additions & 0 deletions docs/data-modeling/index.rst
@@ -0,0 +1,34 @@
===============
Data Modeling
===============

Data modeling is the process of defining the structure and relationships of
your data in ScyllaDB. It involves making important decisions about how data
will be organized, stored, and retrieved.

There are several types of data models, which include conceptual, logical,
and physical data models. Conceptual models tend to focus on high-level
business processes, while logical models detail the data structure. Physical
models consider how data is stored on the underlying infrastructure.

Data modeling in a NoSQL database such as ScyllaDB differs from traditional
relational databases. You may need to emphasize denormalization, scaling, and
optimal data access patterns to get the most out of ScyllaDB.

A practical approach when data modeling for ScyllaDB is to adopt a query-first
data model, where you design your data model around the queries that it needs
to execute.


.. toctree::
   :titlesonly:

   query-design
   schema-design
   best-practices

18 changes: 18 additions & 0 deletions docs/data-modeling/query-design.rst
@@ -0,0 +1,18 @@
====================
Query Design
====================

Your data model is heavily influenced by query efficiency. Effective
partitioning, clustering columns, and denormalization are key considerations
for optimizing data access patterns.

Design your queries so that they retrieve and manipulate data efficiently.
Query optimization aims to minimize resource usage and latency while
maximizing throughput.

Indexing is another important aspect of query design. We have already
introduced the basic concept of primary keys, which can be made up of two
parts: the partition key and optional clustering columns. ScyllaDB also
supports secondary indexes for non-primary key columns. Secondary indexes can
improve query flexibility, but it’s important to consider their impact on
performance.
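
The trade-off behind secondary indexes can be sketched with a toy in-memory model. This is not how ScyllaDB implements indexes; the ``MessagesTable`` class and its method names are hypothetical. It only shows that looking up rows by a non-key column needs either a scan over all rows or an extra structure that must be updated on every write.

```python
from collections import defaultdict


class MessagesTable:
    """Toy table keyed by message_id, with an optional index on user_id."""

    def __init__(self):
        self.rows = {}                   # primary key -> row
        self.by_user = defaultdict(set)  # index: user_id -> message_ids

    def insert(self, message_id, user_id, text):
        self.rows[message_id] = {"user_id": user_id, "text": text}
        self.by_user[user_id].add(message_id)  # extra write to maintain the index

    def find_by_user_scan(self, user_id):
        # Without an index, every row has to be examined.
        return sorted(m for m, r in self.rows.items() if r["user_id"] == user_id)

    def find_by_user_indexed(self, user_id):
        # With the index, only the matching keys are touched.
        return sorted(self.by_user[user_id])


table = MessagesTable()
table.insert(1, "alice", "hello")
table.insert(2, "bob", "hi")
table.insert(3, "alice", "bye")

print(table.find_by_user_scan("alice"))
print(table.find_by_user_indexed("alice"))
```

Both lookups return the same keys. The indexed one avoids the scan at the cost of additional work on each insert.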
116 changes: 116 additions & 0 deletions docs/data-modeling/schema-design.rst
@@ -0,0 +1,116 @@
=======================
Schema Design
=======================

An advantage of NoSQL databases such as ScyllaDB is that the schema can
evolve. You can add new columns, tables, or indexes over time to
accommodate your requirements. A simple example might be adding a new column
to the ``users`` table:

.. code::

   ALTER TABLE my_keyspace.users ADD phone_number text;

Alternatively, you may create a new table to handle user posts:

.. code::

   CREATE TABLE my_keyspace.user_posts (
       user_id uuid,
       post_id uuid PRIMARY KEY,
       post_text text,
       post_timestamp timestamp
   );

However, to get the most value out of ScyllaDB, you will need to make the
design choices described below. They further reinforce the concept of adopting
a query-first data model.

**Data Types**

Selecting the appropriate data type for your columns is critical for both
physical storage and logical query performance in your data model. You will
need to consider factors such as data size, indexing, and sorting.

Let's say you're designing a table to store information about e-commerce
products, and one of the attributes you want to capture is the product's price.
The choice of data type for the "price" column is crucial for efficient storage
and query performance.

.. code::

   CREATE TABLE my_keyspace.products (
       product_id uuid PRIMARY KEY,
       product_name text,
       price decimal,
       description text
   );

In this example, for the "price" column, we've chosen the decimal data type.
This data type is suitable for storing precise numerical values, such as
prices, as it preserves decimal precision.
Choosing decimal over other numeric data types like float or double is
essential when dealing with financial data to avoid issues with rounding errors.

Decimal values are compared by their numeric value, so ordering and range
comparisons on prices stay correct even for values with different decimal
precision (for example, ``9.5`` and ``9.50``). Note that in the table above
``price`` is a regular column: a range query on it requires
``ALLOW FILTERING``, or a table or materialized view in which ``price`` is
a clustering column.
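
The rounding issue can be reproduced on the client side. The sketch below uses Python's standard ``decimal`` module, which is the type the Python driver uses for CQL ``decimal`` values; the prices are made-up sample data.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
# Decimal arithmetic keeps the exact decimal digits.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)
print(decimal_total == Decimal("0.30"))

# Values with different precision still compare and sort numerically.
prices = [Decimal("10.5"), Decimal("9.99"), Decimal("10.25")]
print(sorted(prices))
print(Decimal("9.5") == Decimal("9.50"))
```

The float comparison fails while the decimal one succeeds, which is why exact decimal types are preferred for monetary values.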

**(De)Normalization**

The choice between normalization and denormalization will depend on your
specific use case. A good rule of thumb is that normalization reduces
redundancy but may require more complex queries, while denormalization
simplifies queries yet may increase storage requirements. It is important to
consider the tradeoff between approaches when designing your data model.

Let's consider a scenario where you are designing a data model to manage
information about a library system with two main entities: books and authors.
You have the flexibility to choose between normalized and denormalized approaches.

**Normalized Data Model**

In a normalized data model, you would have separate tables for books and
authors, reducing data redundancy:

.. code::

   CREATE TABLE my_keyspace.authors (
       author_id uuid PRIMARY KEY,
       author_name text
   );

   CREATE TABLE my_keyspace.books (
       book_id uuid PRIMARY KEY,
       title text,
       publication_year int,
       author_id uuid,
       ISBN text
   );

In this normalized model, the ``authors`` table stores information about
authors, and the ``books`` table stores information about books. The
``author_id`` column in the ``books`` table references the ``authors`` table,
which reduces redundancy. ScyllaDB does not enforce foreign keys or support
joins, so the application must keep the reference consistent and run a second
query to fetch the author's details.

**Denormalized Data Model**

In a denormalized data model, you would combine some data to simplify queries,
even though it may lead to redundancy:

.. code::

   CREATE TABLE my_keyspace.books_and_authors (
       book_id uuid PRIMARY KEY,
       title text,
       publication_year int,
       author_name text,
       ISBN text
   );

In this denormalized model, the ``books_and_authors`` table combines
information from both ``books`` and ``authors`` into a single table.
The ``author_name`` column directly stores the author's name, eliminating
the need for foreign key references.
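
The difference between the two models shows up in how many lookups a read needs. The following Python sketch models each table as a dictionary keyed by its primary key. It is an illustration only: the identifiers ``b1`` and ``a1``, the sample row, and the helper functions are assumptions, not part of any ScyllaDB API.

```python
# Normalized: two tables, linked by author_id.
authors = {"a1": {"author_name": "Ursula K. Le Guin"}}
books = {"b1": {"title": "The Dispossessed", "publication_year": 1974, "author_id": "a1"}}

# Denormalized: the author's name is stored with each book.
books_and_authors = {
    "b1": {
        "title": "The Dispossessed",
        "publication_year": 1974,
        "author_name": "Ursula K. Le Guin",
    }
}


def title_and_author_normalized(book_id):
    book = books[book_id]                # first lookup
    author = authors[book["author_id"]]  # second lookup, done by the application
    return book["title"], author["author_name"]


def title_and_author_denormalized(book_id):
    row = books_and_authors[book_id]     # single lookup
    return row["title"], row["author_name"]


print(title_and_author_normalized("b1"))
print(title_and_author_denormalized("b1"))
```

Both functions return the same result. The denormalized read needs one lookup, but renaming an author would require updating every book row that stores the name.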

99 changes: 99 additions & 0 deletions docs/develop-with-scylladb/connect-apps.rst
@@ -0,0 +1,99 @@
=======================
Connect an Application
=======================

To connect your application to ScyllaDB, you need to:

#. :doc:`Install the relevant driver </develop-with-scylladb/install-drivers>`
   for your application language.

   This step involves setting up a driver that is compatible with ScyllaDB.
   The driver acts as the link between your application and ScyllaDB, enabling
   your application to communicate with the database.

#. Modify your application code to connect the driver.

   The following is some boilerplate code to help familiarize yourself with
   connecting your application with the ScyllaDB driver. For a detailed
   walkthrough of building a fictional media player application with code
   examples, please see our
   `Getting Started tutorial <https://cloud-getting-started.scylladb.com/stable/getting-started.html>`_.

.. tabs::

   .. group-tab:: Rust

      .. code-block:: rust

         use anyhow::Result;
         use scylla::{Session, SessionBuilder};
         use std::time::Duration;

         #[tokio::main]
         async fn main() -> Result<()> {
             // Connect to a node and authenticate with username and password.
             let session: Session = SessionBuilder::new()
                 .known_nodes(&[
                     "localhost",
                 ])
                 .connection_timeout(Duration::from_secs(30))
                 .user("scylla", "your-awesome-password")
                 .build()
                 .await?;

             Ok(())
         }

   .. group-tab:: Go

      .. code-block:: go

         package main

         import (
             "github.com/gocql/gocql"
             "github.com/scylladb/gocqlx/v2"
         )

         func main() {
             cluster := gocql.NewCluster("localhost")
             cluster.Authenticator = gocql.PasswordAuthenticator{Username: "scylla", Password: "your-awesome-password"}
             session, err := gocqlx.WrapSession(cluster.CreateSession())
             if err != nil {
                 panic("Connection fail")
             }
             defer session.Close()
         }

   .. group-tab:: Java

      .. code-block:: java

         import com.datastax.driver.core.Cluster;
         import com.datastax.driver.core.PlainTextAuthProvider;
         import com.datastax.driver.core.Session;

         class Main {
             public static void main(String[] args) {
                 Cluster cluster = Cluster.builder()
                     .addContactPoints("localhost")
                     .withAuthProvider(new PlainTextAuthProvider("scylla", "your-awesome-password"))
                     .build();
                 Session session = cluster.connect();
             }
         }

   .. group-tab:: Python

      .. code-block:: python

         from cassandra.cluster import Cluster
         from cassandra.auth import PlainTextAuthProvider

         cluster = Cluster(
             contact_points=[
                 "localhost",
             ],
             auth_provider=PlainTextAuthProvider(username='scylla', password='your-awesome-password')
         )
         # Open the connection to the cluster.
         session = cluster.connect()
21 changes: 21 additions & 0 deletions docs/develop-with-scylladb/index.rst
@@ -0,0 +1,21 @@
========================
Develop with ScyllaDB
========================

Developing with ScyllaDB involves setting up the database environment,
choosing the appropriate drivers for your programming language, and
integrating it with your application.

* :doc:`Run ScyllaDB </develop-with-scylladb/run-scylladb>`
* :doc:`Install a Driver </develop-with-scylladb/install-drivers>`
* :doc:`Connect an Application </develop-with-scylladb/connect-apps>`
* :doc:`Tutorials and Example Projects </develop-with-scylladb/tutorials-example-projects>`


.. toctree::
   :hidden:

   run-scylladb
   install-drivers
   connect-apps
   tutorials-example-projects