Commit

add new sections to schema design
dimitri-yatsenko committed Jan 12, 2025
1 parent 1ff74b6 commit 6b4ba7e
Showing 8 changed files with 7,996 additions and 191 deletions.
64 changes: 44 additions & 20 deletions book/30-schema-design/010-schema.ipynb
@@ -8,43 +8,57 @@
"title: Create Schemas\n",
"authors:\n",
" - name: Dimitri Yatsenko\n",
" - date: 2025-01-12\n",
"---\n",
"\n",
"# Create Schemas\n",
"\n",
"## What is a schema?\n",
"\n",
"We use the word \"schema\" in a couple of different but related ways.\n",
"The term schema has two related meanings in the context of databases:\n",
"\n",
"Firstly, a **schema** is a formal specification of the data structure and of the rules governing its integrity.\n",
"The schema serves as a blueprint that defines how data is organized, stored, and accessed within the database, ensuring that the data reflects the underlying business or research project rules it supports.\n",
"In structured data models, such as the relational model, schemas \n",
"In the relational data model, the schema provides a robust framework for defining the rules and constraints that govern data operations, helping to maintain consistency, accuracy, and reliability.\n",
"### 1. Schema as a Data Blueprint\n",
"A **schema** is a formal specification of the structure of data and the rules governing its integrity.\n",
"It serves as a blueprint that defines how data is organized, stored, and accessed within a database.\n",
"This ensures that the database reflects the rules and requirements of the underlying business or research project it supports.\n",
"\n",
"In addition to ensuring data integrity, good schema design optimizes the ease and efficiency of data queries. \n",
"A well-designed schema facilitates fast and accurate data retrieval, supports more complex queries, and allows the database to scale as data volumes grow.\n",
"In structured data models, such as the relational model, a schema provides a robust framework for defining:\n",
"* The structure of tables (relations) and their attributes (columns).\n",
"* Rules and constraints that ensure data consistency, accuracy, and reliability.\n",
"* Relationships between tables, such as primary keys (unique identifiers for records) and foreign keys (references to related records in other tables).\n",
"\n",
"Relational schema design involves defining a set of tables, each with columns (attributes) of specific data types. These tables are linked by primary keys, which uniquely identify each record, and foreign keys, which establish relationships between entities in different tables. Additional constraints, such as uniqueness constraints and indexes, further refine the schema by enforcing rules that prevent data duplication and by improving query performance. Default values can also be specified for certain attributes, ensuring that missing or optional data is handled consistently.\n",
"#### Aims of Good Schema Design\n",
"* **Data Integrity**: Ensures consistency and prevents anomalies.\n",
"* **Query Efficiency**: Facilitates fast and accurate data retrieval, supports complex queries, and optimizes database performance.\n",
"* **Scalability**: Allows the database to grow and adapt as data volumes increase.\n",
"\n",
"Through schema design, database architects ensure that the database not only meets the current needs of the organization but also remains flexible and scalable as those needs evolve. The schema acts as a living document that guides the database’s structure, supports efficient data operations, and upholds the integrity of the data it manages.\n",
"#### Key Elements of Schema Design\n",
"* **Tables and Attributes**: Each table is defined with specific attributes (columns), each assigned a data type.\n",
"* **Primary Keys**: Uniquely identify each record in a table.\n",
"* **Foreign Keys**: Establish relationships between entities in tables.\n",
"* **Indexes**: Support efficient queries.\n",
"\n",
"In DataJoint, declaring individual tables is the foundational step in building your data pipeline. Each table corresponds to a specific entity or data structure that you want to model within your database. This tutorial will guide you through the basics of declaring individual tables, covering essential components like primary keys, attributes, and basic definitions.\n",
"Through careful schema design, database architects create systems that are both efficient and flexible, meeting the current and future needs of an organization. The schema acts as a living document that guides the structure, operations, and integrity of the database.\n",
"\n",
"### 2. Schema as a Database Module\n",
"\n",
"The second meaning of the word \"schema\" is for a single module in a complex database design. \n",
"Complex databases can be separated into modules that serve as namespaces for related tables.\n",
"Thus a database may comprise multiple schemas.\n",
"We discuss multi-schema designs in a separate section."
"In complex database designs, the term \"schema\" is also used to describe a distinct module of a larger database with its own namespace that groups related tables together. \n",
"This modular approach:\n",
"* Separates tables into logical groups for better organization.\n",
"* Avoids naming conflicts in large databases with multiple schemas.\n",
"\n",
"For more details on designing multi-schema databases, refer to the section on multi-schema designs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Schema Declaration\n",
"Before declaring tables, you need to declare a schema which is a namespace for your tables, giving it a unique name.\n",
"# Declaring a schema\n",
"Before you can create tables, you must declare a schema to serve as a namespace for your tables.\n",
"Each schema requires a unique name to distinguish it within the database.\n",
"\n",
"The schema groups related tables together and avoids naming conflicts."
"Here’s how to declare a schema in DataJoint:"
]
},
{
@@ -72,9 +86,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using the Schema Object\n",
"# Using the `schema` Object\n",
"\n",
"The schema object groups related tables together and helps prevent naming conflicts.\n",
"\n",
"By convention, the object created by `dj.Schema` is named `schema`. Typically, only one schema object is used in any given Python namespace, usually at the level of a Python module.\n",
"\n",
"The schema object serves multiple purposes:\n",
"* **Creating Tables**: Used as a *class decorator* (`@schema`) to declare tables within the schema. \n",
"For details, see the next section, [Create Tables](010-table.ipynb).\n",
"* **Visualizing the Schema**: Generates diagrams to illustrate relationships between tables.\n",
"* **Exporting Data**: Facilitates exporting data for external use or backup.\n",
"\n",
"The `schema` object declared above\n"
"With this foundation, you are ready to begin declaring tables and building your data pipeline."
]
},
{
@@ -86,7 +110,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dropping a schema\n",
"# Dropping a Schema\n",
"\n",
"Dropping a schema in DataJoint permanently deletes every table in that schema, all the data stored in those tables, and the schema itself from the database. To drop a schema, call the `schema.drop()` method, where `schema` is the schema object you defined earlier in your code.\n",
"\n",
119 changes: 0 additions & 119 deletions book/30-schema-design/032-schema-modules.ipynb

This file was deleted.
