create placeholder for module 4 join and initial content for module 5…
… Slowly Changing Dimention
Hao Zhang authored and HaoZhang615 committed Jul 1, 2023
1 parent d31a893 commit 579f159
Showing 2 changed files with 0 additions and 68 deletions.
12 changes: 0 additions & 12 deletions README.md
@@ -28,18 +28,6 @@ Mapping data flows provide an entirely visual experience with no coding required
1. [Create Integration Runtime](./modules/module01.md)
2. [Create Linked Services](./modules/module02.md)
3. [Two Ways to do a Basic Copy](./modules/module03.md)
4. [Join Placeholder](./modules/module04.md)
5. [Slowly Changing Dimensions](./modules/module05.md)
6. Change Data Capture Storage to SQL (module planned)
7. [Medallion Architecture: Bronze Layer](./modules/module07.md)
8. [Medallion Architecture: Silver Layer](./modules/module08.md)
9. [Medallion Architecture: Gold Layer](./modules/module09.md)
10. [Medallion Architecture: Consumption Layer](./modules/module10.md)
11. [Troubleshooting](./modules/module11.md)
12. [Best Practices](./modules/module12.md)

## :books: Optional Learning Modules

56 changes: 0 additions & 56 deletions modules/module05.md
@@ -1,58 +1,2 @@
# Module 05 - Slowly Changing Dimensions

[< Previous Module](../modules/module04.md) - **[Home](../README.md)** - [Next Module >](../modules/module05.md)

## :loudspeaker: Introduction

According to [Wikipedia](https://en.wikipedia.org/wiki/Slowly_changing_dimension), a slowly changing dimension (SCD) in data management and data warehousing is a dimension that contains relatively static data which can change slowly but unpredictably, rather than according to a regular schedule. Typical examples are an employee's family name changing after marriage and a customer's address changing, both of which happen unpredictably.

There are several SCD types, which differ in how they handle a change to dimension data. The most frequently used are SCD Type 1 and SCD Type 2. This module introduces both, together with their implementation pattern in Azure Data Factory Mapping Data Flow.


* Slowly Changing Dimensions Type 1 (overwrite without tracking history)
* Slowly Changing Dimensions Type 2 (track history by adding new row)

## :bookmark_tabs: Table of Contents

| # | Section |
| --- | --- |
| 1 | [Introduction SCD Type 1 and SCD Type 2](#1-introduction-scd-type-1-and-scd-type-2) |
| 2 | [Prepare the dataset](#2-prepare-the-dataset) |
| 3 | [Slowly-Changing-Dimension 1](#3-slowly-changing-dimension-1) |
| 4 | [Slowly-Changing-Dimension 2](#4-slowly-changing-dimension-2) |

<div align="right"><a href="#module-05---slowly-changing-dimensions">↥ back to top</a></div>

## 1. Introduction SCD Type 1 and SCD Type 2

### SCD Type 1
A **SCD Type 1** always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten. This design approach is common for columns that store supplementary values, like the email address or phone number of a customer. When a customer email address or phone number changes, the dimension table updates the customer row with the new values. It's as if the customer always had this contact information. The key field, such as CustomerID, would stay the same so the records in the fact table automatically link to the updated customer record.
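
The overwrite behaviour described above can be sketched in a few lines of plain Python. This is only an illustration of the Type 1 logic, not the Mapping Data Flow implementation: the function name `apply_scd_type1` and the sample rows are hypothetical, and the dimension table is modelled as a list of dictionaries.

```python
def apply_scd_type1(dimension, source_rows, key="CustomerID"):
    """Apply SCD Type 1: overwrite matching rows in place, keep no history."""
    # Index the existing dimension rows by their key for fast lookup.
    index = {row[key]: row for row in dimension}
    for src in source_rows:
        target = index.get(src[key])
        if target is None:
            # Unknown key: insert the row as a new dimension member.
            target = dict(src)
            dimension.append(target)
            index[src[key]] = target
        else:
            # Known key: overwrite the stored values with the latest ones.
            target.update(src)
    return dimension


# Hypothetical sample data, loosely modelled on a customer dimension.
dim_customer = [
    {"CustomerID": 1, "EmailAddress": "old@example.com", "Phone": "555-0100"},
]
source = [
    {"CustomerID": 1, "EmailAddress": "new@example.com", "Phone": "555-0100"},
    {"CustomerID": 2, "EmailAddress": "second@example.com", "Phone": "555-0101"},
]
apply_scd_type1(dim_customer, source)
```

After the call, customer 1 carries only the new email address and the old value is gone, which is exactly the trade-off of Type 1: simple to maintain, but no history.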

<kbd> <img src="../images/module05/slowly-changing-dimensions-type-1-change.png" alt="example of SCD Type 1" /> </kbd>

### SCD Type 2
A **SCD Type 2** supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.

For example, Adventure Works assigns salespeople to a sales region. When a salesperson relocates to a different region, a new version of the salesperson must be created to ensure that historical facts remain associated with the former region. To support accurate historical analysis of sales by salesperson, the dimension table must store versions of salespeople and their associated region(s). The table should also include start and end date values to define the time validity. Current versions may define an empty end date (or 12/31/9999), which indicates that the row is the current version. The table must also define a surrogate key because the business key (in this instance, employee ID) won't be unique.
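
The salesperson example can be sketched in plain Python as follows. This is a minimal illustration of the Type 2 logic under stated assumptions, not the Mapping Data Flow implementation: the function name `apply_scd_type2` and the column names `SalespersonKey` and `SalesRegion` are hypothetical, the end date of an expired version is set to the load date, and the dimension table is modelled as a list of dictionaries.

```python
from datetime import date

# Assumed marker for "still current", mirroring the 12/31/9999 convention.
OPEN_END = date(9999, 12, 31)


def apply_scd_type2(dimension, source_rows, load_date,
                    business_key="EmployeeID", tracked=("SalesRegion",)):
    """Apply SCD Type 2: expire the changed version, then add a new one."""
    # The next surrogate key continues after the highest key in use.
    next_key = max((r["SalespersonKey"] for r in dimension), default=0) + 1
    # Only current versions are candidates for comparison.
    current = {r[business_key]: r for r in dimension if r["IsCurrent"]}
    for src in source_rows:
        cur = current.get(src[business_key])
        if cur is not None and all(cur[c] == src[c] for c in tracked):
            continue  # No tracked column changed: nothing to do.
        if cur is not None:
            # Close the validity range of the old version.
            cur["EndDate"] = load_date
            cur["IsCurrent"] = False
        new_row = {"SalespersonKey": next_key,
                   business_key: src[business_key],
                   "StartDate": load_date,
                   "EndDate": OPEN_END,
                   "IsCurrent": True}
        new_row.update({c: src[c] for c in tracked})
        dimension.append(new_row)
        current[src[business_key]] = new_row
        next_key += 1
    return dimension


# Hypothetical sample data: one salesperson who moves to another region.
dim_salesperson = [
    {"SalespersonKey": 1, "EmployeeID": 101, "SalesRegion": "Northwest",
     "StartDate": date(2020, 1, 1), "EndDate": OPEN_END, "IsCurrent": True},
]
source = [{"EmployeeID": 101, "SalesRegion": "Southwest"}]
apply_scd_type2(dim_salesperson, source, load_date=date(2023, 7, 1))
```

After the call the table holds two rows for employee 101: the expired Northwest version and the current Southwest version with a new surrogate key. Facts recorded against surrogate key 1 therefore stay tied to the former region.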

<kbd> <img src="../images/module05/slowly-changing-dimensions-type-2-change.png" alt="example of SCD Type 2" /> </kbd>

## 2. Prepare the dataset
If you have successfully completed [module 2](../modules/module02.md), you should already have an Azure Data Factory with read and write access granted to the Azure SQL DB.

In the Azure SQL DB you will find an "Adventure Works" database pre-deployed and filled with data, ready to be queried (a test query on the [SalesLT].[Customer] table executed in the **Query Editor** is shown below).

<kbd> <img src="../images/module05/test-query-azuresqldb.png" alt="test query in the Azure SQL DB Query Editor" /> </kbd>

This database represents a typical OLTP (Online Transaction Processing) database, which is optimized for fast data insertion and retrieval. For analytical and reporting purposes, it is generally recommended to have an OLAP (Online Analytical Processing) database in place, usually in the form of a data warehouse. It takes the analytical workloads off the source systems by syncing and persisting the required data from the OLTP databases, and it optimizes performance for **read** queries over large numbers of rows.

In this module we will use the [SalesLT].[Customer] table as source data for the SCD Type 1 transformation and the [SalesLT].[Product] table as source data for the SCD Type 2 transformation.

## 3. Slowly-Changing-Dimension 1



## 4. Slowly-Changing-Dimension 2

