diff --git a/README.md b/README.md
index 2c6050b..d0b4e2a 100644
--- a/README.md
+++ b/README.md
@@ -28,18 +28,6 @@ Mapping data flows provide an entirely visual experience with no coding required
 1. [Create Integration Runtime](./modules/module01.md)
 2. [Create Linked Services](./modules/module02.md)
 3. [Two Ways to do a Basic Copy](./modules/module03.md)
-4. [Join Placeholder](./modules/module04.md)
-5. [Slowly Changing Dimensions](./modules/module05.md)
-<<<<<<< HEAD
-6. Change Data Capture Storage to SQL (module planned)
-7. [Medallion Architecture: Bronze Layer](./modules/module07.md)
-8. [Medallion Architecture: Silver Layer](./modules/module08.md)
-9. [Medallion Architecture: Gold Layer](./modules/module09.md)
-10. [Medallion Architecture: Consumption Layer](./modules/module10.md)
-11. [Troubleshooting](./modules/module11.md)
-12. [Best Practices](./modules/module12.md)
-=======
->>>>>>> 1dff1a1 (create placeholder for module 4 join and initial content for module 5 Slowly Changing Dimention)
 
 ## :books: Optional Learning Modules
 
diff --git a/modules/module05.md b/modules/module05.md
index e47455e..542af9e 100644
--- a/modules/module05.md
+++ b/modules/module05.md
@@ -1,58 +1,2 @@
-# Module 05 - Slowly Changing Dimensions
-
-[< Previous Module](../modules/module04.md) - **[Home](../README.md)** - [Next Module >](../modules/module05.md)
-
-## :loudspeaker: Introduction
-
-according to [Wikipedia](https://en.wikipedia.org/wiki/Slowly_changing_dimension): a slowly changing dimension (SCD) in data management and data warehousing is a dimension which contains relatively static data which can change slowly but unpredictably, rather than according to a regular schedule. Typical examples are the family name change of an employee after marrige and address change of a customer, which all happens unpredictably.
-
-Based on how to deal with a slowly dimensional data change in different scenarios, there are several SCD Types. The most frequently used are SCD Type 1 and SCD Type 2, which will be introduced in this module together with the implementation pattern in Azure Data Factory Mapping Data Flow.
-
-
-* Slowly Changing Dimensions Type 1 (overwrite without tracking history)
-* Slowly Changing Dimensions Type 2 (track history by adding new row)
-
-## :bookmark_tabs: Table of Contents
-
-| # | Section |
-| --- | --- |
-| 1 | [Introduction SCD Type 1 and SCD Type 2](#1-introduction-scd-type-1-and-scd-type-2) |
-| 2 | [Set up the linked service and dataset](#2-set-up-the-linked-service-and-dataset) |
-| 3 | [Implement SCD Type 1 transformation with MDF](#3-implement-sdc-type-1-transformation-with-MDF) |
-| 4 | [Implement SCD Type 2 transformation with MDF](#4-implement-sdc-type-2-transformation-with-MDF) |
-
-↥ back to top
-
-## 1. Introduction SCD Type 1 and SCD Type 2
-
-### SCD Type 1
-A **SCD Type 1** always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten. This design approach is common for columns that store supplementary values, like the email address or phone number of a customer. When a customer email address or phone number changes, the dimension table updates the customer row with the new values. It's as if the customer always had this contact information. The key field, such as CustomerID, would stay the same so the records in the fact table automatically link to the updated customer record.
-
- example of SCD Type 1
-
-### SCD Type 2
-A **SCD Type 2** supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.
-
-For example, Adventure Works assigns salespeople to a sales region. When a salesperson relocates region, a new version of the salesperson must be created to ensure that historical facts remain associated with the former region. To support accurate historic analysis of sales by salesperson, the dimension table must store versions of salespeople and their associated region(s). The table should also include start and end date values to define the time validity. Current versions may define an empty end date (or 12/31/9999), which indicates that the row is the current version. The table must also define a surrogate key because the business key (in this instance, employee ID) won't be unique.
-
- example of SCD Type 2
-
-## 2. Prepare the dataset
-if you have successfully completed the [module 2](../modules/module02.md), you should already have the Azure Data Factory with the read and write access granted to the Azure SQL DB.
-
-In the Azure SQL DB you will already find an "Adventure Works" database pre-deployed and filled with data ready to be queried (a test query on the [SalesLT].[Customer] table executed in the **Query Editor** showed below).
-
- example of SCD Type 2
-
-This database represents a typical OLTP (Online Transactional Processing) database, which is optimized for fast data insertion and retrieval. For analytical and reporting purposes it is always recommended to have a OLAP (Online Analytical Processing) database (which usually forms into a Datawarehouse) in place, that offloads the analytical workloads on the same data sources (by syncing and persisting required data from the OLTP databases) and optimizes the query performance for **read** queries on large amount of rows.
-
-In this module we will use the [SalesLT].[Customer] table as source data for the SCD Type 1 and [SalesLT].[Product] table as source data for SCD Type 2 transformation.
-
-## 3. Slowly-Changing-Dimension 1
-
-adfasdfasdf
-
-
-## 4. Slowly-Changing-Dimension 2
 
 asdfasdfasdf
\ No newline at end of file
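
For reference, the difference between the two SCD types described in the removed module text can be illustrated with a minimal Python sketch. This is not the Mapping Data Flow implementation the module intended to cover, and nothing here is an Azure Data Factory API: `SalespersonRow`, `apply_scd_type1`, `apply_scd_type2`, and the sample row are hypothetical names and data chosen only to mirror the salesperson example (business key, surrogate key, start/end dates, current flag).

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SalespersonRow:
    """Hypothetical dimension row, modeled on the salesperson example."""
    surrogate_key: int           # unique per version of the member
    employee_id: str             # business key, repeats across versions
    region: str
    start_date: date
    end_date: Optional[date]     # None marks an open-ended (current) version
    is_current: bool


def apply_scd_type1(rows: List[SalespersonRow], employee_id: str,
                    new_region: str) -> List[SalespersonRow]:
    """Type 1: overwrite the attribute in place, keeping no history."""
    return [replace(r, region=new_region) if r.employee_id == employee_id else r
            for r in rows]


def apply_scd_type2(rows: List[SalespersonRow], employee_id: str,
                    new_region: str, change_date: date) -> List[SalespersonRow]:
    """Type 2: close the current version and append a new one."""
    current = next((r for r in rows
                    if r.employee_id == employee_id and r.is_current), None)
    if current is not None and current.region == new_region:
        return list(rows)  # nothing changed, so no new version is needed
    # Expire the current version (if any) instead of overwriting it.
    result = [replace(r, end_date=change_date, is_current=False)
              if r is current else r for r in rows]
    # Assumption: surrogate keys are simple increasing integers.
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    result.append(SalespersonRow(next_key, employee_id, new_region,
                                 change_date, None, True))
    return result


# Hypothetical sample data: one salesperson currently assigned to "West".
dim = [SalespersonRow(1, "E100", "West", date(2020, 1, 1), None, True)]
t1 = apply_scd_type1(dim, "E100", "East")
t2 = apply_scd_type2(dim, "E100", "East", date(2023, 6, 1))
```

With this sample, `t1` still holds a single row whose region has been overwritten, while `t2` holds two rows for the same `employee_id`: the expired "West" version and a new current "East" version under a new surrogate key.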