Notes

Notes on tools, knowledge, and concepts for Data Engineering and some related roles (DevOps, DA, ...).

Table of Contents

Terms & Concepts

Data Engineering is the set of operations aimed at creating interfaces and mechanisms for the flow of and access to data. The field is responsible for the architecture that moves data from one place to another, and for working with that data so it is available to data scientists and analysts.

  • Data Pipeline: A data pipeline is a series of data processing elements connected in series, where the output of one element is the input of the next. Elements can include sources, processors, sinks, and storage.
  • ETL: ETL stands for Extract, Transform, Load. It is the data warehousing process of pulling data out of source systems, transforming it, and loading it into a data warehouse (see the sketch after this list).
  • Data Lake: A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning.
  • Data Warehouse: A data warehouse is a system that stores data from multiple sources and transforms it into a format that analysts and business intelligence tools can use to perform complex queries and analysis.
  • Data Mart: A data mart is a subset of a data warehouse that is designed for a particular line of business, such as sales, marketing, or finance.
  • Data Lakehouse: A data lakehouse is a new data management paradigm that combines the best of data warehouses and data lakes. It provides the scalability and flexibility of a data lake with the performance and reliability of a data warehouse.
  • Data Modeling: Data modeling is the process of creating a data model for an information system by applying formal data modeling techniques.
  • Data Integration: Data integration is the process of combining data from different sources into a single, unified view.
  • Data Transformation: Data transformation is the process of converting data from one format or structure into another format or structure.
  • Data Ingestion: Data ingestion is the process of bringing data from external sources into a data storage system.
  • Data Analysis: Data analysis is the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
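
As a small illustration of the ETL flow described above, the sketch below extracts rows from a CSV file, applies a few cleaning transformations, and loads the result into a SQLite table. It is a minimal sketch only: the file name, column names, and SQLite target are illustrative assumptions, not part of these notes.

```python
# Minimal ETL sketch with pandas; file/column names and the SQLite
# target are assumptions made for illustration.
import pandas as pd
import sqlite3


def extract(path: str) -> pd.DataFrame:
    # Extract: pull raw data out of a source system (here, a CSV file).
    return pd.read_csv(path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: clean and reshape the raw data into the target schema.
    df = df.dropna(subset=["order_id"])                    # drop incomplete records
    df["order_date"] = pd.to_datetime(df["order_date"])    # normalize dates
    df["total"] = df["quantity"] * df["unit_price"]        # derive a new column
    return df[["order_id", "order_date", "total"]]


def load(df: pd.DataFrame, db_path: str) -> None:
    # Load: write the transformed data into the target store.
    with sqlite3.connect(db_path) as conn:
        df.to_sql("orders", conn, if_exists="append", index=False)


if __name__ == "__main__":
    load(transform(extract("orders.csv")), "warehouse.db")
```

The same extract/transform/load stages appear in most pipelines; in practice the source might be an API or message queue and the target a data warehouse or data lake rather than SQLite.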

Tools

A nearly complete list of Data Engineering tools is maintained in this repo.

Blogs

References
