Data.Engineers.Lunch

Resources from weekly Zoom lunches on data engineering and related topics. Hosted by Anant Corporation.

Join Data Engineer's Lunch Weekly at 12 PM EST Every Monday

Watch Data Engineer's Lunches Live and Subscribe to Our YouTube Channel to Keep Up to Date

If you would like to be a guest speaker, you can reach us at [email protected]. If you would like to sponsor Data Engineer's Lunch, please reach us at the email listed.

Check out the Data Engineer's Lunch playlist on YouTube

Number	Jump To Topic	YouTube	SlideShare
1	Data Engineering Roadmap	YouTube	SlideShare
2	Common ETL Frameworks	YouTube	SlideShare
3	Scripting Shell Automation for Data Engineering	YouTube	SlideShare
4	Airflow for Data Engineering	YouTube	SlideShare
5	What is a Data Lake	YouTube	SlideShare
6	Common Data Formats Used In Data Engineering	YouTube	SlideShare
7	SQL Databases	YouTube	SlideShare
8	SQL Databases Part 2	YouTube	SlideShare
9	Open Source & Cloud Data Catalog	YouTube	SlideShare
10	NoSQL Databases: Part 1	YouTube	SlideShare
11	Apache Spark Companion Technologies MLFlow	YouTube	SlideShare
12	Introduction to sed for Data Engineering	YouTube	SlideShare
13	Introduction to Airflow	YouTube	SlideShare
14	NoSQL Databases: Part 2 CAP Theorem	YouTube	SlideShare
15	Introduction to Jenkins	YouTube	SlideShare
16	Introduction to awk for Data Engineering	YouTube	SlideShare
17	NoSQL Databases: Part 3 Data Store Types	YouTube	SlideShare
18	Luigi for Scheduling	YouTube	SlideShare
19	Introduction to jq for Data Engineering	YouTube	SlideShare
20	DataOps vs. DevOps	YouTube	SlideShare
21	Python ETL Tools	YouTube	SlideShare
22	Prometheus	YouTube	SlideShare
23	Thanos/Cortex	YouTube	SlideShare
24	Pandas for Data Engineering	YouTube	SlideShare
25	Airflow and Spark	YouTube	SlideShare
26	Akka Actors for Data Processing	YouTube	SlideShare
27	Data Processing with Containers: Docker & Kubernetes Tools for Data Engineering	YouTube	SlideShare
28	Petl for Data Engineering	YouTube	SlideShare
29	Introduction to Apache Nifi	YouTube	SlideShare
30	Databand	YouTube	SlideShare
31	Migrating from PostgreSQL to Cassandra	YouTube	SlideShare
32	Converting JSON to CSV	YouTube	SlideShare
33	Using Spark, Cassandra, and Elasticsearch for Data Processing	YouTube	SlideShare
34	DBeaver	YouTube	SlideShare
35	Introduction to Snowflake	YouTube	SlideShare
36	Amundsen/DSE + Airflow	YouTube	SlideShare
37	Pipedream: Serverless Integration and Compute Platform	YouTube	SlideShare
39	Dapr Cloud	YouTube	SlideShare
40	Streaming Real Time vs Batch for ETL	YouTube	SlideShare
41	PygramETL	YouTube	SlideShare
42	Introduction to Databricks	YouTube	SlideShare
43	Bodo.ai - Karthik Narayanan	YouTube
44	Prefect	YouTube	SlideShare
45	Apache Livy	YouTube	SlideShare
46	Node.js and API calls	YouTube	SlideShare
47	Airflow on Kubernetes	YouTube	SlideShare
48	Veezoo - João Pedro Monteiro	YouTube
49	Meltano for Data Engineering	YouTube	SlideShare
50	Airbyte for Data Engineering	YouTube	SlideShare
51	Comparison of Managed Airflow Options	YouTube
52	JupyterHub/JupyterLab on Kubernetes	YouTube	SlideShare
53	2021 in Review	YouTube
54	dbt and Spark	YouTube	SlideShare
55	Get Started in Data Engineering	YouTube	SlideShare
56	Spring Cloud Data Flow with Cassandra	YouTube	SlideShare
57	StreamSets for Data Engineering	YouTube	SlideShare
58	InfinyOn	YouTube
59	Spark Tasks and Distribution	YouTube	SlideShare
60	Series - Developing Enterprise Consciousness	YouTube	SlideShare
61	Kubevirt	YouTube	SlideShare
63	Building a Cryptocurrency Data Catalogue	YouTube	SlideShare
64	Processing Real-time Crypto Transactions	YouTube
65	JanusGraph on Jupyter - Using Notebooks with Graph	YouTube
66	Airflow and Presto	YouTube	SlideShare
67	Machine Learning - Feature Selection	YouTube	SlideShare
68	DevOps Fundamentals	YouTube	SlideShare
69	Great Expectations for Data Engineering		SlideShare
70	Apache Iceberg	YouTube	SlideShare
71	Tools for Cloud Data Engineering	YouTube
72	Introduction to Apache Pinot	YouTube
74	Table Format Comparison	YouTube
75	Real-time change data capture, processing, and ingest into OLTP and OLAP databases	YouTube
76	Airflow and Google Dataproc	YouTube
77	Apache Arrow Flight SQL: A Universal Standard for High-Performance Data Transfers from Databases	YouTube
78	Visualize Data from Cassandra in Superset	YouTube
79	The Second 90% of Data Engineering Projects	YouTube
80	Apache Spark Resource Managers	YouTube
81	Reverse ETL Tools for Modern Data Platforms	YouTube
82	Automating Apache Cassandra Operations with Apache Airflow	YouTube
83	Strategies for Migration to Apache Iceberg - Alex Merced Dremio	YouTube
84	Interesting and Exciting Things from AWS re:Invent 2022	YouTube
85	Designing a Modern Data Stack	YouTube
86	Building Real-Time Applications at Scale: A Case Study in Cyclist Crash Detection	YouTube
87	ChatGPT for Data Engineering	YouTube
89	Machine Learning Orchestration with Airflow	YouTube
90	Migrating SQL Data with Arcion	YouTube
91	Deploying Google-managed Instance Groups with Terraform	YouTube
92	GCP Managed Instance Groups with Terraform Pt. 2	YouTube
93	LLM / AI Engineering for Software & Data Engineers	YouTube
94	Upgrading Postgres for On-Prem IoT	YouTube
95	Python Parallel Processing Frameworks	YouTube
96	Intro to Real-Time Analytics Using Apache Pinot	YouTube
97	Apache HDFS: Hadoop Distributed File System	YouTube
98	The Who, What, and Why of Data Lake Table Formats	YouTube

Data Engineer's Lunch #1: Data Engineering Roadmap

We cover the data engineering roadmap and the general path, which includes various technologies for programming, scripting/automation, databases, data processing, scheduling, clouds, and infrastructure. We also discuss different guides and resources.
- YouTube
- SlideShare

Data Engineer's Lunch #2: Common ETL Frameworks

We discuss common ETL frameworks and tools for different languages, including Python, Java, Scala, .NET, and Node.
- YouTube
- SlideShare

Data Engineer's Lunch #3: Scripting / Shell Automation for Data Engineering

We discuss a multitude of tools you can use to do scripting and shell automation for data engineering along with different shells, cron, and various command-line tools with resources and examples.
- YouTube
- SlideShare

Data Engineer's Lunch #4: Airflow for Data Engineering

Guest speaker Will Angel covers the topic of using Airflow for data engineering. Airflow is a scheduling tool for managing data pipelines.
- YouTube
- SlideShare

Data Engineer's Lunch #5: What is a Data Lake?

We discuss what data lakes are, why we need them, how we get data in and out, and different implementations of data lakes.
- YouTube
- SlideShare

Data Engineer's Lunch #6: Common Data Formats Used in Data Engineering

We discuss common data formats used in data engineering including text/file and binary formats.
- YouTube
- SlideShare

Data Engineer's Lunch #7: SQL Databases

We discuss relational concepts including the history of RDBMS, the general need for SQL databases, rules of design, and normalization. We also discuss popular SQL databases, and their advantages and disadvantages.
- YouTube
- SlideShare

Data Engineer's Lunch #8: SQL Databases Part 2

We continue our discussion of relational concepts, popular SQL databases, and their advantages and disadvantages. We also discuss cloud databases and database tools compatible with SQL databases.
- YouTube
- SlideShare

Data Engineer's Lunch #9: Open Source & Cloud Data Catalog

We discuss data catalogs, which help users keep track of data.
- YouTube
- SlideShare

Data Engineer's Lunch #10: NoSQL Databases - Part 1

We discuss NoSQL datastores, specifically, different types of key-value stores.
- YouTube
- SlideShare

Data Engineer's Lunch #11: Apache Spark Companion Technologies: MLFlow

We cover MLFlow, a tool by Databricks for managing and cataloging machine learning workflows.
- YouTube
- SlideShare

Data Engineer's Lunch #12: Introduction to sed for Data Engineering

We will introduce sed, a stream editor, for data engineering. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).
- YouTube
- SlideShare
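
To make the stream-editing idea above concrete, here is a minimal Python sketch of a line-by-line substitution, roughly what `sed 's/-/\//'` does. It is an illustration only, not material from the talk: the function name `substitute` and the sample data are hypothetical, and Python's regex syntax differs from sed's in some details.

```python
import re
from typing import Iterable, Iterator


def substitute(lines: Iterable[str], pattern: str, replacement: str) -> Iterator[str]:
    """Yield each input line with the first match of `pattern` replaced."""
    regex = re.compile(pattern)
    for line in lines:
        # count=1 mirrors sed's default of replacing only the first match per line
        yield regex.sub(replacement, line, count=1)


# Hypothetical sample input standing in for a file or a pipeline
sample = ["2021-01-05,ok", "2021-01-06,fail"]
cleaned = list(substitute(sample, r"-", "/"))
```

Because the function consumes and yields one line at a time, it can process input of any size without loading it all into memory, which is the property that makes stream editors useful in pipelines.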

Data Engineer's Lunch #13: Introduction to Airflow

We will cover some resources for getting started with Airflow, a Python-based scheduling tool with the ability to connect to a number of different data management tools. We had an overview recently from Will Angel in Data Engineer's Lunch #4. This session will help beginners learn to use Airflow.
- YouTube
- SlideShare

Data Engineer's Lunch #14: NoSQL Databases Part 2 - CAP Theorem

We cover the fundamental difference between relational and most non-relational databases: ACID vs. BASE.
- YouTube
- SlideShare

Data Engineer's Lunch #15: Introduction to Jenkins

We will cover the use of Jenkins as a scheduling tool, give a general overview of Jenkins' capabilities, and compare how it stacks up against Airflow as a scheduling tool.
- YouTube
- SlideShare

Data Engineer's Lunch #16: Introduction to awk for Data Engineering

We will introduce and demonstrate awk, a program that you can use to select particular records in a file and perform operations upon them.
- YouTube
- SlideShare
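
As a rough illustration of the "select records, then operate on them" pattern described above, the following Python sketch does approximately what `awk -F, '$2 == "ok" { s += $3 } END { print s }'` would do. The function name `sum_matching`, the three-column layout, and the sample records are assumptions made for the example, not content from the talk.

```python
from typing import Iterable


def sum_matching(lines: Iterable[str], status: str, sep: str = ",") -> float:
    """Sum the third field of every record whose second field equals `status`."""
    total = 0.0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Skip malformed records instead of failing, as a one-off script might
        if len(fields) >= 3 and fields[1] == status:
            total += float(fields[2])
    return total


# Hypothetical records: id, status, value
records = ["a,ok,1.5", "b,fail,2.0", "c,ok,3.0"]
total_ok = sum_matching(records, "ok")
```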

Data Engineer's Lunch #17: NoSQL Databases Part 3 - Data Store Types

We discussed the four different types of data stores that underlie NoSQL databases.
- YouTube
- SlideShare

Data Engineer's Lunch #18: Luigi For Scheduling

We discussed Luigi as a scheduling platform alongside our previous discussions of Jenkins and Airflow. Luigi is a Python package that helps you build complex pipelines of batch jobs.
- YouTube
- SlideShare

Data Engineer's Lunch #19: Introduction to jq for Data Engineering

We introduce jq and how we can use it for data engineering. jq is a command-line tool like sed for JSON data and can be used to slice, filter, map, and transform structured data.
- YouTube
- SlideShare
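
The slice, filter, map, and transform operations mentioned above can be sketched in Python with the standard `json` module. This is only an approximation of what the jq expression `map(select(.size > 100) | {name, tag_count: (.tags | length)}) | .[0:1]` would do; the field names and sample data are hypothetical.

```python
import json

# Hypothetical JSON input
raw = (
    '[{"name": "a", "size": 10, "tags": ["x"]},'
    ' {"name": "b", "size": 250, "tags": []},'
    ' {"name": "c", "size": 900, "tags": ["y", "z"]}]'
)
records = json.loads(raw)

# Filter: comparable to jq's select(.size > 100)
large = [r for r in records if r["size"] > 100]

# Map/transform: comparable to jq's {name, tag_count: (.tags | length)}
summary = [{"name": r["name"], "tag_count": len(r["tags"])} for r in large]

# Slice: comparable to jq's .[0:1]
first = summary[:1]
```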

Data Engineer's Lunch #20: DataOps vs. DevOps

We discuss the definitions of DataOps (Data Operations) and DevOps (Development Operations) and the differences between them.
- YouTube
- SlideShare

Data Engineer's Lunch #21: Python ETL Tools

We discuss, compare, and contrast a number of ETL tools for Python.
- YouTube
- SlideShare

Data Engineer's Lunch #22: Prometheus

Guest speaker Will Angel covers the topic of using Prometheus for data engineering. Prometheus is a monitoring system & time series database.
- YouTube
- SlideShare

Data Engineer's Lunch #23: Thanos/Cortex

Rahul Singh covers the topics of Thanos and Cortex.
- YouTube
- SlideShare

Data Engineer's Lunch #24: Pandas for Data Engineering

We continue our discussion of Python ETL tools with a more in-depth look at Pandas.
- YouTube
- SlideShare

Data Engineer's Lunch #25: Airflow and Spark

We discuss how we can use Airflow to schedule Spark jobs.
- YouTube
- SlideShare

Data Engineer's Lunch #26: Akka Actors for Data Processing

We discuss how to use Akka Actors for concurrent data processing operations.
- YouTube
- SlideShare

Data Engineer's Lunch #27: Data Processing with Containers: Docker & Kubernetes Tools for Data Engineering

We discuss data processing with different container tools.
- YouTube
- SlideShare

Data Engineer's Lunch #28: Petl for Data Engineering

We continue our discussion of Python ETL tools with a more in-depth look at Petl.
- YouTube
- SlideShare

Data Engineer's Lunch #29: Introduction to Apache Nifi

We introduce Apache Nifi and discuss how we can use it for data engineering.
- YouTube
- SlideShare

Data Engineer's Lunch #30: Databand

In Data Engineer’s Lunch #30 we discuss the differences between the open-source and paid versions of Databand and have Databand CEO Josh Benamram walk us through a demo of the paid version.
- YouTube
- SlideShare

Data Engineer's Lunch #31: Migrating from PostgreSQL to Cassandra

In Data Engineer's Lunch #31, we will discuss the process and reasons for migrating your database from SQL (PostgreSQL) to NoSQL (Cassandra).
- YouTube
- SlideShare

Data Engineer's Lunch #32: Converting JSON to CSV

In Data Engineer's Lunch #32, we will discuss different ways to convert JSON files into CSV files.
- YouTube
- SlideShare
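
One possible way to do such a conversion, using only the Python standard library, is sketched below. It assumes the JSON input is a flat array of objects; nested structures would need flattening first. The function name `json_to_csv` and the sample data are hypothetical and not taken from the talk.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    rows = json.loads(json_text)
    # Collect the union of keys in first-seen order so no column is dropped
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a row are written as empty cells (DictWriter's default)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical input where the objects do not share all keys
sample = '[{"id": 1, "name": "a"}, {"id": 2, "city": "NYC"}]'
csv_text = json_to_csv(sample)
```

Building the header from the union of keys is a design choice for inputs with inconsistent objects; if every object has the same keys, the keys of the first row are enough.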

Data Engineer's Lunch #33: Using Spark, Cassandra, and Elasticsearch for Data Processing

In Data Engineer's Lunch #33, we will discuss how you can use Spark and Spark jobs to load data from a CSV file, save it into Cassandra and Elasticsearch, and load it back.
- YouTube
- SlideShare

Data Engineer's Lunch #34: DBeaver

In Data Engineer's Lunch #34: DBeaver, we will be discussing what DBeaver is and how it can be used in data engineering.
- YouTube
- SlideShare

Data Engineer's Lunch #35: Introduction to Snowflake

In Data Engineer's Lunch #35: Introduction to Snowflake, we will introduce Snowflake and discuss how it can be used for Data Engineering.
- YouTube
- SlideShare

Data Engineer's Lunch #36: Amundsen/DSE + Airflow

In Data Engineer's Lunch #36, we will discuss data discovery with Amundsen.
- YouTube
- SlideShare

Data Engineer's Lunch #37: Pipedream: Serverless Integration and Compute Platform

In Data Engineer's Lunch #37, we will discuss Pipedream, a serverless integration and compute platform that is free for individual developers to use.
- YouTube
- SlideShare

Data Engineer's Lunch #39: Dapr Cloud

In Data Engineer's Lunch #39: Dapr Cloud, we will discuss how to use Dapr to make a cloud application.
- YouTube
- SlideShare

Data Engineer's Lunch #40: Streaming Real Time vs Batch for ETL

In Data Engineer's Lunch #40: Streaming Real Time vs Batch for ETL, we will discuss use cases for real-time stream processing versus processing in batches.
- YouTube
- SlideShare

Data Engineer's Lunch #41: PygramETL

In Data Engineer's Lunch #41, we will discuss pygrametl as part of our discussion of Python ETL tools.
- YouTube
- SlideShare

Data Engineer's Lunch #42: Introduction to Databricks

In Data Engineer's Lunch #42, we will introduce Databricks and how it can be used for data engineering.
- YouTube
- SlideShare

Data Engineer's Lunch #43: Bodo.ai

In Data Engineer's Lunch #43, Karthik Narayanan, Principal Solutions Architect at Bodo.ai, will demonstrate what Bodo.ai is and its capabilities.
- YouTube

Data Engineer's Lunch #44: Prefect

In Data Engineer's Lunch #44, we will discuss Prefect and how it compares to Airflow when scheduling tasks.
- YouTube
- SlideShare

Data Engineer's Lunch #45: Apache Livy

In Data Engineer's Lunch #45, we will discuss the use of Apache Livy, which creates a REST API for interacting with Spark.
- YouTube
- SlideShare

Data Engineer's Lunch #46: Node.js and API calls

In Data Engineer's Lunch #46, we discuss the architecture of Node.js and use it to make an API call and harvest some data from the response.
- YouTube
- SlideShare

Data Engineer's Lunch #47: Airflow on Kubernetes

In Data Engineer's Lunch #47, we will use Kubernetes to deploy Airflow.
- YouTube
- SlideShare

Data Engineer's Lunch #48: Veezoo - João Pedro Monteiro

In Data Engineer's Lunch #48, João Pedro Monteiro (JP), co-founder and CTO of Veezoo, will be introducing Veezoo and showing how natural language interfaces are the key to enabling data democratization at companies.
- YouTube

Data Engineer's Lunch #49: Meltano for Data Engineering

In Data Engineer's Lunch #49, we will be introducing Meltano and how it can be used for ELT in data engineering.
- YouTube
- SlideShare

Data Engineer's Lunch #50: Airbyte for Data Engineering

In Data Engineer's Lunch #50, we will introduce Airbyte and discuss how it can be used for data engineering.
- YouTube
- SlideShare

Data Engineer's Lunch #51: Comparison of Managed Airflow Options

In Data Engineer's Lunch #51: Comparison of Managed Airflow Options, guest speaker Andres Namm will be comparing AWS Airflow, GCP Airflow, Astronomer vs. self-managed Airflow.
- YouTube

Data Engineer's Lunch #52: JupyterHub/JupyterLab on Kubernetes

In Data Engineer's Lunch #52, we will deploy JupyterHub/JupyterLab on Kubernetes.
- YouTube
- SlideShare

Data Engineer's Lunch #53: 2021 in Review

In Data Engineer's Lunch #53, we discussed some of our most popular webinars from 2021 and received feedback from the audience about what they would like to see in 2022.
- YouTube

Data Engineer's Lunch #54: dbt and Spark

In Data Engineer's Lunch #54, we will discuss dbt (data build tool), a tool for managing data transformations with config files rather than code. We will connect it to Apache Spark and use it to perform transformations.
- YouTube
- SlideShare

Data Engineer's Lunch #55: Get Started in Data Engineering

In Data Engineer's Lunch #55, CEO of Anant, Rahul Singh, will cover 10 resources every data engineer needs to get started or master their game.
- YouTube
- SlideShare

Data Engineer's Lunch #56: Spring Cloud Data Flow with Cassandra

In Data Engineer's Lunch #56, we will go over how to integrate Spring Cloud Data Flow with Cassandra.
- YouTube
- SlideShare

Data Engineer's Lunch #57: StreamSets for Data Engineering

In Data Engineer's Lunch #57, we will discuss StreamSets and how it can be used for data engineering.
- YouTube
- SlideShare

Data Engineer's Lunch #58: InfinyOn

In Data Engineer’s Lunch #58, Sehyo Chang, founder and CTO of InfinyOn, will give an introduction to Fluvio OSS and the InfinyOn Cloud data streaming platform.
- YouTube

Data Engineer's Lunch #59: Spark Tasks and Distribution

In Data Engineer's Lunch #59, we will discuss the way that Spark splits up and distributes work between nodes. We will look at some example code and use the Spark UI to see how the work was distributed between nodes.
- YouTube
- SlideShare

Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness

In Data Engineer's Lunch #60, CEO of Anant, Rahul Singh, will discuss modern data processing and pipeline approaches, including modern data engineering patterns and practices for global data platforms. The talk gives a high-level overview of the different types, frameworks, and workflows in data processing and pipeline design.
- YouTube
- SlideShare

Data Engineer's Lunch #61: Kubevirt

In Data Engineer's Lunch #61, Stefan Nikolovski will discuss Kubevirt.
- YouTube
- SlideShare

Data Engineer's Lunch #63: Building a Cryptocurrency Data Catalogue

In Data Engineer’s Lunch #63, Travis Collins, founder of the open source project DataPM, will present DataPM and show how to get access to cryptocurrency and blockchain data. This is part 1 of a series with Decodable on processing real-time crypto transactions fed by DataPM.
- YouTube
- SlideShare

Data Engineer's Lunch #64: Processing Real-time Crypto Transactions

In Data Engineer’s Lunch #64, Eric Sammer, CEO of Decodable, will discuss their cloud-based streaming SQL engine and how to mine insights from data in real-time. This is part 2 of a series with DataPM on processing real-time crypto transactions fed by DataPM.
- YouTube

Data Engineer's Lunch #65: JanusGraph on Jupyter - Using Notebooks with Graph

In Data Engineer's Lunch #65, Ryan Quey will discuss using the Graph Notebook tool put out by the AWS team with JanusGraph.
- YouTube

Data Engineer's Lunch #66: Airflow and Presto

In Data Engineer's Lunch #66, Arpan Patel will discuss how to connect Airflow and Presto.
- YouTube
- SlideShare

Data Engineer's Lunch #67: Machine Learning - Feature Selection

In Data Engineer's Lunch #67, Obioma Anomnachi will discuss the process of feature selection as part of a Machine Learning process. Feature selection describes the process of picking particular, relevant data features out of a wider data set, to be used to perform model training.
- YouTube
- SlideShare
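
As a small illustration of picking relevant features out of a wider data set, the sketch below applies a variance threshold: columns whose values never vary carry no information for training and are dropped. This is just one simple filter method chosen for the example and is not necessarily the approach covered in the talk; the function name `select_by_variance` and the sample rows are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def select_by_variance(rows: List[Dict[str, float]], threshold: float = 0.0) -> List[str]:
    """Return the feature names whose population variance exceeds `threshold`."""
    names = list(rows[0].keys())
    return [
        name
        for name in names
        if pvariance([row[name] for row in rows]) > threshold
    ]


# Hypothetical data set: "const" is identical in every row, so it is dropped
data = [
    {"age": 30, "const": 1, "income": 50.0},
    {"age": 40, "const": 1, "income": 65.0},
    {"age": 50, "const": 1, "income": 80.0},
]
selected = select_by_variance(data)
```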

Data Engineer's Lunch #68: DevOps Fundamentals

In Data Engineer’s Lunch #68, Will Angel, Technical Product Manager at Caribou Financial, will provide an introduction to DevOps practices and tooling including testing, deployment automation, logging, monitoring, and DevOps principles. Additionally, we will discuss some of the ways that DevOps for data engineering is different from conventional application development.
- YouTube
- SlideShare

Data Engineer's Lunch #69: Great Expectations for Data Engineering

In Data Engineer's Lunch #69, Arpan Patel will discuss Great Expectations and how it can be used for data engineering. This will be part one of a series on Great Expectations and will primarily focus on introducing Great Expectations. Future talks will feature tools like Spark and Airflow in conjunction with Great Expectations!
- SlideShare

Data Engineer's Lunch #70: Apache Iceberg

In Data Engineer's Lunch #70, Alex Merced, Developer Advocate at Dremio, covers the architectural details of why the Hive table format falls short and how the Iceberg table format resolves those shortcomings, as well as the benefits that stem from Iceberg’s approach.
- YouTube
- SlideShare

Data Engineer's Lunch #71: Tools for Cloud Data Engineering

In Data Engineer’s Lunch #71, CEO of Anant, Rahul Singh, will discuss tools for cloud data engineering!
- YouTube

Data Engineer's Lunch #72: Introduction to Apache Pinot

In Data Engineer’s Lunch #72, CEO of Anant, Rahul Singh, will give an overview of the up-and-coming Apache Pinot project that spun out of LinkedIn and is now being supported by StarTree as an enterprise offering. This is the first in a series of talks and workshops on why Pinot is important to the future of real-time data.
- YouTube

Data Engineer's Lunch #74: Table Format Comparison

In Data Engineer's Lunch #74, Alex Merced, Developer Advocate for Dremio, will discuss the three major data lake table formats – Apache Iceberg, Apache Hudi, and Delta Lake – covering how they work, their features, and their limitations so you can make an informed decision when architecting your data lakehouse.
- YouTube

Data Engineer's Lunch #75: Real-time change data capture, processing, and ingest into OLTP and OLAP databases

In Data Engineer's Lunch #75, Eric Sammer, CEO of Decodable, will discuss real-time change data capture, processing, and ingest into OLTP and OLAP databases!
- YouTube

Data Engineer's Lunch #76: Airflow and Google Dataproc

In Data Engineer's Lunch #76, Arpan Patel will cover how to connect Airflow and Dataproc with a demo using an Airflow DAG to create a Dataproc cluster, submit an Apache Spark job to Dataproc, and destroy the Dataproc cluster upon completion.
- YouTube

Data Engineer's Lunch #77: Apache Arrow Flight SQL: A Universal Standard for High-Performance Data Transfers from Databases

This talk covers why ODBC & JDBC don’t cut it in today’s data world and the problems solved by Arrow, Arrow Flight, and Arrow Flight SQL. Alex will go through how each of these building blocks works as well as an overview of universal ODBC & JDBC drivers built on Arrow Flight SQL, enabling clients to take advantage of this increased performance with zero application changes.
- YouTube

Data Engineer's Lunch #78: Visualize Data from Cassandra in Superset

In this lunch, Ryan will walk through how to visualize data from Cassandra in Superset (by means of Presto). Along the way, he shares some observations about his experience and potential use cases that may be interesting to you.
- YouTube

Data Engineer's Lunch #79: Data Governance: The Second 90% of Data Engineering Projects

You build an ELT pipeline to get data from some source, load it into your data lake, and transform it into a usefully modeled dataset for analysts and business users to consume; another data engineering job well done. Except you now have a new set of data artifacts, access patterns, documentation (hopefully), and security permissions to manage. This talk will provide an overview of Data Governance, which is the art of anticipating, preventing, and mitigating all the risks, costs, and headaches that come with every new data source throughout the data lifecycle.
- YouTube

Data Engineer's Lunch #80: Apache Spark Resource Managers

In Data Engineer's Lunch #80, Obioma Anomnachi will compare and contrast the different resource managers available for Apache Spark. We will cover local, standalone, YARN, and Kubernetes resource managers and discuss how each one allows the user different levels of control over how resources given to Spark are distributed to Spark applications.
- YouTube

Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms

During this lunch, we’ll review some of the open-source reverse ETL tools to uncover how to send data back to SaaS systems.
- YouTube

Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache Airflow

During this lunch, we’ll discuss going beyond cron jobs to manage ETL, Data Hygiene, and Data Import/Export.
- YouTube

Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg

In this talk, Dremio Developer Advocate Alex Merced discusses strategies for migrating your existing data over to Apache Iceberg. He covers how to migrate Hive, Delta Lake, JSON, and CSV sources to Apache Iceberg; the pros and cons of an in-place or shadow migration; and migrating between Apache Iceberg catalogs (Hive/Glue -- Arctic/Nessie).
- YouTube

Data Engineer's Lunch #84: Interesting and Exciting Things from AWS re:Invent 2022

In this lunch, Nicholas will deliver a breakdown and description of AWS re:Invent 2022 and some of the cool announcements and learning that occurred there.
- YouTube

Data Engineer's Lunch #85: Designing a Modern Data Stack

What are the design considerations that go into architecting a modern data warehouse? This lunch will cover some of the requirements analysis, design decisions, and execution challenges of building a modern data lake/data warehouse.
- YouTube

Data Engineer's Lunch #86: Building Real-Time Applications at Scale

As the demand for real-time data processing continues to grow, so too do the challenges associated with building production-ready applications that can handle large volumes of data and handle it quickly. In this talk, we will explore common problems faced when building real-time applications at scale, with a focus on a specific use case: detecting and responding to cyclist crashes. Using telemetry data collected from a fitness app, we’ll demonstrate how we used a combination of Apache Kafka and Python-based microservices running on Kubernetes to build a pipeline for processing and analyzing this data in real-time. We'll also discuss how we used machine learning techniques to build a model for detecting collisions and how we implemented notifications to alert family members of a crash. Our ultimate goal is to help you navigate the challenges that come with building data-intensive, real-time applications that use ML models. By showcasing a real-world example, we aim to provide practical solutions and insights that you can apply to your own projects.
- YouTube

Data Engineer's Lunch #87: ChatGPT for Data Engineering

Learn how to use ChatGPT for basic data engineering tasks such as data modeling, ETL, data cleanup, code conversion, and data science.
- YouTube

Data Engineer's Lunch #89: Machine Learning Orchestration with Airflow

In Data Engineer's Lunch #89, Obioma Anomnachi will discuss how to manage and schedule Machine Learning operations via Airflow. Learn how you can write complete end-to-end pipelines, from retrieving raw data to serving ML predictions to end-users, entirely in Airflow.
- YouTube

Data Engineer's Lunch #90: Migrating SQL Data with Arcion

If you're looking to migrate SQL data for your organization, you won't want to miss this informative talk on Arcion. Designed to simplify the data migration process, Arcion offers a seamless solution that enables you to move your SQL data quickly and efficiently. During this talk, you'll discover the many benefits of using Arcion for data migration, including its intuitive interface and powerful automation features. You'll also learn how to leverage Arcion's robust capabilities to streamline your SQL data migration process, regardless of the size or complexity of your data sets.
- YouTube

Data Engineer's Lunch #91: Deploying Google-managed Instance Groups with Terraform

In this lunch, we'll show you how to deploy a Managed Instance Group using Terraform. We'll explain the different methods and demonstrate configurations, lessons learned, and a simple example for deploying a Managed Instance Group with Terraform in GCP.
- YouTube

Data Engineer's Lunch #92: GCP Managed Instance Groups with Terraform Pt. 2

In the second talk in a series on Google Managed Instance Groups, Anant Architect Nicholas Brackley details methods for implementing auto-healing and updating the image used by the group.
- YouTube

Data Engineer's Lunch #93: LLM / AI Engineering for Software & Data Engineers

During this lunch, we'll cover what you need to know to get started in LLM / GPT engineering as a Software and/or Data Engineer. This lunch covers the fundamentals of LLMs, some patterns, and how to get started.
- YouTube

Data Engineer's Lunch #94: Upgrading Postgres for On-Prem IoT

Join Will Angel for a talk about his journey upgrading an on-prem IoT system's Postgres database and some of the techniques used to improve local database performance for analytical systems.
- YouTube

Data Engineer's Lunch #95: Python Parallel Processing Frameworks

In Data Engineer's Lunch #95, Obioma Anomnachi will share his expertise on parallel computing for Python programmers. He will cover the various pathways available to Python developers who want to execute their code in parallel. You will learn about the benefits of parallel processing, how it can improve the performance of your code, and the different tools and frameworks that can be used to achieve this.
- YouTube
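
As one example of such a pathway, the sketch below uses the standard library's `concurrent.futures` to spread independent tasks over a pool of workers. It is a minimal illustration rather than material from the talk: the function `word_count` and the sample documents are hypothetical. A thread pool is used here because it suits I/O-bound work and runs anywhere; for CPU-bound work, `ProcessPoolExecutor` offers the same interface but should be started under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text: str) -> int:
    """Count whitespace-separated words in one document."""
    return len(text.split())


# Hypothetical inputs; each one is processed independently of the others
documents = ["a b c", "d e", "f"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in the same order as the inputs
    counts = list(pool.map(word_count, documents))
```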

Data Engineer's Lunch #96: Intro to Real-Time Analytics Using Apache Pinot

In this lunch, we will introduce the concepts of real-time analytics, why it is important, the evolution of analytics, and how companies such as LinkedIn, Stripe, Uber, and more are using real-time analytics with Apache Pinot to grow their audience and improve usability. We then explain what Apache Pinot is, followed by a demo and Q&A.
- YouTube

Data Engineer's Lunch #97: Apache HDFS: Hadoop Distributed File System

In this lunch we will discuss the use of the Hadoop Distributed File System for data engineering applications.
- YouTube

Data Engineer's Lunch #98: The Who, What, and Why of Data Lake Table Formats

This talk is a comprehensive exploration of data lake table formats, a critical component of modern data analytics, and their impact on business analytics. By the end of the presentation, you will better understand data lake table formats and how they can be used to improve business analytics.
- YouTube
