Kafka Topic -> Spark Streaming (`window()`) -> Aggregated Data -> Cassandra -> Backend (WebSocket) -> Dashboard UI
This repository contains the source code and configuration for a real-time data processing pipeline that aggregates data from a Kafka topic, performs real-time analytics using Spark Streaming, stores the aggregated data in Cassandra, and updates a dashboard UI in real-time through a WebSocket connection.
- Data is ingested into the pipeline through a Kafka topic.
- Kafka is a distributed event streaming platform, providing a scalable and fault-tolerant mechanism for data ingestion.
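As a minimal sketch of the ingestion side, events can be serialized to JSON bytes before being published to the topic. The helper below is hypothetical (the actual producer code, event fields, and topic name are not part of this README); a real producer such as kafka-python's `KafkaProducer` would send these bytes to Kafka.

```python
import json
import time

def encode_event(event: dict) -> bytes:
    """Serialize an event as UTF-8 JSON bytes, stamping it with an
    ingestion timestamp so downstream windowing can bucket it."""
    stamped = {**event, "ingested_at": time.time()}
    return json.dumps(stamped).encode("utf-8")

# A real producer (e.g. kafka-python's KafkaProducer) would publish
# these bytes to the Kafka topic; here we only show the payload format.
payload = encode_event({"sensor": "s1", "value": 3.2})  # illustrative event
```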
- Spark Streaming is used for real-time data processing.
- The `window()` function is applied for windowed operations to aggregate data over specific time intervals.
- Data is aggregated within the Spark Streaming step using various aggregation functions.
- Common operations include summing, averaging, counting, etc., depending on the specific use case.
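Conceptually, the windowed aggregation works like the plain-Python sketch below: events are bucketed into fixed-size tumbling windows and averaged per window. This is an illustration of the idea behind Spark's `window()` plus an `avg()` aggregation, not the repository's actual Spark code; the 10-second window and the sample events are made up.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds):
    """Group (timestamp, value) pairs into fixed-size tumbling windows
    and compute the average value per window -- conceptually what a
    Spark Streaming window() + avg() aggregation produces."""
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Illustrative events: (epoch seconds, reading)
events = [(0, 1.0), (4, 3.0), (12, 10.0), (19, 20.0)]
result = tumbling_window_avg(events, 10)  # windows [0,10) and [10,20)
print(result)  # -> {0: 2.0, 10: 15.0}
```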
- Aggregated data is stored in Cassandra, a highly scalable NoSQL database.
- Cassandra is chosen for its ability to handle large volumes of data across multiple nodes.
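A Cassandra table for such windowed results might look like the CQL below. The keyspace, table, and column names are illustrative assumptions, not taken from this repository; the partition key keeps all windows of one metric together while clustering orders them by time.

```sql
-- Hypothetical schema sketch; names are not from this repo.
CREATE TABLE IF NOT EXISTS pipeline.windowed_metrics (
    metric_name  text,        -- what was aggregated
    window_start timestamp,   -- start of the aggregation window
    value        double,      -- aggregated value (sum, avg, count, ...)
    PRIMARY KEY (metric_name, window_start)
) WITH CLUSTERING ORDER BY (window_start DESC);
```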
- The backend of the application communicates with the front end through a WebSocket connection.
- WebSocket enables bidirectional communication, allowing real-time updates to be sent from the server to the client.
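Each update pushed to clients can be a small JSON text frame. The message shape below is an assumption for illustration (the actual format is defined in the repo's backend source); a server built on a WebSocket library would broadcast this string to every connected dashboard client.

```python
import json

def dashboard_update(metric: str, window_start: int, value: float) -> str:
    """Build the JSON text frame the backend would push to dashboard
    clients over the WebSocket. Field names are illustrative."""
    return json.dumps({
        "type": "metric_update",
        "metric": metric,
        "window_start": window_start,
        "value": value,
    })

message = dashboard_update("avg_temperature", 1700000000, 21.5)
```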
- The Dashboard UI provides a user interface for visualizing and interacting with real-time aggregated data.
- Updates are received in real-time through the WebSocket connection, ensuring the dashboard reflects the latest information.
To set up and run the real-time data processing pipeline, follow the steps outlined in the Installation Guide and Configuration Documentation.
- Docker
- Docker Compose
Clone the repository:
```shell
git clone https://github.com/anthoai97/simple-end-to-end-data-streaming
cd simple-end-to-end-data-streaming
docker-compose up
```
An Thoai
This project is licensed under the terms of the MIT license.