Coinbase Market Data Pipeline

TL;DR

A data pipeline that uses Apache Kafka, Apache Spark, and Amazon S3 to ingest, process, and store market data from the public Coinbase Market Data API, orchestrated by Docker Compose.

Coinbase Market Data API

Coinbase Market Data: the Coinbase WebSocket feed for market data, including level 2 order book data.
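
To make the feed concrete, here is a minimal sketch of the subscribe request a client might send after opening the WebSocket connection. The endpoint URL, the "level2" channel name, and the helper build_subscribe_message are assumptions for illustration and are not taken from this repository; check the current Coinbase documentation, since channel names and authentication requirements may differ.

```python
import json

# Assumed endpoint for the public Coinbase WebSocket feed; verify it against
# the current Coinbase documentation before relying on it.
FEED_URL = "wss://ws-feed.exchange.coinbase.com"


def build_subscribe_message(product_ids, channels=("level2",)):
    """Return the JSON text of a subscribe request (hypothetical helper)."""
    if not product_ids:
        raise ValueError("at least one product id is required")
    return json.dumps(
        {
            "type": "subscribe",
            "product_ids": list(product_ids),
            "channels": list(channels),
        }
    )
```

Calling build_subscribe_message(["BTC-USD"]) yields the text frame to send once the socket is open; the feed then streams order book messages for that product.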

Quick Start

To run this project locally, you need Docker and docker-compose installed, and you also need an AWS account. Set S3_ACCESS_KEY and S3_SECRET_KEY to appropriate values, then run the following command to start the services.

docker-compose up
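
Before starting the services, it can help to confirm that both credentials are actually set. The following is a small optional sketch, assuming the two values are supplied as environment variables; the helper missing_credentials is hypothetical and not part of the project.

```python
import os

# Variable names taken from the Quick Start instructions.
REQUIRED_VARS = ("S3_ACCESS_KEY", "S3_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list from missing_credentials() means both variables are present in the current environment.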

Architecture

The pipeline consists of the following components, orchestrated by Docker Compose:

A Python script that fetches data from the Coinbase Market Data API over the WebSocket protocol and publishes it to the Kafka cluster as a producer
A Kafka cluster
A Spark cluster and a job running on it to process the data
An Amazon S3 bucket to store the processed data
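
The first component, the ingestor, can be sketched roughly as follows. This is not the repository's actual code: the topic name, the choice to key records by product id, and the helpers to_kafka_record and forward are assumptions for illustration. The send callable stands in for a real Kafka producer so the sketch runs without a broker.

```python
import json

# Hypothetical topic name; the project's actual topic is not stated here.
TOPIC = "coinbase-market-data"


def to_kafka_record(raw_message):
    """Turn one raw WebSocket text frame into (key, value) bytes for Kafka.

    Keying by product id means the default partitioner sends all updates
    for one product to the same partition, preserving their relative order.
    """
    event = json.loads(raw_message)
    product_id = event.get("product_id", "unknown")
    return product_id.encode("utf-8"), raw_message.encode("utf-8")


def forward(raw_messages, send):
    """Publish each message via send(topic, key, value); return the count."""
    count = 0
    for raw in raw_messages:
        key, value = to_kafka_record(raw)
        send(TOPIC, key, value)
        count += 1
    return count
```

With the kafka-python client, for example, send could be a thin wrapper such as lambda topic, key, value: producer.send(topic, key=key, value=value), where producer is a KafkaProducer instance. The Spark job would then read from the same topic.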

Pipeline Architecture Diagram

+-------------------+    +--------------+    +------------------+    +---------+
|   Data Ingestor   |    | Apache Kafka |    | Apache Spark     |    | Amazon  |
|  (via WebSocket)  | -> | Cluster      | -> | Cluster/Job      | -> | S3      |
|                   |    |              |    | (Data Processor) |    | Bucket  |
+-------------------+    +--------------+    +------------------+    +---------+
          |                      |                    |                   |
          |                      |                    |                   |
          +----------------------+--------------------+-------------------+
                                 |
                        +-----------------+
                        | Docker Compose  |
                        +-----------------+

ti1uan/coinbase-mkt-data-pipeline