ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipeline ― Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau

prakashdontaraju/google-cloud-ecommerce

2 Data Pipelines To Easily Know Customer Purchasing Behaviors

Successful implementation of a streaming pipeline and a batch pipeline on Google Cloud

Business Case

Our customer (a company) in the ecommerce space decided to move its data processing, storage and analytics workloads to the Google Cloud Platform as part of its goal to provide its customers (end users) a better experience.

Results

I successfully engineered streaming & batch data processing pipelines on the Google Cloud Platform.

I created the data pipeline infrastructure on Google Cloud for analyzing customer purchasing behavior in real time and performed the analysis.

Deployment

I plan to write a blog post about how to deploy these 2 pipelines on Google Cloud soon. Stay tuned!

Data

I chose the "eCommerce behavior data from multi category store" dataset available on Kaggle so that I could focus on implementing the streaming and batch pipelines.

I pre-process (transform) the data, but real business data requires significantly more pre-processing, as its quality may not be ideal for the business problem(s) at hand.

Properties of data

The data file contains customer behavior data from a large multi-category online store's website for 1 month (November 2019).

Each row in the file represents an event.

  • All events are related to products and users

  • There are 3 different types of events → view, cart and purchase
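As a rough illustration of the row structure described above, the following sketch parses CSV rows into typed event records and drops rows whose event type is not one of the three listed. The column names (`event_time`, `event_type`, `product_id`, `price`, `user_id`) and the timestamp format are assumptions about the Kaggle file rather than something this README specifies, and the sample rows are invented for the example.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone

# The three event types named in this README.
EVENT_TYPES = {"view", "cart", "purchase"}


@dataclass(frozen=True)
class Event:
    event_time: datetime
    event_type: str
    product_id: int
    user_id: int
    price: float


def parse_events(csv_text):
    """Yield Event records from CSV text, skipping unknown event types."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["event_type"] not in EVENT_TYPES:
            continue
        # Assumed timestamp format, e.g. "2019-11-01 00:00:00 UTC".
        ts = datetime.strptime(row["event_time"], "%Y-%m-%d %H:%M:%S UTC")
        yield Event(
            event_time=ts.replace(tzinfo=timezone.utc),
            event_type=row["event_type"],
            product_id=int(row["product_id"]),
            user_id=int(row["user_id"]),
            price=float(row["price"]),
        )


# Invented sample rows; the third has an event type outside the known set.
SAMPLE = """event_time,event_type,product_id,price,user_id
2019-11-01 00:00:00 UTC,view,101,9.99,1
2019-11-01 00:00:05 UTC,cart,101,9.99,1
2019-11-01 00:00:09 UTC,refund,101,9.99,1
"""

events = list(parse_events(SAMPLE))
```

In the actual pipelines this kind of validation would more likely live in a Beam transform or a PySpark job; plain Python is used here only to keep the sketch self-contained.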

The 2 purchase funnels are

  • view → cart → purchase
  • view → purchase
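One way to tell the two funnels apart is to check whether a cart event preceded each purchase. The sketch below assumes events arrive in time order as `(user_id, product_id, event_type)` tuples and matches carts to purchases per user and product; both the tuple layout and the matching key are illustrative choices, not necessarily how the analysis in this repository was done.

```python
from collections import Counter

VIA_CART = "view → cart → purchase"
DIRECT = "view → purchase"


def classify_funnels(events):
    """Count purchases per funnel.

    events: iterable of (user_id, product_id, event_type), in time order.
    """
    carted = set()      # (user, product) pairs with a pending cart event
    funnels = Counter()
    for user_id, product_id, event_type in events:
        key = (user_id, product_id)
        if event_type == "cart":
            carted.add(key)
        elif event_type == "purchase":
            if key in carted:
                funnels[VIA_CART] += 1
                carted.discard(key)  # this cart event is now consumed
            else:
                funnels[DIRECT] += 1
    return funnels


# Invented sample: user 1 goes through the cart, user 2 buys directly.
sample = [
    (1, 101, "view"),
    (1, 101, "cart"),
    (1, 101, "purchase"),
    (2, 101, "view"),
    (2, 101, "purchase"),
    (1, 102, "view"),
]
result = classify_funnels(sample)
```

Matching on a session identifier instead of the user and product pair would be an equally reasonable choice if sessions are the unit of interest.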

Streaming & Batch Pipelines on Google Cloud

Implementation

Storage

BigQuery (Storing streaming data)

Streaming Data in BigQuery

Cloud Spanner (Storing data in batches)

Batch Data in Cloud Spanner

Analysis

  • Daily event count
  • Most visited sub-categories
  • Hour vs Event Type vs Price
  • Purchase conversion volume
  • Purchase conversion rate
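To make a few of these metrics concrete, here is a minimal sketch of the daily event count and the purchase conversion figures over `(date, event_type)` pairs. The README does not define the conversion rate, so the sketch assumes it is purchases divided by views; the function names and sample data are likewise hypothetical.

```python
from collections import Counter


def daily_event_counts(events):
    """Return a Counter mapping each date string to its number of events."""
    return Counter(day for day, _ in events)


def purchase_conversion(events):
    """Return (purchase volume, purchases / views).

    Assumed definition: rate is purchases divided by views,
    or 0.0 when there are no views.
    """
    types = Counter(event_type for _, event_type in events)
    views = types["view"]
    purchases = types["purchase"]
    rate = purchases / views if views else 0.0
    return purchases, rate


# Invented sample data for illustration.
sample = [
    ("2019-11-01", "view"),
    ("2019-11-01", "view"),
    ("2019-11-01", "cart"),
    ("2019-11-01", "purchase"),
    ("2019-11-02", "view"),
    ("2019-11-02", "view"),
]
per_day = daily_event_counts(sample)
volume, rate = purchase_conversion(sample)
```

The same aggregations map naturally onto a `GROUP BY` in BigQuery or a `groupBy` in PySpark; the pure-Python version is only meant to pin down the arithmetic.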

Connect With Me

Prakash Dontaraju LinkedIn Twitter Medium
