AutoMQ Empowers JD to Build a Cloud Native Modern Data Stack
Guide:
JD.com (hereinafter referred to as "JD"), a renowned global e-commerce platform listed on NASDAQ and the Hong Kong Stock Exchange, provides high-quality e-commerce services to hundreds of millions of users.
JD originally ran Apache Kafka on CubeFS, its self-developed distributed storage system. Kafka ensured durability through its three-replica ISR mechanism, which, stacked on CubeFS's own triple replication, produced a total of 9 physical copies of each piece of data, 6 of them redundant, wasting significant storage cost. AutoMQ instead delegates durability to cloud or distributed storage and keeps a single replica, helping JD cut its storage costs by 2/3.
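The storage-amplification arithmetic above can be sketched as a quick calculation. The replica counts (3 Kafka replicas, 3 CubeFS replicas) come from the case described; the function name is purely illustrative:

```python
# Storage amplification when Kafka replication is stacked on replicated storage.

def total_copies(kafka_replicas: int, storage_replicas: int) -> int:
    """Each Kafka replica is itself replicated again by the storage layer."""
    return kafka_replicas * storage_replicas

# Apache Kafka (3-replica ISR) on CubeFS (3-replica storage): 9 physical copies.
kafka_on_cubefs = total_copies(kafka_replicas=3, storage_replicas=3)

# AutoMQ (single replica, durability delegated to storage) on CubeFS: 3 copies.
automq_on_cubefs = total_copies(kafka_replicas=1, storage_replicas=3)

savings = 1 - automq_on_cubefs / kafka_on_cubefs
print(kafka_on_cubefs, automq_on_cubefs, f"{savings:.0%}")  # 9 3 67%
```

The 67% reduction is exactly the "save 2/3 of storage costs" figure cited in the case.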
In addition, JD is committed to containerizing its data infrastructure. Apache Kafka's integrated storage-compute architecture and its replication-based ISR mechanism make it challenging to achieve elasticity in a container environment. AutoMQ, with fully decoupled storage and compute, partition migration in seconds, and a built-in automatic traffic-balancing component, lets users scale AutoMQ quickly in container and Kubernetes environments. This rapid elasticity is perfectly suited to JD's containerization needs and was a key factor in their choice.
Today's case study, based on the insights shared by JD's Kafka Cloud Native Architect, Zhong Hou, at an AutoMQ Meetup, summarizes the key points to help everyone understand this customer case. The full video can be viewed at the end of the article.
From the perspectives of JD's cloud-native Kafka deployment and its AutoMQ practice, Zhong Hou introduced the JDQ platform, JDQ's cloud-native solutions and implementation, and JD's AutoMQ practice on ChubaoFS (the former name of CubeFS).
The JDQ platform uses Kubernetes stateful service orchestration, managing the entire cluster through the StatefulSet controller. It supports multiple storage solutions and service access methods and can be deployed on private cloud, public cloud, and JD's internal Kubernetes platform. To run Kafka in containers, JD modified the engine in several ways: optimizing the read/write path for storage-compute separation, making kernel-level changes to improve performance, implementing rate limiting and degradation at the business layer, and adding pause/resume consumption built on the Kafka protocol.
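The rate-limiting-and-degradation idea mentioned above can be illustrated with a minimal token-bucket sketch. This is a generic technique, not JD's actual business-layer implementation, which the talk does not detail:

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: admit a request only if a token is available."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # burst ceiling
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # caller degrades: reject, queue, or pause consumption

bucket = TokenBucket(rate=100.0, capacity=10.0)
admitted = sum(bucket.allow() for _ in range(20))
print(admitted)  # roughly 10: the burst capacity, since refill barely kicks in
```

When `allow()` returns `False`, the business layer can degrade gracefully, for example by pausing consumption via the Kafka protocol as the talk describes.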
JD conducted comprehensive research on AutoMQ's S3-based cloud-native architecture, single-replica design, and partition reassignment in seconds, and found a series of compelling advantages. AutoMQ's underlying architecture separates storage from compute: its self-developed S3Stream optimizes the read/write path and writes data to object storage while preserving maximum compatibility with the Kafka protocol. AutoMQ's control plane, the Controller, monitors the water level of the whole system and can trigger partition self-balancing to keep traffic evenly distributed across nodes.
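The self-balancing goal, placing partitions so that traffic is roughly even across nodes, can be sketched as a greedy placement. This is a deliberate simplification: AutoMQ's actual balancer works from live metrics, and the names below are illustrative:

```python
import heapq

def balance(partition_traffic: dict[str, float], nodes: list[str]) -> dict[str, str]:
    """Greedily place each partition on the currently least-loaded node."""
    heap = [(0.0, n) for n in nodes]   # (current load, node)
    heapq.heapify(heap)
    placement = {}
    # Place heavy partitions first so final loads end up close to even.
    for part, load in sorted(partition_traffic.items(), key=lambda kv: -kv[1]):
        node_load, node = heapq.heappop(heap)
        placement[part] = node
        heapq.heappush(heap, (node_load + load, node))
    return placement

traffic = {"p0": 50.0, "p1": 30.0, "p2": 20.0, "p3": 40.0}
print(balance(traffic, ["broker-0", "broker-1"]))
```

With the sample traffic above, each broker ends up carrying 70 units, a perfectly even split.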
When a node runs into trouble in the Kubernetes environment, AutoMQ's architecture keeps the data remote and relies on the Controller's monitoring to quickly elect a new leader for the single replica and take over that remote data, preserving read/write availability. Because storage is remote, AutoMQ also supports partition reassignment in seconds: no underlying data needs to move, only metadata is updated, which sharply reduces reassignment cost. By monitoring metrics and automating reassignment, it further improves the efficiency of data self-balancing. Finally, the architecture scales elastically and can respond quickly to fluctuations in business traffic; with reassignment cost minimized, the cost of elastic scaling stays well controlled.
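Because the log data lives in shared remote storage, "reassigning" a partition reduces to an ownership update in metadata; not a single byte of log data is copied. A conceptual sketch of that idea (the data structures here are illustrative, not AutoMQ's internals):

```python
from dataclasses import dataclass, field

@dataclass
class ClusterMetadata:
    # partition -> owning node; the log data itself stays in object storage.
    owner: dict = field(default_factory=dict)

    def reassign(self, partition: str, new_node: str) -> None:
        """Metadata-only move: update ownership, copy no data."""
        self.owner[partition] = new_node

meta = ClusterMetadata(owner={"orders-0": "node-1", "orders-1": "node-2"})
meta.reassign("orders-0", "node-3")   # completes in seconds, not hours
print(meta.owner["orders-0"])  # node-3
```

Contrast this with classic Kafka reassignment, where the new replica must re-copy the partition's entire log over the network before it can serve traffic.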
To land the AutoMQ architecture, JD iterated through four solutions (Solution 1: local write with remote storage; Solution 2: removing load balancing; Solution 3: local storage separation; Solution 4: migration to Kubernetes), progressively optimizing and upgrading the architecture while resolving challenges such as node network bandwidth contention, performance bottlenecks, and storage cost. Solution 4 moves the service processes onto Kubernetes, leveraging its resource-hosting capabilities to improve elastic scalability. In practice, the advantage of this architecture for JD lies in reduced bandwidth and storage costs and the elimination of replica copying, improving overall efficiency.