
Databend


Databend is a next-generation cloud-native data warehouse developed in Rust and designed for cloud architecture. It leverages object storage to provide enterprises with a unified lakehouse architecture, offering a big data analytics platform with separated compute and storage.

This article will introduce how to ingest data from AutoMQ into Databend using bend-ingest-kafka.

Prerequisites

Prepare Databend Cloud and Test Data

First, go to Databend Cloud to activate a Warehouse, and create a database and a test table in the worksheet.


create database automq_db;
create table users (
    id bigint NOT NULL,
    name string NOT NULL,
    ts timestamp,
    status string
);

Prepare AutoMQ and Test Data

Refer to Deploy Locally▸ to deploy AutoMQ, ensuring network connectivity between AutoMQ and Databend.

Quickly create a Topic named example_topic in AutoMQ and write test JSON data into it by following the steps below.

Create Topic

To create a topic using the Apache Kafka® command-line tool, ensure that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:


./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1

When executing the command, replace the bootstrap-server address with that of the Kafka server you are actually using.

After creating the topic, you can use the following command to verify that the topic has been successfully created.


./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092

Generate Test Data

Generate JSON-formatted test data corresponding to the table created earlier.


{
  "id": 1,
  "name": "test user",
  "ts": "2023-11-10T12:00:00",
  "status": "active"
}

Write Test Data

Write test data into the Topic named example_topic using the Kafka command-line tools or programmatically. Below is an example using the command-line tools:


echo '{"id": 1, "name": "test user", "ts": "2023-11-10T12:00:00", "status": "active"}' | ./kafka-console-producer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic

When executing the command, replace the bootstrap-server address with that of the Kafka server you are actually using.

Use the following command to view the data just written to the topic:


./kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning

Create a bend-ingest-kafka Job

bend-ingest-kafka monitors Kafka topics and writes the data into a Databend table in batches. After deploying bend-ingest-kafka, you can start the data import job.
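If the tool is not installed yet, one way to get it is via the Go toolchain. This is a minimal sketch that assumes you have Go installed and that the module lives at github.com/databendcloud/bend-ingest-kafka; check the project README for the officially supported installation method.

go install github.com/databendcloud/bend-ingest-kafka@latest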


bend-ingest-kafka --kafka-bootstrap-servers="localhost:9094" --kafka-topic="example_topic" --kafka-consumer-group="Consumer Group" --databend-dsn="https://cloudapp:password@host:443" --databend-table="automq_db.users" --data-format="json" --batch-size=5 --batch-max-interval=30s

When executing the command, replace kafka-bootstrap-servers with your actual Kafka server address and databend-dsn with your own connection string.

Parameter Description

databend-dsn

The DSN used to connect to the warehouse. Databend Cloud provides it in the warehouse connection information; see the Databend Cloud documentation for details.

batch-size

bend-ingest-kafka accumulates messages until batch-size is reached, then writes them to Databend in a single batch. In the example above, with --batch-size=5 and --batch-max-interval=30s, a write is triggered once 5 messages have accumulated or 30 seconds have elapsed, whichever comes first.

Validate Data Import

Navigate to the Databend Cloud worksheet and query the automq_db.users table. You will see that the data has been synchronized from AutoMQ to the Databend table.
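For example, a minimal query like the following should return the record written earlier:

select * from automq_db.users;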
