
Deploy AutoMQ on CubeFS


Preface

CubeFS [1] is a next-generation cloud-native storage product, currently an incubating open-source project hosted by CNCF. It is compatible with multiple access protocols such as S3, POSIX, and HDFS, and supports two storage engines: multi-replica and erasure coding. CubeFS offers features like multi-tenancy, multi-AZ deployment, and cross-region replication, making it widely applicable in scenarios such as big data, AI, container platforms, databases, middleware, storage-compute separation, data sharing, and data protection.

AutoMQ's innovative shared storage architecture requires low-cost object storage, and CubeFS provides an S3-compatible interface: its ObjectNode exposes an S3-compatible object storage API for operating on files stored in CubeFS. You can therefore use open-source tools such as S3Browser and S3Cmd, or the native Amazon S3 SDK, to manage files in CubeFS, which makes CubeFS a natural fit for AutoMQ. As a result, you can deploy an AutoMQ cluster on CubeFS to obtain a Kafka-compatible streaming system with better cost efficiency, extreme elasticity, and single-digit-millisecond latency.

This article will introduce how to deploy an AutoMQ cluster on CubeFS in your private data center.

Prerequisites

Prepare a CubeFS cluster

The default CubeFS installation package ships a set of command-line tools, located in the build/bin directory, for managing the cluster. This article also uses these tools for some additional configuration.

Check the cluster status with the CubeFS command-line tool to verify that the setup was successful:


# Run the command
./build/bin/cfs-cli cluster info

# Example output
[Cluster]
  Cluster name       : cfs_dev
  Master leader      : 172.16.1.101:17010
  Master-1           : 172.16.1.101:17010
  Master-2           : 172.16.1.102:17010
  Master-3           : 172.16.1.103:17010
  Auto allocate      : Enabled
  MetaNode count (active/total)    : 4/4
  MetaNode used                    : 0 GB
  MetaNode available               : 21 GB
  MetaNode total                   : 21 GB
  DataNode count (active/total)    : 4/4
  DataNode used                    : 44 GB
  DataNode available               : 191 GB
  DataNode total                   : 235 GB
  Volume count       : 2
...

Note: The IP and port of the master node in the CubeFS cluster will be used in the subsequent Object Gateway configuration.

Enable Object Gateway

To enable CubeFS to support object storage protocols, you need to activate the Object Gateway. The role of the Object Gateway is to provide an S3-compatible object storage interface. This allows CubeFS to support both the traditional POSIX file system interface and an S3-compatible object storage interface. By doing so, CubeFS can leverage the advantages of these two common types of interfaces, providing users with a more flexible data storage and access solution. Specifically, once the Object Gateway is enabled, users can use the native Amazon S3 SDK to operate files stored in CubeFS, thus enjoying the convenience of object storage.

To start the Object Gateway, first create the objectnode.json configuration file in the CubeFS root directory. An example content of the objectnode.json configuration file is as follows:


{
     "role": "objectnode", 
     "listen": "17410",
     "domains": [
         "object.cfs.local"
     ],
     "logDir": "/cfs/Logs/objectnode",
     "logLevel": "info",
     "masterAddr": [
         "172.16.1.101:17010",
         "172.16.1.102:17010",
         "172.16.1.103:17010"
     ],
     "exporterPort": 9503,
     "prof": "7013"
}

Note: The IP and port information for masterAddr can be obtained from the CubeFS cluster information in the previous step.

Then use the following command to start the Object Gateway:


nohup ./build/bin/cfs-server -c objectnode.json &
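
Before moving on, you can optionally confirm that the Object Gateway is listening. This sketch assumes the gateway was started on 172.16.1.101 with the listen port 17410 from the configuration above; any S3-style XML response, even an error, indicates that the ObjectNode is serving requests:


# Confirm the ObjectNode process is running and its port responds
ps -ef | grep objectnode | grep -v grep
curl -s -i http://172.16.1.101:17410/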

Create CubeFS User

  • Create a CubeFS user and obtain the Access Key and Secret Key.

Refer to the User Management Documentation [6] for details on creating and querying users.

CubeFS supports multiple creation methods, such as the AWS SDK [7] or plain HTTP requests. Here, we demonstrate creation via an HTTP request:

  • Specify the user ID, password, and type, and request the creation interface:

curl -H "Content-Type:application/json" -X POST --data '{"id":"automq","pwd":"12345","type":3}' "http://172.16.1.101:17010/user/create"

  • Query user information by user ID:

curl -v "http://172.16.1.101:17010/user/info?user=automq" | python -m json.tool

  • Response Example

{
     "user_id": "automq",
     "access_key": "UZONf5FF6WKwFCj4",
     "secret_key": "TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG",
     "policy": {
         "own_vols": ["vol1"],
         "authorized_vols": {
             "ltptest": [
                 "perm:builtin:ReadOnly",
                 "perm:custom:PutObjectAction"
             ]
         }
     },
     "user_type": 3,
     "create_time": "2024-06-06 09:25:04"
}

Creating a Bucket Using the S3 Interface

Use the AWS CLI to create the buckets required on CubeFS for deploying the AutoMQ cluster.

Obtain the user's Access Key and Secret Key, configure them with `aws configure` as shown below, and then create the buckets with the AWS CLI.
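
A minimal sketch of configuring the AWS CLI with the credentials of the automq user created above; the key values are taken from the example response and must be replaced with your own, and the region can be any string since CubeFS does not validate it:


aws configure set aws_access_key_id UZONf5FF6WKwFCj4
aws configure set aws_secret_access_key TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG
aws configure set region auto

With the credentials in place, create the two buckets AutoMQ needs: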


aws s3api create-bucket --bucket automq-data --endpoint=http://172.16.1.101:17410
aws s3api create-bucket --bucket automq-ops --endpoint=http://172.16.1.101:17410

List the existing buckets to confirm they were created.


aws s3 ls --endpoint=http://172.16.1.101:17410
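
Optionally, as a sanity check before deploying AutoMQ, you can verify that objects can be written to and read back from the new bucket. This sketch assumes the same endpoint as above and uses a throwaway test object:


# Upload a small test object, read it back, then delete it
echo "hello cubefs" > /tmp/automq-test.txt
aws s3 cp /tmp/automq-test.txt s3://automq-data/automq-test.txt --endpoint-url http://172.16.1.101:17410
aws s3 cp s3://automq-data/automq-test.txt - --endpoint-url http://172.16.1.101:17410
aws s3 rm s3://automq-data/automq-test.txt --endpoint-url http://172.16.1.101:17410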

Preparing Machines for AutoMQ Deployment

Prepare 5 hosts for deploying the AutoMQ cluster. It is recommended to choose Linux amd64 hosts with 2 cores and 16GB of memory and prepare two virtual storage volumes. An example is as follows:

| Role | IP | Node ID | System Volume | Data Volume |
|------|-----|---------|---------------|-------------|
| CONTROLLER | 192.168.0.1 | 0 | EBS 20GB | EBS 20GB |
| CONTROLLER | 192.168.0.2 | 1 | EBS 20GB | EBS 20GB |
| CONTROLLER | 192.168.0.3 | 2 | EBS 20GB | EBS 20GB |
| BROKER | 192.168.0.4 | 3 | EBS 20GB | EBS 20GB |
| BROKER | 192.168.0.5 | 4 | EBS 20GB | EBS 20GB |

Tips:

  • Please ensure these machines are within the same subnet and can communicate with each other.

  • In non-production environments, it is acceptable to deploy only one Controller. By default, this Controller also functions as a Broker.

Install and Start the AutoMQ Cluster

Configure the S3 URL

Step 1: Generate the S3 URL

AutoMQ provides the `automq-kafka-admin.sh` tool for quickly starting AutoMQ. Simply provide an S3 URL containing the required S3 access point and authentication information to start AutoMQ with a single command, without manually generating a cluster ID or formatting storage.


### Command-line usage example
bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=xxx \
--s3-secret-key=yyy \
--s3-region=cn-northwest-1 \
--s3-endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops

If errors occur, please check the correctness and format of the parameters.

When using CubeFS, use the following parameter values to generate the S3 URL.

| Parameter | Example Value | Description |
|-----------|---------------|-------------|
| --s3-access-key | XXX | Access Key of the CubeFS user created earlier; replace with your actual value |
| --s3-secret-key | YYY | Secret Key of the CubeFS user created earlier; replace with your actual value |
| --s3-region | auto | Can be set to the CubeFS cluster name, or simply auto |
| --s3-endpoint | http://<objectnode-ip>:17410 | The S3 endpoint exposed by the CubeFS Object Gateway (ObjectNode) |
| --s3-data-bucket | automq-data | CubeFS bucket name for message data |
| --s3-ops-bucket | automq-ops | CubeFS bucket name for operational data |
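
For reference, a generate-s3-url invocation filled in with the CubeFS values from this table might look like the following sketch; the access key and secret key are placeholders that must be replaced with your CubeFS user's actual credentials:


bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=UZONf5FF6WKwFCj4 \
--s3-secret-key=TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG \
--s3-region=auto \
--s3-endpoint=http://172.16.1.101:17410 \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops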

Output result

After executing this command, the process will automatically proceed through the following stages:

  1. Based on the provided accessKey and secretKey, it will probe the core features of S3 to verify the compatibility between AutoMQ and S3.

  2. Generate the s3url based on identity information and access point information.

  3. Obtain the startup command example for AutoMQ from the s3url. In the command, replace --controller-list and --broker-list with the addresses of the CONTROLLER and BROKER hosts that will actually be deployed.

An example of the execution result is as follows:


############  Ping s3 ########################

[ OK ] Write s3 object
[ OK ] Read s3 object
[ OK ] Delete s3 object
[ OK ] Write s3 object
[ OK ] Upload s3 multipart object
[ OK ] Read s3 multipart object
[ OK ] Delete s3 object
############  String of s3url ################

Your s3url is:

s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=xxx&s3-secret-key=yyy&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA


############  Usage of s3url  ################
To start AutoMQ, generate the start commandline using s3url.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093"  \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"

TIPS: Please replace the controller-list and broker-list with your actual IP addresses.

Step 2: Generate the list of startup commands

Replace --controller-list and --broker-list in the command generated in the previous step with your host information. Specifically, replace them with the IP addresses of the 3 CONTROLLERs and 2 BROKERs mentioned in the environment setup, using the default ports 9092 and 9093.


bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093"  \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"

Parameter Explanation

| Parameter | Mandatory | Description |
|-----------|-----------|-------------|
| --s3-url | Yes | Generated by the command-line tool bin/automq-kafka-admin.sh generate-s3-url; contains authentication, cluster ID, and other information |
| --controller-list | Yes | At least one address, used as the IP and port list of the CONTROLLER hosts. Format: IP1:PORT1;IP2:PORT2;IP3:PORT3 |
| --broker-list | Yes | At least one address, used as the IP and port list of the BROKER hosts. Format: IP1:PORT1;IP2:PORT2;IP3:PORT3 |
| --controller-only-mode | No | Whether the CONTROLLER node takes on only the CONTROLLER role. Defaults to false, meaning the deployed CONTROLLER node also serves as a BROKER. |

Output result

After executing the command, a command for starting AutoMQ will be generated.


############  Start Commandline ##############
To start an AutoMQ Kafka server, please navigate to the directory where your AutoMQ tgz file is located and run the following command.

Before running the command, make sure that Java 17 is installed on your host. You can verify the Java version by executing 'java -version'.

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override [email protected]:9093,[email protected]:9093,[email protected]:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=1 --override [email protected]:9093,[email protected]:9093,[email protected]:9093 --override listeners=PLAINTEXT://192.168.0.2:9092,CONTROLLER://192.168.0.2:9093 --override advertised.listeners=PLAINTEXT://192.168.0.2:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=2 --override [email protected]:9093,[email protected]:9093,[email protected]:9093 --override listeners=PLAINTEXT://192.168.0.3:9092,CONTROLLER://192.168.0.3:9093 --override advertised.listeners=PLAINTEXT://192.168.0.3:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=3 --override [email protected]:9093,[email protected]:9093,[email protected]:9093 --override listeners=PLAINTEXT://192.168.0.4:9092 --override advertised.listeners=PLAINTEXT://192.168.0.4:9092

bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=4 --override [email protected]:9093,[email protected]:9093,[email protected]:9093 --override listeners=PLAINTEXT://192.168.0.5:9092 --override advertised.listeners=PLAINTEXT://192.168.0.5:9092


TIPS: Start controllers first and then the brokers.

node.id is assigned automatically by default, starting from 0.

Step 3: Start AutoMQ

To start the cluster, execute the command list generated in the previous step sequentially on the pre-specified CONTROLLER or BROKER hosts. For example, to start the first CONTROLLER process on 192.168.0.1, execute the first command template from the generated startup command list.


bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override [email protected]:9093,[email protected]:9093,[email protected]:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092

Parameter Description

When using the startup command, unspecified parameters will adopt Apache Kafka's default configuration. For new parameters introduced by AutoMQ, AutoMQ's default values will be used. To override the default configuration, you can append additional --override key=value parameters at the end of the command.

| Parameter | Mandatory | Description |
|-----------|-----------|-------------|
| s3-url | Yes | Generated by the command-line tool bin/automq-kafka-admin.sh generate-s3-url; contains authentication, cluster ID, and other information |
| process.roles | Yes | Options are CONTROLLER or BROKER. If a host serves as both CONTROLLER and BROKER, the value is CONTROLLER,BROKER. |
| node.id | Yes | An integer that uniquely identifies a BROKER or CONTROLLER within the Kafka cluster; it must remain unique across the cluster. |
| controller.quorum.voters | Yes | The hosts participating in the KRaft election, including node ID, IP, and port, for example: [email protected]:9093,[email protected]:9093,[email protected]:9093 |
| listeners | Yes | The IP and port the node listens on |
| advertised.listeners | Yes | The access address the BROKER advertises to clients |
| log.dirs | No | Directory that stores KRaft and BROKER metadata |
| s3.wal.path | No | In production, it is recommended to store AutoMQ WAL data on a newly mounted data volume used as a raw device; AutoMQ can write directly to raw devices, which reduces latency. Make sure the correct path is configured. |
| autobalancer.controller.enable | No | Defaults to false, which disables traffic self-balancing. When enabled, the AutoMQ auto balancer component automatically reassigns partitions to keep overall traffic balanced. |

Tips:

  • If you need to enable self-balancing or run [Example: Self-Balancing When Cluster Nodes Change], it is recommended to specify the parameter --override autobalancer.controller.enable=true for the Controller at startup.

  • To deploy AutoMQ in a private data center for production environments, ensure the durability of local SSDs. Since CubeFS does not support high-availability block storage protocols, it cannot directly manage disk redundancy or backup. However, you can address this with a RAID [8] solution.
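
As noted in the parameter table and tips above, extra --override key=value pairs can simply be appended to the generated start command. The sketch below abbreviates the generated command with ... and appends WAL and self-balancing overrides; the raw device path /dev/vdb is an illustrative assumption and must match the data volume actually mounted on the host:


# Append extra overrides to the end of a generated CONTROLLER start command (abbreviated with ...)
bin/kafka-server-start.sh --s3-url="s3://..." --override process.roles=broker,controller --override node.id=0 ... \
--override s3.wal.path=/dev/vdb \
--override autobalancer.controller.enable=true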

Run in the background

If you need to run the process in the background, append the following to the end of the start command:


command > /dev/null 2>&1 &
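
Similar to how the Object Gateway was started earlier, you can also prepend nohup so the process survives the current shell session. The sketch below abbreviates the generated start command with ...:


nohup bin/kafka-server-start.sh --s3-url="s3://..." --override node.id=0 ... > /dev/null 2>&1 &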

At this point, you have completed the deployment of an AutoMQ cluster on CubeFS, giving you a low-cost, low-latency Kafka-compatible cluster with second-level elasticity. If you want to further experience AutoMQ's second-level partition reassignment and continuous self-balancing, refer to the official examples.
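
As a quick functional check of the new cluster, you can create a topic and then produce and consume a message with the Kafka command-line tools shipped with AutoMQ; the bootstrap address below assumes one of the BROKER hosts from the table above:


# Create a test topic, write one message, and read it back
bin/kafka-topics.sh --create --topic automq-test --partitions 1 --replication-factor 1 --bootstrap-server 192.168.0.4:9092
echo "hello automq" | bin/kafka-console-producer.sh --topic automq-test --bootstrap-server 192.168.0.4:9092
bin/kafka-console-consumer.sh --topic automq-test --from-beginning --max-messages 1 --bootstrap-server 192.168.0.4:9092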

References

[1] CubeFS: https://www.cubefs.io/

[2] CubeFS's Multi-Level Caching: https://www.cubefs.io/docs/master/overview/introduction.html

[3] Dependency Configuration: [CubeFS | A Cloud Native Distributed Storage System]

[4] CubeFS Single Node Deployment: [www.cubefs.io]

[5] Object Gateway: https://www.cubefs.io/docs/master/design/objectnode.html

[6] CubeFS User Management Documentation: [CubeFS | A Cloud Native Distributed Storage System]

[7] CubeFS AWS SDK: https://www.cubefs.io/docs/master/user-guide/objectnode.html#%E6%94%AF%E6%8C%81%E7%9A%84sdk

[8] RAID: https://www.cnblogs.com/chuncn/p/6008173.html
