Deploy AutoMQ on CubeFS
CubeFS [1] is a next-generation cloud-native storage product, currently an incubating open-source project hosted by CNCF. It is compatible with multiple access protocols such as S3, POSIX, and HDFS, and supports two storage engines: multi-replica and erasure coding. CubeFS offers features like multi-tenancy, multi-AZ deployment, and cross-region replication, making it widely applicable in scenarios such as big data, AI, container platforms, databases, middleware, storage-compute separation, data sharing, and data protection.
AutoMQ's shared storage architecture relies on low-cost object storage. CubeFS meets this requirement: its ObjectNode exposes an S3-compatible object storage interface to the files stored in CubeFS, so you can manage them with tools such as S3Browser and S3Cmd or with the native Amazon S3 SDK. You can therefore deploy an AutoMQ cluster on CubeFS to obtain a Kafka-compatible streaming system that offers better cost efficiency, extreme elasticity, and single-digit millisecond latency.
This article describes how to deploy an AutoMQ cluster on CubeFS in your private data center.
- An available CubeFS environment. If you do not have a CubeFS environment yet, you can refer to the official documentation for dependency configuration [3] and setting up a basic CubeFS cluster [4].
The default installation package of CubeFS provides a series of command-line tools for managing the cluster in the build/bin directory. In this article, we will also use these command-line tools for some additional configurations.
Check the cluster status using CubeFS command-line tools to verify if the setup is successful:
# Run the command
./build/bin/cfs-cli cluster info
# Output
[Cluster]
Cluster name : cfs_dev
Master leader : 172.16.1.101:17010
Master-1 : 172.16.1.101:17010
Master-2 : 172.16.1.102:17010
Master-3 : 172.16.1.103:17010
Auto allocate : Enabled
MetaNode count (active/total) : 4/4
MetaNode used : 0 GB
MetaNode available : 21 GB
MetaNode total : 21 GB
DataNode count (active/total) : 4/4
DataNode used : 44 GB
DataNode available : 191 GB
DataNode total : 235 GB
Volume count : 2
...
Note: The IP and port of the master node in the CubeFS cluster will be used in the subsequent Object Gateway configuration.
To support the object storage protocol, CubeFS needs its Object Gateway enabled. The Object Gateway provides an S3-compatible object storage interface, so CubeFS can serve both the traditional POSIX file system interface and an S3-compatible interface at the same time. Once it is enabled, you can operate on files stored in CubeFS with the native Amazon S3 SDK.
To start the Object Gateway, first create an objectnode.json configuration file in the CubeFS root directory. For example:
{
"role": "objectnode",
"listen": "17410",
"domains": [
"object.cfs.local"
],
"logDir": "/cfs/Logs/objectnode",
"logLevel": "info",
"masterAddr": [
"172.16.1.101:17010",
"172.16.1.102:17010",
"172.16.1.103:17010"
],
"exporterPort": 9503,
"prof": "7013"
}
Note: The IP and port information for masterAddr can be obtained from the CubeFS cluster information in the previous step.
Then use the following command to start the Object Gateway:
nohup ./build/bin/cfs-server -c objectnode.json &
- Create a CubeFS user and obtain the AccessKey and Secret AccessKey information.
You can refer to the User Management Documentation [6] for creating and querying user information.
CubeFS supports multiple ways to create a user, such as the AWS SDK [7] or HTTP requests. Here, we demonstrate creating a user via an HTTP request:
- Specify the user ID, password, and type, and request the creation interface:
curl -H "Content-Type:application/json" -X POST --data '{"id":"automq","pwd":"12345","type":3}' "http://172.16.1.101:17010/user/create"
- Query user information by user ID:
curl -v "http://172.16.1.101:17010/user/info?user=automq" | python -m json.tool
- Example response:
{
"user_id": "automq",
"access_key": "UZONf5FF6WKwFCj4",
"secret_key": "TRZzfPitQkxOLXqPhKMBRrDYUyXXMpWG",
"policy": {
"own_vols": ["vol1"],
"authorized_vols": {
"ltptest": [
"perm:builtin:ReadOnly",
"perm:custom:PutObjectAction"
]
}
},
"user_type": 3,
"create_time": "2024-06-06 09:25:04"
}
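The keys in this response are what the AWS CLI needs in the next step. As a minimal sketch, the hypothetical helper `json_field` below pulls a string field out of the response with `sed`. It assumes the field appears as `"name": "value"` on a single line, as in the example above; for anything more complex, use a real JSON parser.

```shell
# json_field NAME: read JSON on stdin and print the first string value of NAME.
# Hypothetical helper for illustration; handles only simple "name": "value" pairs.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Usage (not executed here; the master address is taken from the cluster info above):
#   resp=$(curl -s "http://172.16.1.101:17010/user/info?user=automq")
#   aws configure set aws_access_key_id     "$(printf '%s\n' "$resp" | json_field access_key)"
#   aws configure set aws_secret_access_key "$(printf '%s\n' "$resp" | json_field secret_key)"
```

Storing the keys with `aws configure set` avoids pasting them into the interactive `aws configure` prompt.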
Use the AWS CLI to create the buckets that the AutoMQ cluster requires on CubeFS.
Configure the user's AccessKey and SecretKey obtained above with `aws configure`, then create the two buckets. The endpoint must point to the Object Gateway, using the port set in `listen` in objectnode.json:
aws s3api create-bucket --bucket automq-data --endpoint-url=http://172.16.1.101:17410
aws s3api create-bucket --bucket automq-ops --endpoint-url=http://172.16.1.101:17410
List the existing buckets to verify that they were created:
aws s3 ls --endpoint-url=http://172.16.1.101:17410
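To automate this check, the listing can be compared against the bucket names AutoMQ needs. This is a minimal sketch: `missing_buckets` is a hypothetical helper, and it assumes the usual `aws s3 ls` output in which the bucket name is the last column of each line.

```shell
# missing_buckets NAME...: read `aws s3 ls` output on stdin and print every
# required bucket name that does not appear in it. Hypothetical helper.
missing_buckets() {
  _listing=$(cat)
  for _bucket in "$@"; do
    printf '%s\n' "$_listing" |
      awk -v b="$_bucket" '$NF == b { found = 1 } END { exit !found }' ||
      printf '%s\n' "$_bucket"
  done
}

# Usage (not executed here; ENDPOINT is your Object Gateway address):
#   aws s3 ls --endpoint-url="$ENDPOINT" | missing_buckets automq-data automq-ops
```

Empty output means both buckets exist. Any printed name still has to be created.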
Prepare five hosts for the AutoMQ cluster. Linux amd64 hosts with 2 cores and 16 GB of memory are recommended, each with two virtual storage volumes. For example:

| Role | IP | Node ID | System Volume | Data Volume |
|---|---|---|---|---|
| CONTROLLER | 192.168.0.1 | 0 | EBS 20GB | EBS 20GB |
| CONTROLLER | 192.168.0.2 | 1 | EBS 20GB | EBS 20GB |
| CONTROLLER | 192.168.0.3 | 2 | EBS 20GB | EBS 20GB |
| BROKER | 192.168.0.4 | 3 | EBS 20GB | EBS 20GB |
| BROKER | 192.168.0.5 | 4 | EBS 20GB | EBS 20GB |

Tips:
Please ensure these machines are within the same subnet and can communicate with each other.
In non-production environments, it is acceptable to deploy only one Controller. By default, this Controller also functions as a Broker.
- Download the latest official binary package from AutoMQ GitHub Releases.
AutoMQ provides the `automq-kafka-admin.sh` tool for quickly starting AutoMQ. Simply provide an S3 URL containing the required S3 access point and authentication information to start AutoMQ with a single command, without manually generating a cluster ID or formatting storage.
### Command-line usage example
bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=xxx \
--s3-secret-key=yyy \
--s3-region=cn-northwest-1 \
--s3-endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops
If errors occur, please check the correctness and format of the parameters.
When using CubeFS, you can use the following configuration to generate a specific S3 URL.

| Parameter Name | Value in This Example | Description |
|---|---|---|
| --s3-access-key | XXX | AccessKey of the CubeFS user created earlier; replace with your actual value |
| --s3-secret-key | YYY | SecretKey of the CubeFS user created earlier; replace with your actual value |
| --s3-region | auto | Can be set to the cluster name or to auto |
| --s3-endpoint | `http://<host-ip>:17410` | S3 endpoint of CubeFS; the port must match `listen` in objectnode.json |
| --s3-data-bucket | automq-data | Name of the CubeFS bucket used as the data bucket |
| --s3-ops-bucket | automq-ops | Name of the CubeFS bucket used as the ops bucket |

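Put together, the CubeFS values slot into the same `generate-s3-url` flags shown earlier. As a sketch, the hypothetical function below only prints the resulting command so you can review it before running it. The host and port arguments are assumptions and must match your Object Gateway address and the `listen` port in objectnode.json.

```shell
# print_generate_s3_url_cmd ACCESS_KEY SECRET_KEY HOST PORT
# Hypothetical helper: prints (does not run) the generate-s3-url command for CubeFS.
print_generate_s3_url_cmd() {
  _ak=$1 _sk=$2 _host=$3 _port=$4
  printf '%s \\\n' "bin/automq-kafka-admin.sh generate-s3-url"
  printf '  %s \\\n' \
    "--s3-access-key=$_ak" \
    "--s3-secret-key=$_sk" \
    "--s3-region=auto" \
    "--s3-endpoint=http://$_host:$_port" \
    "--s3-data-bucket=automq-data"
  printf '  %s\n' "--s3-ops-bucket=automq-ops"
}

# Usage: print_generate_s3_url_cmd XXX YYY 172.16.1.101 17410
```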
After executing this command, the process will automatically proceed through the following stages:
- Based on the provided accessKey and secretKey, it probes the core features of S3 to verify the compatibility between AutoMQ and S3.
- It generates the s3url based on the identity information and access point information.
- It prints an example startup command for AutoMQ that uses the s3url. In the command, replace --controller-list and --broker-list with the actual CONTROLLER and BROKER hosts to be deployed.
An example of the execution result is as follows:
############ Ping s3 ########################
[ OK ] Write s3 object
[ OK ] Read s3 object
[ OK ] Delete s3 object
[ OK ] Write s3 object
[ OK ] Upload s3 multipart object
[ OK ] Read s3 multipart object
[ OK ] Delete s3 object
############ String of s3url ################
Your s3url is:
s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=xxx&s3-secret-key=yyy&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA
############ Usage of s3url ################
To start AutoMQ, generate the start commandline using s3url.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
TIPS: Please replace the controller-list and broker-list with your actual IP addresses.
Replace --controller-list and --broker-list in the command generated in the previous step with your host information. Specifically, use the IP addresses of the 3 CONTROLLERs and 2 BROKERs from the environment setup, with the default ports 9093 (CONTROLLER) and 9092 (BROKER).
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"

| Parameter Name | Mandatory | Description |
|---|---|---|
| --s3-url | Yes | Generated by the command line tool bin/automq-kafka-admin.sh generate-s3-url, containing authentication, cluster ID, and other information |
| --controller-list | Yes | At least one address is required, used as the IP and port list of the CONTROLLER hosts. The format is IP1:PORT1;IP2:PORT2;IP3:PORT3 |
| --broker-list | Yes | At least one address is required, used as the IP and port list of the BROKER hosts. The format is IP1:PORT1;IP2:PORT2;IP3:PORT3 |
| --controller-only-mode | No | Determines whether the CONTROLLER node exclusively assumes the CONTROLLER role. The default is false, meaning the deployed CONTROLLER node also functions as a BROKER. |

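Both list parameters use the same `IP:PORT;IP:PORT` format, so they can be assembled from plain host lists instead of being typed by hand. A minimal sketch, where `join_hosts` is a hypothetical helper and the IP addresses are the example hosts from the environment table:

```shell
# join_hosts PORT IP...: print "IP1:PORT;IP2:PORT;..." for the given hosts.
# Hypothetical helper for illustration.
join_hosts() {
  _port=$1
  shift
  _out=""
  for _host in "$@"; do
    _out="${_out:+$_out;}$_host:$_port"
  done
  printf '%s\n' "$_out"
}

# Usage (not executed here; S3_URL holds the s3url generated earlier):
#   bin/automq-kafka-admin.sh generate-start-command \
#     --s3-url="$S3_URL" \
#     --controller-list="$(join_hosts 9093 192.168.0.1 192.168.0.2 192.168.0.3)" \
#     --broker-list="$(join_hosts 9092 192.168.0.4 192.168.0.5)"
```

Keep the resulting values quoted when passing them on the command line, because `;` would otherwise be interpreted by the shell as a command separator.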
After executing the command, a command for starting AutoMQ will be generated.
############ Start Commandline ##############
To start an AutoMQ Kafka server, please navigate to the directory where your AutoMQ tgz file is located and run the following command.
Before running the command, make sure that Java 17 is installed on your host. You can verify the Java version by executing 'java -version'.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=1 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.2:9092,CONTROLLER://192.168.0.2:9093 --override advertised.listeners=PLAINTEXT://192.168.0.2:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=2 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.3:9092,CONTROLLER://192.168.0.3:9093 --override advertised.listeners=PLAINTEXT://192.168.0.3:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=3 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.4:9092 --override advertised.listeners=PLAINTEXT://192.168.0.4:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=4 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.5:9092 --override advertised.listeners=PLAINTEXT://192.168.0.5:9092
TIPS: Start controllers first and then the brokers.
node.id defaults to auto-generate starting from 0.
To start the cluster, execute the command list generated in the previous step sequentially on the pre-specified CONTROLLER or BROKER hosts. For example, to start the first CONTROLLER process on 192.168.0.1, execute the first command template from the generated startup command list.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
When using the startup command, unspecified parameters will adopt Apache Kafka's default configuration. For new parameters introduced by AutoMQ, AutoMQ's default values will be used. To override the default configuration, you can append additional --override key=value parameters at the end of the command.

| Parameter Name | Mandatory | Description |
|---|---|---|
| s3-url | Yes | Generated by the command line tool bin/automq-kafka-admin.sh generate-s3-url, containing authentication, cluster ID, and other information |
| process.roles | Yes | The options are controller or broker. If a host serves as both CONTROLLER and BROKER, set the value to broker,controller. |
| node.id | Yes | An integer that uniquely identifies a BROKER or CONTROLLER within the Kafka cluster. |
| controller.quorum.voters | Yes | The hosts participating in the KRaft election, given as node ID, IP, and port, for example: 0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 |
| listeners | Yes | The IP address and port the node listens on. |
| advertised.listeners | Yes | The address that the BROKER advertises to clients. |
| log.dirs | No | Directory for storing KRaft and BROKER metadata. |
| s3.wal.path | No | In production environments, it is recommended to store AutoMQ WAL data on a newly mounted data volume as a raw device. This gives better performance because AutoMQ can write data directly to raw devices, which reduces latency. Make sure the path points to the device intended for WAL data. |
| autobalancer.controller.enable | No | The default value is false, which disables traffic self-balancing. When it is enabled, the AutoMQ auto balancer component automatically reassigns partitions to keep the overall traffic balanced. |

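The `controller.quorum.voters` value follows the `ID@IP:PORT` pattern, with entries separated by commas. As a sketch, the hypothetical helper below builds it from an ordered list of CONTROLLER IPs, assuming node IDs are assigned sequentially from 0 in that order, as in the generated commands above.

```shell
# quorum_voters PORT IP...: print "0@IP1:PORT,1@IP2:PORT,..." with node IDs
# counted from 0 in argument order. Hypothetical helper for illustration.
quorum_voters() {
  _port=$1
  shift
  _id=0
  _out=""
  for _ip in "$@"; do
    _out="${_out:+$_out,}$_id@$_ip:$_port"
    _id=$((_id + 1))
  done
  printf '%s\n' "$_out"
}

# Usage: quorum_voters 9093 192.168.0.1 192.168.0.2 192.168.0.3
```

If your node IDs do not follow the host order, write the value by hand. It must be identical on every node.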
Tips:
If you need to enable self-balancing or run [Example: Self-Balancing When Cluster Nodes Change], it is recommended to specify the parameter --override autobalancer.controller.enable=true for the Controller at startup.
To deploy AutoMQ in a private data center for production environments, ensure the durability of local SSDs. Since CubeFS does not support high-availability block storage protocols, it cannot directly manage disk redundancy or backup. However, you can address this with a RAID [8] solution.
To run AutoMQ in the background, append the following redirection to the end of the command:
command > /dev/null 2>&1 &
You have now deployed an AutoMQ cluster on CubeFS: a low-cost, low-latency Kafka cluster that scales elastically within seconds. To further explore AutoMQ's second-level partition reassignment and continuous self-balancing features, refer to the official examples.
[1] CubeFS: https://www.cubefs.io/
[2] CubeFS's Multi-Level Caching: https://www.cubefs.io/docs/master/overview/introduction.html
[3] Dependency Configuration: [CubeFS | A Cloud Native Distributed Storage System]
[4] CubeFS Single Node Deployment: [www.cubefs.io]
[5] Object Gateway: https://www.cubefs.io/docs/master/design/objectnode.html
[6] CubeFS User Management Documentation: [CubeFS | A Cloud Native Distributed Storage System]
[7] CubeFS AWS SDK: https://www.cubefs.io/docs/master/user-guide/objectnode.html#%E6%94%AF%E6%8C%81%E7%9A%84sdk