This project sets up a Druid cluster on AWS. It is forked from [druid-ansible](https://github.com/godatadriven/druid-ansible) and provides:
- Terraform-based creation of cluster instances.
- Ansible playbook for Druid setup.
- Druid custom resources and extensions.
- Example JSON scripts for data ingestion and queries.
- Sample Wikipedia data in JSON and Parquet formats.
- AWS region: provider.tf. Variable based.
- Variable definitions: vars_def. Two implementations (stg, prod), one per environment.
- EC2 AMI: ami. CentOS 7 based.
- SSH key: key. Variable based. Create a `druid.pem` key in the AWS console and paste the key content into the variable files.
- IAM role: role. Create an IAM role with an S3 access policy.
- Network: network. Details:
  - Creates an AWS VPC. Provide the CIDR block details in the variable file.
  - Creates a subnet in the VPC and 3 security groups. Adjust these to your requirements.
- Instances: hosts. Variable based. Instances are tagged with various parameters.
- Local and S3-based Terraform state backends.
- To initialize Terraform, run `cd terraform && terraform init -var-file=vars/<env-file>.tf`
- To create the cluster, run `cd terraform && terraform apply -var-file=vars/<env-file>.tf`
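
The two Terraform steps above differ only in the action and the environment file, so they can be generated from one helper. The following is a minimal sketch, not part of this project: the function name `tf_command`, the `ENV_NAME` variable, and the assumption that the variable files are named `vars/stg.tf` and `vars/prod.tf` are all illustrative. It only prints the commands, so it is safe to run without Terraform installed.

```shell
#!/usr/bin/env bash
# Sketch: build the Terraform commands for one environment.
# tf_command and ENV_NAME are hypothetical names, not part of the project.
# Assumes the variable files are named vars/<env>.tf (e.g. vars/stg.tf).

# Print the terraform command for an action (init or apply) and an environment.
tf_command() {
  local action="$1" env_name="$2"
  case "$action" in
    init|apply) ;;
    *) echo "unsupported action: $action" >&2; return 1 ;;
  esac
  printf 'terraform %s -var-file=vars/%s.tf\n' "$action" "$env_name"
}

# Dry run: print the commands for the chosen environment (default: stg).
ENV_NAME="${ENV_NAME:-stg}"
tf_command init "$ENV_NAME"
tf_command apply "$ENV_NAME"
```

To execute a printed command rather than just display it, one option is to pipe it to a shell from inside the `terraform` directory, for example `tf_command init stg | sh`.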
- 1 metadata DB node. Runs Postgres as the metadata database, along with Grafana for Druid cluster metrics.
- 3 ZooKeeper nodes. In addition to the ZooKeeper processes, these machines also run Druid data (historical) processes.
- 2 Druid master nodes. Host the Druid overlord and coordinator processes. 2 nodes for HA.
- 2 Druid query nodes. Host the Druid broker and middlemanager processes. 2 nodes for HA.
- 3 Druid data nodes. Host the Druid historical process.
- After `terraform apply` completes, run `cd .. && ./run.sh "--user centos playbook.yml"`