-
Notifications
You must be signed in to change notification settings - Fork 35
Install a Hub Cluster
A standard hub installation requires:
- A load balancer. We use HAProxy
- A ZooKeeper Cluster
- At least 2 servers with local disk for the Hub and Spoke
- Credentials for access to S3 and DynamoDB in AWS
At FlightStats, we have run hub clusters with AWS instances ranging from t2.small (1 core, 2GB RAM, 10 GB disk) to m4.xlarge (4 cores, 16GB RAM, 70 GB disk), depending on the requirements.
The minimum requirement for a fault tolerant cluster is two servers, which provides a single redundancy.
FlightStats typically uses 3 instances, so we always have a backup during rolling restarts. This also means that during a failure scenario, the load is still divided between two instances.
A hub cluster could be larger than 3 instances. The hub would need to be modified for Spoke performance to scale linearly with larger clusters.
To get the best performance out of Spoke, instance memory should be sized so that all of Spoke's data will fit within the Operating System's File System cache.
Spoke should have dedicated disk to provide consistent performance. The disk needs to be sized for the Spoke cache time, which defaults to 60 minutes. Spoke compresses payloads by default.
For correct behavior in the Sequential Write Use Case, hub servers need system time to be precise within the cluster (not necessarily accurate) https://github.com/flightstats/hub/wiki/Hub-NTP-Settings
Example config files for the hub, logback and start script
The hub requires a S3 bucket to be created ahead of time. The S3 bucket name is {app.name}-{app.environment}-{s3.environment} from hub.properties
If your cluster is running in EC2, the hub will automatically use the IAM roles to access S3 and Dynamo.
Otherwise, specify the location of your credentials with the property aws.credentials
dynamo-prefix
is app.name
-app.environment
The following are the permissions you need to set:
"Statement": [
{
"Sid": "Stmt1391722720000",
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"arn:aws:s3:::<s3-bucket>/*"
]
},
{
"Sid": "Stmt1391722763000",
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"arn:aws:s3:::<s3-bucket>"
]
},
{
"Sid": "Stmt1391722978000",
"Effect": "Allow",
"Action": [
"dynamodb:*"
],
"Resource": [
"arn:aws:dynamodb:us-west-2:<account-id>:table/<dynamo-prefix>-*"
]
}
]
The hub is designed to use ZooKeeper lightly, and therefore these servers can be smaller than the hub servers. Our hub clusters share the ZooKeeper cluster within the same environment, for example, we run two Hub clusters in US-East production, and they both utilize a single ZooKeeper cluster. The three ZooKeeper instances are m3.mediums. In non-production environments we use t2.smalls and t2.micros.