
Performance


In this section, we present some of the performance tests we have run with Dynomite. It is divided into three parts: a linear scale test, other performance tests, and a replication delay measurement.

Linear Scale Test

One of our top priorities for Dynomite is the ability to scale a data store linearly with growing traffic demands. We have a symmetric deployment model at Netflix: Dynomite is deployed in every AWS zone and sized equally, i.e., with the same number of nodes in every zone. We want the capability to scale Dynomite in every zone simultaneously and linearly as traffic grows in that region.

We conducted a simple load test to demonstrate this capability.

Client (workload generator) Cluster

  • Number of nodes: 24
  • Region: us_east_1
  • EC2 instance type: m3.2xlarge (30GB RAM, 8 CPU cores, Network throughput: high)
  • Data size: 1024 Bytes
  • Number of readers/writers: 50/50
  • Read/Write ratio: 80/20 (the OPS varied per test, but the ratio was kept at 80/20)
  • The demo application used a simple workload of key/value pairs for reads and writes, i.e., the Redis GET and SET API (a minimal sketch of such a workload follows this list).
  • Number of Keys: 1M
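
Dynomite is wire-compatible with the Redis protocol on its client port, so any Redis client can drive a workload like the one above. The following is a minimal sketch of such a generator loop in Python using redis-py; it is not the actual Netflix workload generator, and the host name and port (8102 is the client listen port used in Dynomite's sample configurations) are assumptions.

```python
# Minimal sketch of the workload described above: 1KB values, 1M keys,
# 80/20 GET/SET ratio against a single Dynomite node (illustrative only).
import random
import redis  # redis-py; Dynomite speaks the Redis wire protocol

NUM_KEYS = 1_000_000
PAYLOAD = "x" * 1024        # 1KB value
READ_RATIO = 0.8            # 80% reads, 20% writes

# Hypothetical endpoint; 8102 is the client port from Dynomite's sample configs.
client = redis.Redis(host="dynomite-node.example.com", port=8102)

def one_op():
    key = f"key-{random.randrange(NUM_KEYS)}"
    if random.random() < READ_RATIO:
        client.get(key)            # Redis GET routed through Dynomite
    else:
        client.set(key, PAYLOAD)   # Redis SET routed through Dynomite

for _ in range(100_000):           # each worker issues a fixed number of operations
    one_op()
```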

Dynomite Cluster

  • Data store: Redis 2.8.9
  • Number of nodes: 3-48 (doubling every time)
  • Region: us_east_1
  • EC2 instance type: r3.2xlarge (61GB RAM, 8 CPU cores, Network throughput: high)
  • Consistency: disabled (quorum_one)
  • % of RAM used by data store: 95%

Dynomite speed under different workloads

The throughput graphs above indicate that Dynomite scales horizontally in terms of throughput: it can handle more traffic simply by increasing the number of nodes in a region.

The throughput graphs also indicate that, on a per-node basis with r3.2xlarge instances, Dynomite can process around 33K read OPS and around 10K write OPS from the client. The write number is lower because the traffic arriving at each node is replicated to 2 additional racks in us_east_1, roughly tripling the traffic each node handles and reducing the effective client-facing throughput. Note that the actual provisioning formula is more complicated, as it is also affected by the design of the client application. In any case, for a 1KB payload the main bottleneck is the network; switching to instances with 10Gbps networking would therefore allow Dynomite to deliver even higher throughput.
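
To make the replication overhead concrete, here is a rough back-of-the-envelope sketch in Python. The figures are the illustrative per-node numbers quoted above; the exact arithmetic depends on the client design and the consistency setting.

```python
# Back-of-the-envelope per-node load with intra-region replication
# (illustrative numbers from the test above: r3.2xlarge, 1KB payload).
payload_bytes = 1024
racks_in_region = 3                       # one rack per zone in us_east_1

client_writes_per_sec = 10_000            # client-facing write OPS per node
# Every incoming write is also forwarded to the 2 other racks, so a node
# effectively handles ~3x the client-facing write traffic.
effective_writes_per_sec = client_writes_per_sec * racks_in_region

client_reads_per_sec = 33_000             # reads are served locally with quorum_one

payload_mb_per_sec = (effective_writes_per_sec + client_reads_per_sec) * payload_bytes / 1e6
print(f"~{payload_mb_per_sec:.0f} MB/s of payload traffic per node")  # ~65 MB/s
```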

Dynomite average latency under different workloads

The average and median latency values show that Dynomite can provide sub-millisecond latency to the client. More specifically, Dynomite does not add extra latency as it scales to a higher number of nodes, and therefore to higher throughput. Overall, the Dynomite node contributes around 20% of the average latency. Note that by "Dynomite node" we refer to the node that contains both Dynomite and Redis; for the above results we do not distinguish whether the Dynomite layer or Redis contributes the latency.

Dynomite 99th percentile latency under different workloads

At the 99th percentile, Dynomite's latency is 0.4ms and does not increase as we scale the cluster up or down. The network is the major contributor to the 99th percentile latency; the Dynomite node itself accounts for less than 2% of it.

Other Performance Tests

Details

  • The server instance type was r3.xlarge. These instances are well suited for Dynomite workloads, i.e., a good balance between memory and network.
  • The replication factor was set to 3 by deploying Dynomite across 3 zones in us-east-1; each zone had the same number of servers and hence the same number of tokens.
  • The client fleet used m2.2xlarge instances, which is also typical for an application here at Netflix.
  • The demo application used a simple workload of key/value pairs for reads and writes, i.e., the Redis GET and SET API.
  • The payload size was 1KB.
  • We maintained an 80/20 ratio between reads and writes, which is also typical of many use cases here at Netflix.

Stage 1 - 6 node cluster

We set up a Dynomite cluster of 6 nodes (i.e., 2 per zone) and observed a throughput of about 80K operations per second across the client fleet:

while keeping the average latency at ~1 ms:

and the 99th percentile latency in the single-digit millisecond range:

Stage 2 - 12 node cluster

We then doubled the size of the Dynomite cluster from 6 nodes to 12 nodes. We also simultaneously doubled our client fleet to add more load on the cluster.

As expected, client fleet throughput went up by 100%,

while still keeping the average and 99th percentile latencies in check.

Stage 3 - 24 node cluster

We then doubled the server and client fleets once more; throughput again went up by 100%, while latencies remained the same.

Replication Delay

In our initial tests, we measured the time it took for a key/value pair to become available on a replica in another region. We wrote 1K key/value pairs to Dynomite in one region and then polled the other region for 20 randomly chosen keys. The value of each key is simply the timestamp at which the write started; the client in the other region reads back those timestamps and computes the durations. We repeated the experiment several times and took the average, which gives a rough idea of the replication speed.
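
A minimal sketch of this measurement, assuming plain redis-py clients pointed at one node in each region (the hostnames and port are hypothetical; the actual test harness differs):

```python
# Sketch of the cross-region replication delay measurement described above:
# write timestamped keys in region A, poll region B, and compare the clocks.
import random
import time
import redis

writer = redis.Redis(host="dynomite-region-a.example.com", port=8102)  # hypothetical
reader = redis.Redis(host="dynomite-region-b.example.com", port=8102)  # hypothetical

NUM_KEYS = 1000
SAMPLE = 20

# 1) In region A, store the write start time as the value of each key.
for i in range(NUM_KEYS):
    writer.set(f"repl-key-{i}", str(time.time()))

# 2) In region B, poll a random sample of keys and compute the delay.
delays = []
for i in random.sample(range(NUM_KEYS), SAMPLE):
    value = reader.get(f"repl-key-{i}")
    while value is None:                 # key has not replicated yet; keep polling
        value = reader.get(f"repl-key-{i}")
    delays.append(time.time() - float(value))

print(f"average replication delay: {1000 * sum(delays) / len(delays):.1f} ms")
```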

We expect this latency to remain more or less constant as we add code-path optimizations as well as enhancements to the replication strategy itself (optimizations will improve speed, while new features may add latency).

Result:

Over 5 iterations of this experiment, the average replication duration was around 85ms. (Note that the durations are measured at the client layer, so the real numbers should be smaller.)