Back-of-the-envelope Estimation

  • Overview
  • Must know
  • Example: Estimate Twitter QPS and storage requirements
  • Tips

Overview

Back-of-the-envelope estimations are estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet your requirements.

Must know

  • Power of Two
  • Latency Numbers
  • Availability Numbers

Power of Two

Data volumes in distributed systems can become enormous, but the calculations all boil down to the basics. To calculate correctly, you need to know the data volume units expressed as powers of two: a byte is 8 bits, and 1 KB, 1 MB, 1 GB, 1 TB, and 1 PB correspond to 2^10, 2^20, 2^30, 2^40, and 2^50 bytes respectively.
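As a minimal sketch of the unit arithmetic, the snippet below maps each unit to its power-of-two exponent and converts a byte count into a chosen unit. The names `UNIT_EXPONENTS`, `unit_bytes`, and `to_unit` are illustrative and not part of the original material.

```python
# Power-of-two exponents for common data volume units (binary interpretation).
UNIT_EXPONENTS = {"KB": 10, "MB": 20, "GB": 30, "TB": 40, "PB": 50}


def unit_bytes(unit: str) -> int:
    """Return the number of bytes in one `unit`, e.g. 1 KB = 2**10 bytes."""
    return 2 ** UNIT_EXPONENTS[unit]


def to_unit(num_bytes: float, unit: str) -> float:
    """Express a raw byte count in the given unit."""
    return num_bytes / unit_bytes(unit)


if __name__ == "__main__":
    for unit, exponent in UNIT_EXPONENTS.items():
        print(f"1 {unit} = 2^{exponent} bytes = {unit_bytes(unit):,} bytes")
```

For quick mental math, each step of 10 in the exponent is roughly a factor of a thousand, so 2^30 bytes is about one billion bytes.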

Latency Numbers

These are the latency numbers every programmer should know. Some of them may be outdated as computers have become faster, but they still give a good idea of the relative speed of different computer operations.

| Operation name | Time |
|---|---|
| L1 cache reference | 0.5 ns |
| Branch mispredict | 5 ns |
| L2 cache reference | 7 ns |
| Mutex lock/unlock | 100 ns |
| Main memory reference | 100 ns |
| Compress 1K bytes with Zippy | 10,000 ns = 10 us |
| Send 2K bytes over 1 Gbps network | 20,000 ns = 20 us |
| Read 1 MB sequentially from memory | 250,000 ns = 250 us |
| Round trip within the same datacenter | 500,000 ns = 500 us |
| Disk seek | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from the network | 10,000,000 ns = 10 ms |
| Read 1 MB sequentially from disk | 30,000,000 ns = 30 ms |
| Send packet CA (California) -> Netherlands -> CA | 150,000,000 ns = 150 ms |

  • ns = nanosecond = 10^-9 seconds
  • us = microsecond = 10^-6 seconds = 1,000 ns
  • ms = millisecond = 10^-3 seconds = 1,000 us = 1,000,000 ns

From these numbers, we can draw the following conclusions:

  • Memory is fast but the disk is slow.
  • Avoid disk seeks if possible.
  • Simple compression algorithms are fast.
  • Compress data before sending it over the internet if possible.
  • Data centers are usually in different regions, and it takes time to send data between them.
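To make these conclusions concrete, here is a small sketch that compares a few entries from the latency table by ratio. The dictionary keys and the `ratio` helper are illustrative names, and the values are copied from the table above, so they carry the same caveat about being dated.

```python
# Selected latency numbers from the table above, in nanoseconds.
LATENCY_NS = {
    "main_memory_reference": 100,
    "read_1mb_memory": 250_000,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "read_1mb_disk": 30_000_000,
    "ca_netherlands_round_trip": 150_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    # Sequential 1 MB read: disk versus memory.
    print(ratio("read_1mb_disk", "read_1mb_memory"))
    # One disk seek versus one main memory reference.
    print(ratio("disk_seek", "main_memory_reference"))
    # Cross-region round trip versus a round trip inside one datacenter.
    print(ratio("ca_netherlands_round_trip", "datacenter_round_trip"))
```

With the table's values, a sequential 1 MB read from disk is 120 times slower than from memory, and a CA to Netherlands round trip costs as much as 300 round trips inside one datacenter.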

Availability Numbers

High availability is the ability of a system to be continuously operational for a desirably long period of time. It is measured as a percentage, where 100% means a service with zero downtime. Uptime is traditionally measured in nines: the more nines, the better.

Most services fall between 99% and 100%.

A Service Level Agreement (SLA) is a commonly used term for service providers. This agreement formally defines the level of uptime your service will deliver.

Cloud providers such as Amazon, Google, and Microsoft set their SLAs at 99.9% or above.

| Availability % | Downtime per day | Downtime per week | Downtime per month | Downtime per year |
|---|---|---|---|---|
| 99% | 14.40 minutes | 1.68 hours | 7.31 hours | 3.65 days |
| 99.9% | 1.44 minutes | 10.08 minutes | 43.83 minutes | 8.77 hours |
| 99.99% | 8.64 seconds | 1.01 minutes | 4.38 minutes | 52.60 minutes |
| 99.999% | 864.00 milliseconds | 6.05 seconds | 26.30 seconds | 5.26 minutes |
| 99.9999% | 86.40 milliseconds | 604.80 milliseconds | 2.63 seconds | 31.56 seconds |
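
The downtime figures follow directly from the availability percentage. The sketch below reproduces them, assuming a year of 365.25 days and a month of one twelfth of a year, which matches the table's rounding. The function name `downtime_seconds` is illustrative.

```python
SECONDS_PER_DAY = 24 * 3600
DAYS_PER_YEAR = 365.25               # assumption: average year length
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # assumption: average month length


def downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Allowed downtime in seconds over `period_days` at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    for availability in (99, 99.9, 99.99, 99.999, 99.9999):
        per_day = downtime_seconds(availability, 1)
        per_year = downtime_seconds(availability, DAYS_PER_YEAR)
        print(f"{availability}%: {per_day:.3f} s/day, {per_year / 3600:.2f} h/year")
```

For example, 99.9% availability allows about 86.4 seconds (1.44 minutes) of downtime per day.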

Example: Estimate Twitter QPS and storage requirements

Please note that the following numbers are for this exercise only:

Assumptions

  • 300 million monthly active users.
  • 50% of users use Twitter daily.
  • Users post 2 tweets per day on average.
  • 10% of tweets contain media.
  • Data is stored for 5 years.

Estimations

Queries per second (QPS) estimate

  • Daily active users (DAU) = 300 million * 50% = 150 million
  • Tweets QPS = 150 million DAU * 2 tweets / 24 hours / 3600 seconds per hour = ~3500
  • Peak QPS = 2 * QPS = ~7000
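
The QPS arithmetic above can be sketched as follows. The function and parameter names are illustrative, and the peak factor of 2 is the exercise's own assumption rather than a measured value.

```python
SECONDS_PER_DAY = 24 * 3600


def estimate_qps(
    monthly_active_users: int = 300_000_000,
    daily_active_ratio: float = 0.5,
    tweets_per_user_per_day: int = 2,
    peak_factor: int = 2,
):
    """Return (DAU, average tweets QPS, peak QPS) for the stated assumptions."""
    dau = monthly_active_users * daily_active_ratio
    tweets_qps = dau * tweets_per_user_per_day / SECONDS_PER_DAY
    peak_qps = peak_factor * tweets_qps
    return dau, tweets_qps, peak_qps


if __name__ == "__main__":
    dau, tweets_qps, peak_qps = estimate_qps()
    print(f"DAU: {dau:,.0f}, QPS: {tweets_qps:,.0f}, peak QPS: {peak_qps:,.0f}")
```

The unrounded results are roughly 3,472 and 6,944, which the estimate rounds to ~3500 and ~7000.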

Media storage estimate

  • Average tweet size:
    • tweet_id 64 bytes
    • text 140 bytes
    • media 1 MB
  • Media storage: 150 million DAU * 2 tweets * 10% * 1 MB = 30 TB per day
  • 5-year media storage: 30 TB * 365 * 5 = ~55 PB
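
The storage arithmetic can be sketched the same way. This version assumes decimal units (1 TB = 10^6 MB, 1 PB = 10^3 TB) to keep the numbers round, as the estimate above does. The function and parameter names are illustrative.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units for round numbers
TB_PER_PB = 1_000


def estimate_media_storage(
    dau: int = 150_000_000,
    tweets_per_user_per_day: int = 2,
    media_ratio: float = 0.10,
    media_size_mb: float = 1,
    retention_years: int = 5,
):
    """Return (media storage per day in TB, total storage over retention in PB)."""
    media_tweets_per_day = dau * tweets_per_user_per_day * media_ratio
    daily_tb = media_tweets_per_day * media_size_mb / MB_PER_TB
    total_pb = daily_tb * 365 * retention_years / TB_PER_PB
    return daily_tb, total_pb


if __name__ == "__main__":
    daily_tb, total_pb = estimate_media_storage()
    print(f"{daily_tb:.0f} TB per day, {total_pb:.2f} PB over 5 years")
```

The unrounded 5-year total is 54.75 PB, which rounds to ~55 PB. Tweet IDs and text are ignored here because media dominates the total.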

Tips

Solving the problem is more important than obtaining results:

  • Rounding and approximation. There is no need to spend valuable time solving complicated math problems. Precision is not expected. Use round numbers and approximations.

  • Write down your assumptions.

  • Label your units. You might get confused if you only write down "5". Does it mean 5 KB or 5 MB?

  • Commonly asked estimations: QPS, peak QPS, storage, cache, number of servers, etc.