Back-of-the-envelope Estimation

Overview
Must know
Example: Estimate Twitter QPS and storage requirements
Tips

Overview

Back-of the envelope estimation estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet your requirements.

Must know

Power of Two
Latency Numbers
Availability numbers

Power of Two

Although data volume can become enormous when dealing with distributed systems, calculation all boils down to the basics. To obtain correct calculations, it is critical to know the data volume unit using the power of two.

Latency Numbers

Those are latency numbers every programmer should know. Some numbers may be outdated as computers become faster, however, they should still be able to give us an idea of the fastness and slowness of different computer operations.

Operation Name	Time
L1 Cache reference	0.5 ns
Branch mispredict	5 ns
L2 cache reference	7 ns
Mutex lock/unlock	100 ns
Main memory reference	100 ns
Compress 1K bytes with Zippy	10,000 ns = 10 us
Send 2K bytes over 1 Gbps network	20,000 ns = 20 us
Read 1 MB sequentially from memory	250,000 ns = 250 us
Round trip within the same datacenter	500,000 ns = 500 us
Disk seek	10,000,000 ns = 10ms
Read 1 MB sequentially from the network	10,000,000 ns = 10ms
Read 1 MB sequentially from disk	30,000,000 ns = 30 ms
Send packet CA (California) -> Netherlands -> CA	150,000,000 ns = 150ms

ns = nanosecond = 10^-9 seconds

us = microsecond = 10^-6 seconds = 1,000 ns

ms = microsecond = 10^-3 seconds = 1,000 us = 1,000,000 ns

By looking at this numbers, we get the following conclusions:

Memory is fast but the disk is slow.
Avoid disk seeks if possible.
Simple compression algorithms are fast.
Compress data before sending it over the internet if possible.
Data centers are usually in different regions, and it takes time to send data between them.

Availability Numbers

High availability is the ability of a system to be continuously operational for a desirably long periof of time. High availability is measured as a percentage, with 100% means a service that has 0 downtime. Uptime is traditionally measured in nines. The more nines, the better.

Most services fall between 99% and 100%

A Service Level Agreement (SLA) is a commonly used term for service providers. This agreement formally defines the level of uptime your service will decliver.

Cloud providers Amazon, Google, and Microsoft, set their SLAs at 99.9% or above.

Availability %	Downtime per day	Downtime per week	Downtime per month	Downtime per year
99%	14.40 minutes	1.68 hours	7.31 hours	3.65 days
99.9%	1.44 minutes	10.08 minutes	43.83 minutes	8.77 hours
99.99%	8.64 seconds	1.01 minutes	4.38 minutes	52.60 minutes
99.999%	864.00 milliseconds	6.05 seconds	26.30 seconds	5.26 minutes
99.9999%	86.40 milliseconds	604.80 seconds	2.63 seconds	31.56 seconds

Example: Estimate Twitter QPS and storage requirements

PLease note the following numbers are for this exercise only:

Assumptions

300 million monthly active users.
50% of users use Twitter daily.
Users post 2 tweets per day on average.
10% of tweets contain media.
Data is stored for 5 years.

Estimations

Query per second (QPS) estimate

Daily active users (DAU) = 300 million * 50% = 150 million
Tweets QPS = 150 million DAU * 2 tweets / 24 hour / 3600 seconds = ~3500
Peak QPS = 2 * QPS = ~7000

Media storage estimate

Average tweet size:
- tweet_id 64 bytes
- text 140 bytes
- media 1 MB
Media storage: 150 million DAU * 2 tweets * 10% * 1MB = 30TB per day
5-year media storage: 30 TB * 365 * 5 = ~55PB

Tips

Solving the problem is more important than obtaining results:

Rounding and Approximation. There is no need to spend valuable time to solve complicated math problems. Precision is not expected. Use round numbers and approximation.
Write down your assumptions.
Label your units. You might get confused if you only write down "5". Does it mean 5 KB or 5 MB?
Commonly asked estimations: QPS, peak QPS, storage, cache, number of servers, etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Back-of-the-envelope Estimation

Overview

Must know

Power of Two

Latency Numbers

Availability Numbers

Example: Estimate Twitter QPS and storage requirements

Assumptions

Estimations

Query per second (QPS) estimate

Media storage estimate

Tips

Files

README.md

Latest commit

History

README.md

File metadata and controls

Back-of-the-envelope Estimation

Overview

Must know

Power of Two

Latency Numbers

Availability Numbers

Example: Estimate Twitter QPS and storage requirements

Assumptions

Estimations

Query per second (QPS) estimate

Media storage estimate

Tips