Back-of-the-Envelope Estimations

# Agenda:

  1. Overview
  2. Common inputs in a requests-per-second (RPS) calculation
  3. Example of RPS / QPS calculation
  4. Techniques to simplify the calculations
  5. Handy Conversion Table
  6. Capacity / Storage Planning
  7. Conclusion

# Overview:

Back-of-the-envelope math is a very useful tool in our system design toolbox. In this article, we will go over how and when to use it, and share some tips on using it effectively.

Let’s dive right in.

Experienced developers use back-of-the-envelope math to quickly sanity-check a design.

In these cases, absolute accuracy is not that important. Usually, it is good enough to get within an order of magnitude or two of the actual numbers we are looking for.

For example, if the math says our web service needs to handle 1M requests per second at our scale, and each web server can only handle about 10K requests per second, we learn two things quickly:

    1) We will need a cluster of web servers, with a load balancer in front of them.

    2) We will need about 100 web servers (1M / 10K = 100).

Another example: if the math shows that the database only needs to handle about 10 queries per second at peak, a single database server can handle the load, and there is no need to consider sharding or caching for a while.

# Why is it called back-of-the-envelope?

  • Because it often involves using rough, simplified numbers and basic math on a scrap piece of paper (or the back of an envelope) to get a quick idea of whether something is feasible or reasonable.

Now let’s go over some of the most popular numbers to estimate.

The most useful by far are:

  • Requests per second (RPS) at the service level, OR
  • Queries per second (QPS) at the database level.


# Common inputs in a requests-per-second (RPS) calculation:

1) DAU:

  • The first input is DAU, or Daily Active Users. This number should be easy to obtain. 
  • Sometimes, the only available number is Monthly Active Users (MAU).
  • In that case, estimate the DAU as a percentage of the MAU.


2) Usage per DAU:

  • The second input is the estimate of the usage per DAU of the service we are designing for.
  • For example, not everyone active on Twitter makes a post. Only a percentage does that.
  • Something in the 10%-25% range seems reasonable.
  • Again, it doesn’t have to be exact.
  • Getting within an order of magnitude is usually fine.


3) Scaling Factor (Peaks & Valleys):

  • The third input is a scaling factor.
  • The usage rate for a service usually has peaks and valleys throughout the day.
  • We need to estimate how much higher the traffic would peak compared to the average.
  • This reflects the estimated requests-per-second peak at which the design could potentially break.
  • For example, for a service like Google Maps, the usage rate during commute hours could be 5 times higher than average.
  • Another example is a ride-sharing service like Uber, where weekend nights could have twice as many rides as average.
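
Putting the three inputs together, peak RPS is simply DAU x usage per DAU x peak scaling factor, divided by the 86,400 seconds in a day. Here is a minimal Python sketch of that formula (the function and parameter names are just illustrative):

```python
SECONDS_PER_DAY = 86_400

def peak_rps(dau: float, actions_per_dau: float, peak_factor: float) -> float:
    """Rough peak requests-per-second estimate.

    dau             -- daily active users
    actions_per_dau -- average requests each active user makes per day
    peak_factor     -- how much higher peak traffic is than the daily average
    """
    average_rps = dau * actions_per_dau / SECONDS_PER_DAY
    return average_rps * peak_factor
```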


# Example of RPS / QPS calculation:

We will estimate the number of tweets created per second on Twitter. Note that these numbers are made up; they are not official numbers from Twitter.

1) DAU: 

    Let’s assume Twitter has 300 million MAU, and 50% of the MAU use Twitter daily. So that’s 150 million DAU.

2) Usage per DAU:

    Next, we estimate that about 25% of Twitter DAU make tweets. And each one on average makes 2 tweets. That is 25% * 2 = 0.5 tweets per DAU.

3) Scaling Factor

    For the scaling factor, we estimate that most people tweet in the morning when they get up and can’t wait to share what they dreamed about the night before. And that spikes the tweet creation traffic to twice the average when the US east coast wakes up.

Now we have enough to calculate the peak tweets created per second.

We have: 150 million DAU, times 0.5 tweets per DAU, times the 2x scaling factor, divided by 86,400 seconds in a day.

That is roughly 1,500 tweets created per second.
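
As a sanity check, here is the same arithmetic in code, reusing the peak_rps sketch from above with the made-up numbers from this example:

```python
dau = 150_000_000           # 50% of 300M MAU
tweets_per_dau = 0.25 * 2   # 25% of DAU tweet, ~2 tweets each = 0.5
peak_factor = 2             # morning spike is roughly 2x the daily average

print(peak_rps(dau, tweets_per_dau, peak_factor))  # ~1,736 tweets per second
# Rounding 86,400 seconds up to 10^5 (next section) gives the ~1,500 figure.
```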


# Techniques to simplify the calculations:

1) Convert big numbers to powers of 10:

  • First, we convert all big numbers to scientific notation. Doing the math on really big numbers is very error-prone.
  • By converting big numbers to scientific notation, part of the multiplication becomes simple addition, and division becomes subtraction.
  • In the example above, 150 million DAU becomes 150 * 10^6, or 1.5 * 10^8.

2) 10^5 seconds in a day

There are 86,400 seconds in a day; we round that up to 100,000 seconds, which is 10^5.

  • And since we are dividing by it, the 10^5 contributes a factor of 10^(-5).

3) Group all 10s together

  • Next, we group all the power of tens together, and then all the other numbers together.
  • So the math becomes:

1.5 x 0.5 x 2 = 1.5

And

10^8 x 10^(-5) = 10^(8-5) = 10^3

Putting it all together, it’s 1.5 x 10^3, or 1,500.

Now with practice, we should be able to convert a large number to scientific notation in seconds. 
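
If it helps, the same grouping can be written out in code, keeping the mantissas and the powers of ten separate just like we do on paper:

```python
# Mantissas multiplied together, exponents added together.
mantissa = 1.5 * 0.5 * 2   # mantissas of DAU, tweets per DAU, scaling factor -> 1.5
exponent = 8 + (-5)        # 10^8 from the DAU, 10^(-5) from seconds per day

print(f"{mantissa} x 10^{exponent}")  # 1.5 x 10^3, i.e. about 1,500
```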


# Handy Conversion Table:

And here are some handy conversions we should memorize:

| Number Unit  | Memory Unit | Scientific Notation |
|--------------|-------------|---------------------|
| Thousand (K) | KB          | 10^3                |
| Million (M)  | MB          | 10^6                |
| Billion (B)  | GB          | 10^9                |
| Trillion     | TB          | 10^12               |
| Quadrillion  | PB          | 10^15               |

  • As an example, we should know by heart that 10^12 is a trillion or a TB, and when we see a number like 50TB, we should be able to convert it quickly to 5 x 10^1 x 10^12, which is 5 x 10^13.
  • We are going to ignore the fact that 1KB is actually 2^10 bytes, or 1,024 bytes, and not a thousand bytes.
  • We don’t need that degree of accuracy for back-of-the-envelope math.
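
As a quick illustration of the table, here is a throwaway helper that rewrites a value like 50 TB as a power of ten (the dictionary just encodes the table above):

```python
POWER_OF_TEN = {"K": 3, "M": 6, "B": 9, "T": 12, "P": 15}  # thousand ... quadrillion

def to_scientific(value: float, unit: str) -> str:
    """Rewrite e.g. 50 'T' (terabytes / trillions) as 'a x 10^b'."""
    exponent = POWER_OF_TEN[unit]
    # Normalize the mantissa to a single digit before the decimal point.
    while value >= 10:
        value /= 10
        exponent += 1
    return f"{value:g} x 10^{exponent}"

print(to_scientific(50, "T"))  # 5 x 10^13
```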


# Capacity / Storage Planning:

  • We’ll estimate how much storage is required for storing multimedia files for tweets. The previous example gives about 75 million tweets per day (150 million DAU x 0.5 tweets each); to keep the math simple, we will round that up to 150 million tweets per day.
  • Now we need an estimate of the percentage of tweets that could contain multimedia content, and how large those files are on average.
  • With our meticulous research, we estimate that 10% of tweets contain pictures, and they are about 100KB each, and 1% of all the tweets contain videos, and they are 100MB each.
  • We further assume that the files are replicated, with 3 copies each, and that Twitter would keep the media for 5 years.
  • Now here is the math:

For storing pictures, we have the following:

150M tweets x 0.1 (fraction with pictures) x 100KB per picture x 400 days in a year x 5 years x 3 copies

So, that turns into:

1.5 x 10^8 x 10^(-1) x 10^5 x 4 x 10^2 x 5 x 3

Again, we group the powers of ten together.

This becomes:

1.5 x 4 x 5 x 3, which is 90, and

10^(8-1+5+2), which is 10^14.

Together that is 90 x 10^14, or 9 x 10^15 bytes, which, from the table, is 9 PB.

Hence, about 9 PB of storage would be required for pictures.
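
The same picture-storage estimate as plain Python, using the assumptions above (10% of tweets carry a ~100 KB picture, a year rounded up to 400 days, 5 years of retention, 3 copies of each file):

```python
tweets_per_day = 150e6                       # from the RPS example above
picture_ratio, picture_size = 0.10, 100e3    # 10% of tweets, ~100 KB each
days_per_year, years, replicas = 400, 5, 3   # 365 rounded up to 400

picture_storage = (tweets_per_day * picture_ratio * picture_size
                   * days_per_year * years * replicas)
print(f"{picture_storage / 1e15:.0f} PB")    # ~9 PB
```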

  • For storing videos, we take yet another shortcut.
  • First, videos are on average 100MB each while pictures are 100KB, so a video is 1,000 times bigger than a picture on average.
  • Second, only 1% of tweets contain a video, while pictures appear in 10% of all the tweets. So videos are one-tenth as common.
  • Putting the math together, the total video storage is 1000 x 1/10 of the picture storage, which is 100 x 9 PB, or 900 PB.

Hence, about 900 PB of storage would be required for videos.
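
Continuing the snippet above, the video shortcut is a one-liner: videos are about 1,000x larger but only one-tenth as common, so video storage is roughly 100x the picture figure:

```python
video_storage = picture_storage * 1000 * (1 / 10)  # 1,000x larger, 1/10 as common
print(f"{video_storage / 1e15:.0f} PB")            # ~900 PB
```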


# Conclusion:

  • Don’t over-index on precision.
  • Getting within an order of magnitude is usually enough to inform and validate our design.

