Back of the envelope estimations
# Agenda:
- Overview
- Common inputs in a requests-per-second (RPS) calculation
- Example of RPS / QPS calculation
- Techniques to simplify the calculations
- Handy conversion Table
- Capacity or Storage Planning
- Conclusion
# Overview:
Back-of-the-envelope math is a very useful tool in our system design toolbox. In this article, we will go over how and when to use it, and share some tips on using it effectively.
Let’s dive right in.
Experienced developers use back-of-the-envelope math to quickly sanity-check a design.
In these cases, absolute accuracy is not that important. Usually, it is good enough to get within an order of magnitude or two of the actual numbers we are looking for.
For example, if the math says at our scale our web service needs to handle 1M requests per second, and each web server could only handle about 10K requests per second, we learn two things quickly:
1) We learn that we will need to cluster of web servers, with a load balancer in front of them.
2) We will need about 100 web servers.
Another example is if the math shows that the database needs to handle about 10 queries per second at peak, it means that a single database server could handle the load for a while, and there is no need to consider sharding or caching for a while.
# Why it's called back-of-envelope?
- because, it often involves using rough, simplified numbers and basic math on a scrap piece of paper or envelope to get a quick idea of whether something is feasible or reasonable.
Now let’s go over some of the most popular numbers to estimate.
The most useful by far is,
- Requests per second (RPS) at the service level OR
- Queries per second (QPS) at the database level.
# Common inputs in a requests-per-second (RPS) calculation:
1) DAU:
- The first input is DAU, or Daily Active Users. This number should be easy to obtain.
- Sometimes, the only available number would be Monthly Active Users.
- In that case, estimate the DAU as a percentage of MAU.
2) Usage per DAU:
- The second input is the estimate of the usage per DAU of the service we are designing for.
- For example, not everyone active on Twitter makes a post. Only a percentage does that.
- 10%-25% seems to be reasonable.
- Again, it doesn’t have to be exact.
- Getting within an order of magnitude is usually fine.
3) Scaling Factor (Peaks & Valleys):
- The third input is a scaling factor.
- The usage rate for a service usually has peaks and valleys throughout the day.
- We need to estimate how much higher the traffic would peak compared to the average.
- This would reflect the estimated requests-per-second peak where the design could potentially break.
- For example, for a service like Google Maps, the usage rate during commute hours could be 5 times higher than average.
- Another example is a ride-sharing service like Uber, where weekend nights could have twice as many rides as average.
# Example of RPS / QPS calculation:
We will estimate the number of Tweets created per second on Twitter. Note that these numbers are made up, and they are not official numbers from Twitter.
1) DAU:
Let’s assume Twitter has 300 million MAU, and 50% of the MAU use Twitter daily. So that’s 150 million DAU.
2) Usage per DAU:
Next, we estimate that about 25% of Twitter DAU make tweets. And each one on average makes 2 tweets. That is 25% * 2 = 0.5 tweets per DAU.
3) Scaling Factor
For the scaling factor, we estimate that most people tweet in the morning when they get up and can’t wait to share what they dreamed about the night before. And that spikes the tweet creation traffic to twice the average when the US east coast wakes up.
Now we have enough to calculate the peak tweets created per second.
We have: 150 million DAU times 0.5 tweet per DAU, times 2x scaling factor divided by 86,400 seconds in a day.
That is roughly about 1,500 tweets created per second.
# Techniques to simplify the calculations:
1) Convert Big numbers to power of 10s:
- First, we convert all big numbers to scientific notation. Doing the math on really big numbers is very error-prone.
- By converting big numbers to scientific notation, part of the multiplication becomes simple addition, and division becomes subtraction.
- In the example above, 150 million DAU becomes 150 * 10^6 OR 1.5 * 10^8.
2) 10^5 seconds in a day
There are 86,400 seconds in a day, we round it up to 100,000 seconds, and that becomes 10^5.
- And since it’s a division, 10^5 becomes 10^(-5).
3) Group all 10s together
- Next, we group all the power of tens together, and then all the other numbers together.
- So the math becomes:
1.5 x 0.5 x 2
And
10^8 x 10^(-5) = 10^(8-5) = 10^3
Putting it all together, it’s 1.5 x 10^3, or 1,500.
Now with practice, we should be able to convert a large number to scientific notation in seconds.
# Handy Conversion Table:
And here are some handy conversions we should memorize:
- As an example, we should know by heart that 10^12 is a trillion or a TB, and when we see a number like 50TB, we should be able to convert it quickly to 5x10^1x10^12, which is 5x10^13.
- We are going to ignore the fact that 1KB is actually 2^10 bytes, or 1,024 bytes, and not a thousand bytes.
- We don’t need that degree of accuracy for back-of-the-envelope math.
# Capacity / Storage Planning:
- We’ll estimate how much storage is required for storing multimedia files for tweets. We know from the previous example that there are about 150M tweets per day.
- Now we need an estimate on a percentage of tweets that could contain multimedia content, and how large those files are on average.
- With our meticulous research, we estimate that 10% of tweets contain pictures, and they are about 100KB each, and 1% of all the tweets contain videos, and they are 100MB each.
- We further assume that the files are replicated, with 3 copies each, and that Twitter would keep the media for 5 years.
- Now here is the math:
For storing pictures, we have the following:
150M tweets x 0.1 in pictures x 100KB per picture x 400 days in a year x 5 years * 3 copies
So, that turns into:
1.5 x 10^8 x 10^(-1) x 10^5 x 4x10^2 x 5 x 3
Again, we group the powers of tens together.
This becomes:
1.5 x 4 x 5 x 3, which is 90 and
10^(8-1+5+2), which is 10^14 and
that becomes 9 x 10^15, which is, from the table, 9 PB.
HENCE, 9PB STORAGE WOULD BE REQUIRED FOR PICTURES.
- For storing videos:
- we take yet another shortcut: Since videos on average are 100MB each while pictures are 100KB, a video is 1000 times bigger than a picture on average.
- Second, only 1% of tweets contain a video, while pictures appear in 10% of all the tweets. So videos are one-tenth as popular.
- Putting the math together, the total video storage is 1000 x 1/10 of picture storage, which is 100 x 9PB, or 900 PB.
HENCE, 900 PB STORAGE WOULD BE REQUIRED FOR VIDEOS.
# Conclusion:
- Don’t over-index on precision.
- Getting within an order of magnitude is usually enough to inform and validate our design.
Comments
Post a Comment