[DevOps] System Design assignment
Hello, I’m WONIZZ, a developer who likes to organize things. In this session, we are going to organize the architecture topics that many companies ask about.
This is a summary of the architecture for a URL shortener, a frequently requested design exercise.
I actually worked through this assignment myself and learned a lot along the way, so I’m posting it to record it on my blog.
1. Traffic and System Capacity
Traffic
- Assuming 200:1 read/write ratio
- Number of unique shortened links generated per month = 100 million
- Number of unique shortened links generated per second = 100 million / (30 days * 24 hours * 3600 seconds) ≈ 40 URLs/second
- With the 200:1 read/write ratio, number of redirections = 40 URLs/s * 200 = 8000 URLs/s
Storage
Assuming a service lifetime of 50 years and 100 million shortened links created per month
- total number of data points/objects in the system = 100 million/month * 12 (months) * 50 (years) = 60 billion
Assuming each data object (short URL, long URL, created date, etc.) is 500 bytes long
- total required storage = 60 billion * 500 bytes = 30TB
Memory
Following the Pareto principle, better known as the 80/20 rule, for caching (80% of requests are for 20% of the data).
Since we get 8000 read/redirection requests per second, we will be getting ~700 million requests per day:
- 8000/s * 86400 seconds ≈ 700 million
To cache 20% of these requests, we will need ~70GB of memory.
- 0.2 * 700 million * 500 bytes = ~70GB
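The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. This is just a sketch of the same calculation; the 100 million/month rate, 200:1 ratio, 500-byte object size, and 50-year lifetime are the assumptions already stated above (integer division makes some results slightly lower than the rounded figures in the text).

```java
public class CapacityEstimate {
    // Assumptions from the text: 100M new links/month, 200:1 reads:writes,
    // 500 bytes per stored object, 50-year service lifetime.
    static final long URLS_PER_MONTH = 100_000_000L;

    static long writeQps()     { return URLS_PER_MONTH / (30L * 24 * 3600); }            // 38, rounded to 40 above
    static long readQps()      { return 40L * 200; }                                     // 8000/s with the rounded write rate
    static long totalObjects() { return URLS_PER_MONTH * 12 * 50; }                      // 60 billion
    static long storageTB()    { return totalObjects() * 500 / 1_000_000_000_000L; }     // 30 TB
    static long cacheGB()      { return readQps() * 86_400 * 500 / 5 / 1_000_000_000L; } // 20% of a day's reads ≈ 70 GB

    public static void main(String[] args) {
        System.out.println("writes/s: " + writeQps());
        System.out.println("reads/s:  " + readQps());
        System.out.println("objects:  " + totalObjects());
        System.out.println("storage:  " + storageTB() + " TB");
        System.out.println("cache:    " + cacheGB() + " GB");
    }
}
```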
Estimates
Category | Estimation
---|---
Shortened URLs generated | 40/s
Shortened URLs generated in 50 years | 60 billion
Total URL requests/redirections | 8000/s
Storage for 50 years | 30TB
Cache memory | 70GB
2. Pseudo Code
Java code for generating a 7-character tiny URL from a counter.
The base62 characters are [0–9][a-z][A-Z]:
- URLs of length 5 give 62⁵ ≈ 916 million combinations
- URLs of length 6 give 62⁶ ≈ 56 billion combinations
- URLs of length 7 give 62⁷ ≈ 3500 billion combinations
Since we need to produce 60 billion URLs, and 7 base62 characters give ~3500 billion combinations, each generated tiny URL will have 7 characters.
import java.util.HashMap;
import java.util.Map;

public class Codec {
    private long counter = 1L;
    private final Map<Long, String> indexToUrl = new HashMap<>();
    private final Map<String, Long> urlToIndex = new HashMap<>();
    private static final String BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Encodes a URL to a shortened URL.
    public String encode(String longUrl) {
        if (!urlToIndex.containsKey(longUrl)) {
            indexToUrl.put(counter, longUrl);
            urlToIndex.put(longUrl, counter);
            counter++;
        }
        return "http://tinyurl.com/" + base62Encode(urlToIndex.get(longUrl));
    }

    // Decodes a shortened URL to its original URL.
    public String decode(String shortUrl) {
        String base62Encoded = shortUrl.substring(shortUrl.lastIndexOf('/') + 1);
        long decoded = 0;
        for (int i = 0; i < base62Encoded.length(); i++) {
            decoded = decoded * 62 + BASE62.indexOf(base62Encoded.charAt(i));
        }
        return indexToUrl.get(decoded);
    }

    // Converts a counter value to base62 and left-pads with '0'
    // so every generated key is exactly 7 characters long.
    private String base62Encode(long value) {
        StringBuilder sb = new StringBuilder();
        while (value != 0) {
            sb.append(BASE62.charAt((int) (value % 62)));
            value /= 62;
        }
        while (sb.length() < 7) {
            sb.append('0');
        }
        return sb.reverse().toString();
    }
}
Alternatively, start the counter at a large number (100000000000 in base 10, which is 1L9zO9O in base 62) and increment it every time we get a request for a new short URL, so that every key is already 7 characters long without padding.
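We can verify the starting-counter claim with a quick check. This is a small sketch that reuses the same base62 alphabet as the Codec above:

```java
public class CounterCheck {
    static final String BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Convert a positive counter value to its base-62 representation.
    static String toBase62(long value) {
        StringBuilder sb = new StringBuilder();
        while (value != 0) {
            sb.append(BASE62.charAt((int) (value % 62)));
            value /= 62;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        // 100,000,000,000 in base 10 is already a 7-character base62 string,
        // so every subsequent counter value also yields a 7-character key.
        System.out.println(toBase62(100_000_000_000L)); // prints 1L9zO9O
    }
}
```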
3. Database Schema
We could use a relational database as our backend store, but since we don’t have joins between objects and we require huge storage (30TB) along with high write (40 URLs/s) and read (8000/s) throughput, a NoSQL store (MongoDB, Cassandra, etc.) will be a better choice.
Data Related to user
Name | Type | Desc
---|---|---
User ID | Int | A unique user ID or API key to make the user globally distinguishable
Name | Varchar | The name of the user
Email | Varchar | The email ID of the user
Creation Date | Datetime | The date on which the user was registered
Data Related to ShortLink
Name | Type | Desc
---|---|---
Short URL | Varchar | 7-character unique short URL
Original URL | Varchar | The original long URL
User ID | Int | The unique user ID or API key of the user who created the short URL
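In application code, one ShortLink item from the table above could be modeled as a plain record. This is only an illustrative sketch; the field names and the `createdAt` field are assumptions, not part of a fixed schema:

```java
import java.time.Instant;

// One ShortLink row/item as it would be stored in the NoSQL table.
public record ShortLink(String shortUrl,    // 7-character base62 key (lookup key)
                        String originalUrl, // the long URL to redirect to
                        int userId,         // creator's user ID / API key
                        Instant createdAt)  // when the link was created
{}
```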
4. Key Generation Service (KGS)
A standalone Key Generation Service (KGS) generates random seven-character strings beforehand and stores them in a database.
Whenever we want to shorten a URL, we take one of the already-generated keys and use it. This approach makes things quite simple and fast.
Concurrency problem solution
As soon as the KGS gives keys to one of the servers, it can move them to the used-keys table. The KGS can always keep some keys in memory to provide them quickly whenever a server needs them.
KGS single point of failure (SPOF) solution
To solve this, we can have a standby replica of the KGS. Whenever the primary server dies, the standby server can take over to generate and provide keys.
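The hand-off described above can be sketched in a single process. This is a simplified sketch: the used-keys set stands in for a real database table, and a `BlockingQueue` plays the role of the in-memory key buffer.

```java
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class KeyGenerationService {
    private static final String BASE62 =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private final BlockingQueue<String> unusedKeys = new ArrayBlockingQueue<>(1000);
    private final Set<String> usedKeys = ConcurrentHashMap.newKeySet();
    private final Random random = new Random();

    // Pre-generate random 7-character base62 keys into the in-memory buffer.
    public void preGenerate(int count) {
        while (count > 0) {
            StringBuilder sb = new StringBuilder(7);
            for (int i = 0; i < 7; i++) {
                sb.append(BASE62.charAt(random.nextInt(62)));
            }
            String key = sb.toString();
            if (!usedKeys.contains(key) && !unusedKeys.contains(key) && unusedKeys.offer(key)) {
                count--;
            }
        }
    }

    // Hand a key to an app server: it is moved to the used-keys set *before*
    // being returned, so no two servers can ever receive the same key.
    // Returns null when the buffer is empty.
    public String takeKey() {
        String key = unusedKeys.poll();
        if (key != null) {
            usedKeys.add(key);
        }
        return key;
    }
}
```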
5. Cache
We can use an off-the-shelf solution like Memcached. Before hitting backend storage, the application servers can quickly check whether the cache has the desired URL.
As estimated above, we need 70GB of memory to cache 20% of daily traffic. Since a modern server can have 256GB of memory, we can easily fit the whole cache into one machine.
Whenever there is a cache miss, our servers hit the backend database. When this happens, we can update the cache and pass the new entry to all the cache replicas.
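The check-cache-first flow on a miss can be sketched as follows. This is a sketch only; the in-memory `database` map stands in for the real backend store, and the cache here is a plain map rather than Memcached:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAsideLookup {
    private final Map<String, String> cache = new HashMap<>();    // stand-in for Memcached
    private final Map<String, String> database = new HashMap<>(); // stand-in for the backend store

    public CacheAsideLookup(Map<String, String> initialData) {
        database.putAll(initialData);
    }

    // Check the cache first; on a miss, read from the database and
    // populate the cache so the next lookup is served from memory.
    public String resolve(String shortKey) {
        String longUrl = cache.get(shortKey);
        if (longUrl == null) {
            longUrl = database.get(shortKey); // cache miss → hit the backend
            if (longUrl != null) {
                cache.put(shortKey, longUrl); // update the cache for next time
            }
        }
        return longUrl;
    }
}
```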
Cache policy
Least Recently Used (LRU) is a reasonable policy for our system. Under this policy, we discard the least recently used URL first.
To further increase the efficiency, we can replicate our caching servers to distribute the load between them.
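An LRU cache of this kind is easy to sketch on top of Java’s `LinkedHashMap` (a sketch covering capacity and eviction only, not replication; `accessOrder=true` makes `get` refresh an entry’s recency):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Maps short-URL key -> original long URL; once capacity is exceeded,
// the least recently used entry is evicted first.
public class UrlLruCache extends LinkedHashMap<String, String> {
    private final int capacity;

    public UrlLruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true → iteration order is LRU order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity; // evict when over capacity
    }
}
```

For example, with capacity 2, touching a key with `get` protects it, so inserting a third entry evicts the other, untouched one.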
6. Load Balancer (LB)
- Between Clients and Application servers
- Between Application Servers and database servers ( AWS Managed )
- Between Application Servers and Cache servers ( AWS Managed )
Pros of LB
- Balancing of traffic (distributes incoming requests equally among backend servers)
- If a server is dead, the LB will take it out of the rotation and stop sending any traffic to it.
- If a server is overloaded or slow, the LB can stop sending new requests to that server.
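The first two behaviors above can be sketched as a minimal round-robin balancer. This is an illustrative sketch only; real load balancers (AWS ELB, Nginx, etc.) detect dead servers with active health checks, which are omitted here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> servers = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    public void add(String server)    { servers.add(server); }
    public void remove(String server) { servers.remove(server); } // take a dead server out of rotation

    // Distribute requests evenly among the live servers
    // (assumes at least one server is registered).
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```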
7. System Design
(Figure: system design for the URL shortener)
8. Conclusion
While carrying out this assignment, I learned many things to consider in various system design situations.
In particular, I was able to understand the concerns and the reasoning behind design choices such as rough traffic estimation, the cache layer, and LB selection.