[DevOps] System Design assignment
Hello, I’m WONIZZ, a developer who likes to organize things. In this session, we are going to organize the architecture topics that many companies ask about.
This is a summary of the architecture for a URL shortener, a frequently requested design exercise.
I actually worked through this assignment myself and learned a lot along the way, so I’m posting it to record it on my blog.
1. Traffic and System Capacity
Traffic
- Assuming 200:1 read/write ratio
- Number of unique shortened links generated per month = 100 million
- Number of unique shortened links generated per second = 100 million / (30 days * 24 hours * 3600 seconds) ≈ 40 URLs/second
- With the 200:1 read/write ratio, number of redirections = 40 URLs/s * 200 = 8000 URLs/s
Storage
Assuming a service lifetime of 50 years and 100 million shortened links created per month
- total number of data points/objects in the system = 100 million/month * 12 (months) * 50 (years) = 60 billion
Assuming each data object (short URL, long URL, created date, etc.) is 500 bytes long
- total required storage = 60 billion * 500 bytes = 30TB
Memory
Following the Pareto principle, better known as the 80/20 rule, for caching (80% of requests are for 20% of the data).
Since we get 8000 read/redirection requests per second, we will be getting ~700 million requests per day:
- 8000/s * 86400 seconds ≈ 700 million
To cache 20% of these requests, we will need ~70GB of memory.
- 0.2 * 700 million * 500 bytes = ~70GB
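The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. This is just a sketch of the same calculation; the 100 million/month rate, 200:1 ratio, 500-byte object size, and 50-year lifetime are the assumptions already stated above (integer division makes some results slightly lower than the rounded figures in the text).

```java
public class CapacityEstimate {
    // Assumptions from the text: 100M new links/month, 200:1 reads:writes,
    // 500 bytes per stored object, 50-year service lifetime.
    static final long URLS_PER_MONTH = 100_000_000L;

    static long writeQps()     { return URLS_PER_MONTH / (30L * 24 * 3600); }            // 38, rounded to 40 above
    static long readQps()      { return 40L * 200; }                                     // 8000/s with the rounded write rate
    static long totalObjects() { return URLS_PER_MONTH * 12 * 50; }                      // 60 billion
    static long storageTB()    { return totalObjects() * 500 / 1_000_000_000_000L; }     // 30 TB
    static long cacheGB()      { return readQps() * 86_400 * 500 / 5 / 1_000_000_000L; } // 20% of a day's reads ≈ 70 GB

    public static void main(String[] args) {
        System.out.println("writes/s: " + writeQps());
        System.out.println("reads/s:  " + readQps());
        System.out.println("objects:  " + totalObjects());
        System.out.println("storage:  " + storageTB() + " TB");
        System.out.println("cache:    " + cacheGB() + " GB");
    }
}
```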
Estimates
Category | Estimation
---|---
Shortened URLs generated | 40/s
Shortened URLs generated in 50 years | 60 billion
Total URL requests/redirections | 8000/s
Storage for 50 years | 30TB
Cache memory | 70GB
2. Pseudo Code
Java code for generating a 7-character tiny URL from a counter.
The base62 characters are [0–9][a-z][A-Z]:
- URLs of length 5 give 62⁵ ≈ 916 million combinations
- URLs of length 6 give 62⁶ ≈ 56 billion combinations
- URLs of length 7 give 62⁷ ≈ 3500 billion combinations
Since we need to produce 60 billion URLs, and 7 base62 characters give ~3500 billion combinations, each generated tiny URL will have 7 characters.
import java.util.HashMap;
import java.util.Map;

public class Codec {
    private long counter = 1L;
    private final Map<Long, String> indexToUrl = new HashMap<>();
    private final Map<String, Long> urlToIndex = new HashMap<>();
    private static final String BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Encodes a URL to a shortened URL.
    public String encode(String longUrl) {
        if (!urlToIndex.containsKey(longUrl)) {
            indexToUrl.put(counter, longUrl);
            urlToIndex.put(longUrl, counter);
            counter++;
        }
        return "http://tinyurl.com/" + base62Encode(urlToIndex.get(longUrl));
    }

    // Decodes a shortened URL to its original URL.
    public String decode(String shortUrl) {
        String base62Encoded = shortUrl.substring(shortUrl.lastIndexOf('/') + 1);
        long decoded = 0;
        for (int i = 0; i < base62Encoded.length(); i++) {
            decoded = decoded * 62 + BASE62.indexOf(base62Encoded.charAt(i));
        }
        return indexToUrl.get(decoded);
    }

    // Converts a counter value to base62 and left-pads with '0'
    // so every generated key is exactly 7 characters long.
    private String base62Encode(long value) {
        StringBuilder sb = new StringBuilder();
        while (value != 0) {
            sb.append(BASE62.charAt((int) (value % 62)));
            value /= 62;
        }
        while (sb.length() < 7) {
            sb.append('0');
        }
        return sb.reverse().toString();
    }
}
Alternatively, start the counter at a large number (100000000000 in base 10, which is 1L9zO9O in base 62) and increment it every time we get a request for a new short URL, so that every key is already 7 characters long without padding.
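We can verify the starting-counter claim with a quick check. This is a small sketch that reuses the same base62 alphabet as the Codec above:

```java
public class CounterCheck {
    static final String BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Convert a positive counter value to its base-62 representation.
    static String toBase62(long value) {
        StringBuilder sb = new StringBuilder();
        while (value != 0) {
            sb.append(BASE62.charAt((int) (value % 62)));
            value /= 62;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        // 100,000,000,000 in base 10 is already a 7-character base62 string,
        // so every subsequent counter value also yields a 7-character key.
        System.out.println(toBase62(100_000_000_000L)); // prints 1L9zO9O
    }
}
```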
3. Database Schema
We could use a relational database as our backend store, but since we don’t have joins between objects and we require huge storage (30TB) along with high write (40 URLs/s) and read (8000/s) throughput, a NoSQL store (MongoDB, Cassandra, etc.) will be a better choice.
Data Related to user
Name | Type | Desc
---|---|---
User ID | Int | A unique user ID or API key to make the user globally distinguishable
Name | Varchar | The name of the user
Email | Varchar | The email ID of the user
Creation Date | Datetime | The date on which the user was registered
Data Related to ShortLink
Name | Type | Desc
---|---|---
Short URL | Varchar | 7-character unique short URL
Original URL | Varchar | The original long URL
User ID | Int | The unique user ID or API key of the user who created the short URL
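In application code, one ShortLink item from the table above could be modeled as a plain record. This is only an illustrative sketch; the field names and the `createdAt` field are assumptions, not part of a fixed schema:

```java
import java.time.Instant;

// One ShortLink row/item as it would be stored in the NoSQL table.
public record ShortLink(String shortUrl,    // 7-character base62 key (lookup key)
                        String originalUrl, // the long URL to redirect to
                        int userId,         // creator's user ID / API key
                        Instant createdAt)  // when the link was created
{}
```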
4. Key Generation Service (KGS)
A standalone Key Generation Service (KGS) generates random seven-character strings beforehand and stores them in a database.
Whenever we want to shorten a URL, we take one of the already-generated keys and use it. This approach makes things quite simple and fast.
Concurrency problem solution
As soon as the KGS gives keys to one of the servers, it can move them to the used-keys table. The KGS can always keep some keys in memory to provide them quickly whenever a server needs them.
KGS single point of failure (SPOF) solution
To solve this, we can have a standby replica of the KGS. Whenever the primary server dies, the standby server can take over to generate and provide keys.
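The hand-off described above can be sketched in a single process. This is a simplified sketch: the used-keys set stands in for a real database table, and a `BlockingQueue` plays the role of the in-memory key buffer.

```java
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class KeyGenerationService {
    private static final String BASE62 =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private final BlockingQueue<String> unusedKeys = new ArrayBlockingQueue<>(1000);
    private final Set<String> usedKeys = ConcurrentHashMap.newKeySet();
    private final Random random = new Random();

    // Pre-generate random 7-character base62 keys into the in-memory buffer.
    public void preGenerate(int count) {
        while (count > 0) {
            StringBuilder sb = new StringBuilder(7);
            for (int i = 0; i < 7; i++) {
                sb.append(BASE62.charAt(random.nextInt(62)));
            }
            String key = sb.toString();
            if (!usedKeys.contains(key) && !unusedKeys.contains(key) && unusedKeys.offer(key)) {
                count--;
            }
        }
    }

    // Hand a key to an app server: it is moved to the used-keys set *before*
    // being returned, so no two servers can ever receive the same key.
    // Returns null when the buffer is empty.
    public String takeKey() {
        String key = unusedKeys.poll();
        if (key != null) {
            usedKeys.add(key);
        }
        return key;
    }
}
```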
5. Cache
We can use an off-the-shelf solution like Memcached. Before hitting backend storage, the application servers can quickly check whether the cache has the desired URL.
As estimated above, we need 70GB of memory to cache 20% of daily traffic. Since a modern server can have 256GB of memory, we can easily fit the whole cache into one machine.
Whenever there is a cache miss, our servers hit the backend database. When this happens, we can update the cache and pass the new entry to all the cache replicas.
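The check-cache-first flow on a miss can be sketched as follows. This is a sketch only; the in-memory `database` map stands in for the real backend store, and the cache here is a plain map rather than Memcached:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAsideLookup {
    private final Map<String, String> cache = new HashMap<>();    // stand-in for Memcached
    private final Map<String, String> database = new HashMap<>(); // stand-in for the backend store

    public CacheAsideLookup(Map<String, String> initialData) {
        database.putAll(initialData);
    }

    // Check the cache first; on a miss, read from the database and
    // populate the cache so the next lookup is served from memory.
    public String resolve(String shortKey) {
        String longUrl = cache.get(shortKey);
        if (longUrl == null) {
            longUrl = database.get(shortKey); // cache miss → hit the backend
            if (longUrl != null) {
                cache.put(shortKey, longUrl); // update the cache for next time
            }
        }
        return longUrl;
    }
}
```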
Cache policy
Least Recently Used (LRU) is a reasonable policy for our system. Under this policy, we discard the least recently used URL first.
To further increase the efficiency, we can replicate our caching servers to distribute the load between them.
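An LRU cache of this kind is easy to sketch on top of Java’s `LinkedHashMap` (a sketch covering capacity and eviction only, not replication; `accessOrder=true` makes `get` refresh an entry’s recency):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Maps short-URL key -> original long URL; once capacity is exceeded,
// the least recently used entry is evicted first.
public class UrlLruCache extends LinkedHashMap<String, String> {
    private final int capacity;

    public UrlLruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true → iteration order is LRU order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity; // evict when over capacity
    }
}
```

For example, with capacity 2, touching a key with `get` protects it, so inserting a third entry evicts the other, untouched one.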
6. Load Balancer (LB)
- Between Clients and Application servers
- Between Application Servers and database servers ( AWS Managed )
- Between Application Servers and Cache servers ( AWS Managed )
Pros of LB
- Balancing of traffic (distributes incoming requests equally among backend servers)
- If a server is dead, the LB will take it out of the rotation and stop sending any traffic to it.
- If a server is overloaded or slow, the LB can stop sending new requests to that server.
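The first two behaviors above can be sketched as a minimal round-robin balancer. This is an illustrative sketch only; real load balancers (AWS ELB, Nginx, etc.) detect dead servers with active health checks, which are omitted here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> servers = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    public void add(String server)    { servers.add(server); }
    public void remove(String server) { servers.remove(server); } // take a dead server out of rotation

    // Distribute requests evenly among the live servers
    // (assumes at least one server is registered).
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```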
7. System Design
(Figure: system design for the URL shortener)
8. Conclusion
While carrying out this assignment, I learned many things to consider in various system design situations.
In particular, I was able to understand the concerns and the reasoning behind design choices such as rough traffic estimation, the cache layer, and LB selection.