
Blog – 2: Scalability & Load Balancing

Fundamentals of System Design

by Soumya Patnaik

Have you ever built a system, launched it with excitement, and suddenly found it crawling as more users started pouring in? 🫣 I’ve been there, watching helplessly as increasing traffic turned a responsive app into a sluggish mess. That’s when I learned the importance of scalability and load balancing. These concepts are crucial if you want your application to handle an increasing number of users without breaking a sweat. Let’s dive into the basics of how to design systems that grow with your traffic and ensure users stay happy.

Scalability: Handling Growth Like a Pro

Scalability is a system’s ability to handle growing amounts of traffic or data, either by adding resources like servers or by increasing the power of existing ones. Imagine running a pizza shop: you can hire more staff as customers pour in (horizontal scaling) or equip your current staff with better tools (vertical scaling).

  • Vertical Scaling (Scaling Up)

    This involves upgrading your server’s CPU, RAM, or storage. It’s like getting a bigger oven for your pizza shop. However, it’s limited by the maximum capacity of a single server.

  • Horizontal Scaling (Scaling Out)

Here, you add more servers to handle the traffic. Instead of relying on one massive oven, you add more ovens, each handling its share of the orders. This is more flexible and can grow far beyond what a single machine allows, though it requires your application to be designed to run across multiple servers.

Example: Netflix uses horizontal scaling to manage its massive user base, distributing content across thousands of servers worldwide.
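
To make the trade-off concrete, here’s a minimal Python sketch. The Server class, the capacity numbers, and the pool are all invented for illustration; they just show that scaling up grows one machine while scaling out adds more of them.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    capacity_rps: int  # requests per second this server can handle (made up)

class Pool:
    def __init__(self):
        self.servers = [Server("web-1", 1_000)]

    def scale_up(self, extra_rps: int):
        # Vertical: make the existing machine bigger (bounded by hardware limits).
        self.servers[0].capacity_rps += extra_rps

    def scale_out(self, count: int):
        # Horizontal: add more identical machines behind a load balancer.
        base = self.servers[0].capacity_rps
        for _ in range(count):
            self.servers.append(Server(f"web-{len(self.servers) + 1}", base))

    def total_capacity(self) -> int:
        return sum(s.capacity_rps for s in self.servers)

pool = Pool()
pool.scale_up(500)   # one bigger oven
pool.scale_out(3)    # three more ovens
print(pool.total_capacity())  # 1500 + 3 * 1500 = 6000 rps
```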

Understanding Latency, Throughput, and Responsiveness

Before diving into load balancing, it’s essential to understand three important metrics:

  • Latency: The time it takes for a request to travel from the user to the server and back. It’s like how long it takes for a pizza order to be delivered. Lower latency means faster delivery.
  • Throughput: The number of requests a system can handle per second. Think of it as how many pizzas your shop can make in an hour.
  • Responsiveness: How quickly the system reacts to user requests from the user’s point of view. A system can have high throughput yet still feel sluggish if individual requests sit waiting in a queue. A responsive system is one that delivers pizzas to hungry customers without delay.

For example, Google keeps latency low and throughput high by deploying data centers around the world, so responses stay fast no matter where you are.
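
Here’s a toy Python snippet showing how the two metrics are measured differently. The handle_request function stands in for real work, and the timings are purely illustrative.

```python
import time

def handle_request():
    time.sleep(0.005)  # pretend each request takes ~5 ms of work

n = 200
latencies = []
start = time.perf_counter()
for _ in range(n):
    t0 = time.perf_counter()
    handle_request()
    latencies.append(time.perf_counter() - t0)  # latency: time per request
elapsed = time.perf_counter() - start

print(f"avg latency: {1000 * sum(latencies) / n:.1f} ms")
print(f"throughput: {n / elapsed:.0f} requests/sec")  # requests per unit time
```

Because this loop serves one request at a time, throughput here is roughly 1 ÷ latency. Real systems push throughput higher by handling many requests concurrently, which is exactly where multiple servers and load balancing come in.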

Load Balancing: Distributing the Workload

Load balancing ensures that incoming requests are distributed evenly across multiple servers so no single server gets overwhelmed. It’s like having multiple chefs in your pizza shop, each making pizzas to avoid overloading one chef.

  • Round Robin

Each server takes turns handling requests. It’s as if you had a line of customers and each chef takes the next order in line. Simple and fair, but it doesn’t account for server load.
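
A minimal sketch of the idea in Python (server names are placeholders):

```python
from itertools import cycle

servers = ["web-1", "web-2", "web-3"]
next_server = cycle(servers)  # endlessly loop over the server list

for request_id in range(5):
    print(request_id, "->", next(next_server))
# 0 -> web-1, 1 -> web-2, 2 -> web-3, 3 -> web-1, 4 -> web-2
```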

  • Least Connections

This strategy sends traffic to the server with the fewest active connections. Imagine one chef is almost done with an order while another is starting a large one: you’d send the next customer to the chef with less work. This balances based on the work actually in progress, not just whose turn it is.
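
In code, the balancer tracks in-flight requests per server and picks the minimum. A tiny sketch, with names and counts invented for illustration:

```python
# Active connection counts as tracked by the balancer itself.
active = {"web-1": 12, "web-2": 3, "web-3": 7}

def pick_least_connections() -> str:
    server = min(active, key=active.get)  # fewest active connections wins
    active[server] += 1                   # that server now has one more in flight
    return server

def release(server: str):
    active[server] -= 1                   # call when the request completes

print(pick_least_connections())  # web-2 (only 3 active connections)
```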

  • IP Hash

Requests from the same client IP are always sent to the same server. It’s like assigning a regular customer to a specific chef who knows their preferences. This is useful for “sticky sessions”, where a server holds session state for a client and should keep seeing that client’s requests.
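
A minimal sketch: hash the client IP to a stable index so the same client always lands on the same server, as long as the server list doesn’t change.

```python
import hashlib

servers = ["web-1", "web-2", "web-3"]

def pick_by_ip(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

print(pick_by_ip("203.0.113.42"))  # same IP -> same server, every time
```

One caveat: if you add or remove a server, the modulo changes and most clients get remapped to different servers; consistent hashing is the usual fix for that.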

  • Weighted Round Robin

Servers with more processing power receive more requests. It’s like having more experienced chefs handling a larger portion of the orders. This is particularly useful when your servers have varying capacities.
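
A naive way to sketch this in Python is to repeat each server in the rotation according to its weight (the weights here are made up):

```python
from itertools import cycle

weights = {"big-1": 3, "small-1": 1}  # big-1 gets 3x the traffic
rotation = cycle([name for name, w in weights.items() for _ in range(w)])

for request_id in range(8):
    print(request_id, "->", next(rotation))
# big-1, big-1, big-1, small-1, big-1, big-1, big-1, small-1
```

Note that this simple version sends the heavier server’s requests in bursts; real balancers such as nginx use a “smooth” weighted round robin that interleaves them more evenly.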

  • Geographic Load Balancing

This technique routes traffic based on the geographical location of the user. For example, a user in Europe will be connected to the nearest server in Europe. This reduces latency and improves user experience.
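
In practice the user’s region comes from a GeoIP database or DNS-based routing; the lookup table and hostnames below are hypothetical.

```python
REGION_TO_SERVER = {
    "eu": "eu-west.example.com",
    "us": "us-east.example.com",
    "asia": "ap-south.example.com",
}

def pick_by_region(user_region: str) -> str:
    # Fall back to a default region if the user's region isn't served directly.
    return REGION_TO_SERVER.get(user_region, "us-east.example.com")

print(pick_by_region("eu"))  # eu-west.example.com
```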

  • Application-Aware Load Balancing

In this strategy, the load balancer directs traffic based on the type of request. For instance, it might send video streaming requests to high-bandwidth servers while sending lightweight API calls to others.
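
A sketch of content-based routing, with paths and pool names invented for illustration:

```python
POOLS = {
    "video": ["stream-1", "stream-2"],   # high-bandwidth servers
    "api": ["api-1", "api-2", "api-3"],  # lightweight API servers
}

def pick_pool(path: str) -> list[str]:
    # Route by what the request is, not just where it came from.
    return POOLS["video"] if path.startswith("/video/") else POOLS["api"]

print(pick_pool("/video/trailer.mp4"))  # ['stream-1', 'stream-2']
print(pick_pool("/api/users/42"))       # ['api-1', 'api-2', 'api-3']
```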

Example: Amazon Web Services (AWS) uses sophisticated load balancing to manage the millions of transactions that happen on its cloud infrastructure, distributing workloads across multiple servers globally.

Real-World Example: Facebook

Facebook uses a combination of horizontal scaling and load balancing to handle billions of users daily. When you log into Facebook, load balancers ensure your request is directed to the least busy server, while data is fetched from multiple servers distributed around the globe. This ensures the system remains responsive, even during peak times.

Conclusion

Scalability and load balancing are vital components of any system designed to handle growth. Whether it’s adding more servers to accommodate increased traffic or distributing requests across multiple servers to ensure efficiency, these strategies keep your system running smoothly. So next time you build an application, remember that it’s not just about functionality—it’s about being ready for success.

FAQs

  • Q: What is the difference between vertical and horizontal scaling?

    A: Vertical scaling upgrades the capacity of existing servers, while horizontal scaling adds more servers to distribute the load.

  • Q: How does load balancing improve system performance?

    A: Load balancing ensures that no single server is overwhelmed by distributing requests across multiple servers.

  • Q: When should I use round-robin load balancing?

    A: Round-robin works well when all servers have similar capacity and processing power.

  • Q: Why is latency important in system design?

    A: Latency affects how quickly users receive responses, and reducing it ensures a better user experience.

  • Q: Can a system use both vertical and horizontal scaling?

    A: Yes, many systems use a combination of both to maximise efficiency and handle various types of growth.
