System Design
Horizontal Scaling vs Vertical Scaling

| Horizontal scaling            | Vertical scaling            |
| ----------------------------- | --------------------------- |
| Load balancing required       | Not applicable              |
| Resilient                     | Single point of failure     |
| Network calls (RPC)           | Inter-process communication |
| Data inconsistency            | Consistent                  |
| Scales well as users increase | Hardware limit              |
Traditional vs Consistent Hashing:
The problem with traditional hashing is that whenever we change the number of servers (up or down), a lot of data has to be reshuffled across the available servers. This is inefficient; for example, data cached on a particular server may become useless because requests no longer map to it.
In consistent hashing we have an abstract ring onto which both servers and requests are mapped (the hash function for both can be the same or different). To assign a request to a server we use the clockwise rule: the request goes to the first server found moving clockwise from its position on the ring. Now when we add or remove a server, only the keys lying between that server's position and its neighbouring position on the ring get redistributed; all other servers are unaffected.
We can further improve the system by using virtual nodes, to avoid cases where some servers handle a disproportionately large share of requests due to a non-uniform layout. Each virtual node acts as a replica of a physical server, spreading keys much more evenly across the entire ring and providing better load balance and resilience.
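A minimal sketch of a consistent-hash ring with virtual nodes in Python (the class and server names are illustrative, not taken from any particular library): each physical server is hashed onto the ring several times, and a request is routed to the first virtual node found clockwise from its hash.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the ring using MD5."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas      # virtual nodes per physical server
        self.ring = {}                # ring position -> physical server
        self.sorted_keys = []         # sorted ring positions for binary search
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        # Each virtual node gets its own position on the ring.
        for i in range(self.replicas):
            pos = _hash(f"{server}#vnode{i}")
            self.ring[pos] = server
            bisect.insort(self.sorted_keys, pos)

    def remove_server(self, server):
        for i in range(self.replicas):
            pos = _hash(f"{server}#vnode{i}")
            del self.ring[pos]
            self.sorted_keys.remove(pos)

    def get_server(self, request_key):
        # Clockwise rule: first virtual node at or after the key's position.
        pos = _hash(request_key)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.get_server("user:42"))   # e.g. 'server-b'
ring.remove_server("server-b")
print(ring.get_server("user:42"))   # only keys owned by server-b's vnodes move
```

Removing a server only remaps the keys that were owned by its virtual nodes; the rest of the ring is untouched.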
A message queue is an asynchronous way to process requests between producers and consumers.
1. Producer (Sender)
The producer is the application component that creates and publishes the message to the queue.
The producer does not need to know where, when, or how the message will be processed. It simply drops the message into the queue and immediately moves on to its next task.
2. Queue (Broker)
The queue ensures the message is persisted until a consumer successfully processes it.
3. Consumer (Receiver)
The consumer is the application component that connects to the queue and retrieves messages.
When a message is retrieved, the consumer processes it (e.g., updates a database, sends an email).
The consumer typically sends an acknowledgment (ACK) back to the queue to confirm successful processing, after which the queue deletes the message.
The queue can also provide heartbeat and load-balancing mechanisms, e.g. detecting dead consumers and redelivering their messages to healthy ones.
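A minimal in-process sketch of the producer / queue / consumer flow, using Python's standard-library queue as a stand-in for a real broker (RabbitMQ, SQS, Kafka, etc.); here task_done() plays the role of the consumer's ACK.

```python
import queue
import threading
import time

broker = queue.Queue()   # stand-in for the broker (a real one would persist messages)

def producer():
    for i in range(5):
        message = {"id": i, "body": f"order-{i}"}
        broker.put(message)      # fire-and-forget: producer moves on immediately
        print(f"produced {message['id']}")

def consumer():
    while True:
        message = broker.get()   # retrieve the next message
        print(f"processing {message['id']} (e.g. update DB, send email)")
        time.sleep(0.1)          # simulate work
        broker.task_done()       # ACK: confirms successful processing

threading.Thread(target=consumer, daemon=True).start()
producer()
broker.join()                    # blocks until every message has been ACKed
```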
Monolithic architecture is simple and straightforward for smaller applications or teams. The entire system is packaged as a single code base, making deployment and testing easy at first. Developers need to understand the whole codebase when making changes, which can be time-consuming and challenging for newcomers. Every change—even to a small feature—requires redeploying the full application. If a bug or crash occurs, the entire system may go down because all logic is tightly coupled. Also, scaling is less flexible: you must scale the whole application even if only one part faces heavy traffic.
Microservice architecture breaks down the system into separate, independent services that communicate via APIs. This structure enables teams to work on services separately using different technologies. Developers can onboard more quickly because they only need to understand and work on isolated modules. Services can be deployed individually, allowing downtime or changes in one module without affecting the rest. You can scale just the services that need it, without scaling the whole system. However, the architecture and infrastructure are more complex. Managing many small services adds overhead for monitoring, testing, and deployment, and you need reliable API contracts between modules. If done poorly, it can lead to too many tiny services and unnecessary complexity.
In most database use cases, consistency trumps availability.
Sharding:
Joins across shards: expensive and slow.
Flexibility: hard to increase/decrease the number of shards with simple partitioning.
Consistent hashing helps balance data when you add/remove servers.
Hierarchical sharding: if a shard grows too large, split it further and add a manager to route requests to the sub-shards (see the sketch after this list).
Indexing is usually done within each shard to boost performance.
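A minimal sketch of hierarchical shard routing along these lines (the shard names, hash choice, and SUB_SHARDS map are illustrative assumptions): a key is hashed to a top-level shard, and if that shard has been split, a manager step routes it again to one of its sub-shards.

```python
import hashlib

def shard_index(key: str, n: int) -> int:
    """Hash a key to one of n buckets."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

TOP_LEVEL_SHARDS = ["shard-0", "shard-1", "shard-2"]

# shard-1 grew too large, so a manager splits it into sub-shards.
SUB_SHARDS = {"shard-1": ["shard-1a", "shard-1b"]}

def route(key: str) -> str:
    shard = TOP_LEVEL_SHARDS[shard_index(key, len(TOP_LEVEL_SHARDS))]
    if shard in SUB_SHARDS:                  # the "manager" layer
        subs = SUB_SHARDS[shard]
        shard = subs[shard_index(key, len(subs))]
    return shard

print(route("user:1001"))   # e.g. 'shard-2'
print(route("user:1002"))   # keys hashing to shard-1 are routed one level deeper
```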
Cache:
Eventual consistency: Cache may not always be updated instantly, leading to stale reads (especially problematic for financial transactions).
Cache placement options:
In-memory with server apps (fastest access)
Directly in the database (small built-in caches)
Global, distributed cache servers (scalable, reusable by multiple services)
In real production systems, usually all 3 options are used together.
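A minimal sketch of an in-memory TTL cache, assuming a simple dict-based store, that illustrates the stale-read problem: until an entry expires or is explicitly invalidated, readers keep seeing the old value even after the database has changed.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.store = {}                   # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None                   # cache miss: caller falls back to the DB
        value, expires_at = entry
        if time.time() > expires_at:
            del self.store[key]           # expired: treat as a miss
            return None
        return value                      # may be stale if the DB changed meanwhile

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

cache = TTLCache(ttl_seconds=30)
cache.set("account:42:balance", 100)
# Suppose the database is now updated to 80: until the entry expires or is
# invalidated, the cache still returns 100 -- a stale read.
print(cache.get("account:42:balance"))
```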
To avoid a single point of failure (SPOF) we usually use multiple server instances, data backups, multiple load balancers, geographical distribution, etc.
A CDN is a globally distributed network of cache servers that delivers static content (e.g., images, HTML, videos) to users from locations geographically close to them. It reduces latency by serving content from nearby edge servers instead of a single central origin server.
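A minimal sketch of the routing idea behind a CDN (the edge locations and coordinates are made up for illustration): pick the edge server geographically closest to the user. Real CDNs typically rely on DNS or anycast routing rather than an explicit distance calculation.

```python
import math

EDGE_SERVERS = {
    "us-east": (40.7, -74.0),    # (latitude, longitude)
    "eu-west": (51.5, -0.1),
    "ap-south": (19.1, 72.9),
}

def haversine(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearest_edge(user_location):
    """Return the name of the edge server closest to the user."""
    return min(EDGE_SERVERS, key=lambda name: haversine(user_location, EDGE_SERVERS[name]))

print(nearest_edge((48.8, 2.3)))   # a user in Paris is served from 'eu-west'
```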