
System Design resources

![[system-design-map.png]]

System Design Interview: A Step-By-Step Guide

https://www.youtube.com/watch?v=i7twT3x5yv8

understand the problem and establish design scope (5 min)

  • clarify requirements
  • why are we building this
  • who are the users
  • what features do we need to build
    • get interviewer buy-in on the feature list
  • non-functional
    • focus on scale and performance
    • do rough calculations
    • get general sense of scale
  • should end with a short list of features and a few non-functional requirements to satisfy

propose a high-level design and get buy-in (20 min)

  • top down, start with APIs
    • use REST
    • define input parameters and output responses carefully
    • verify they satisfy functional requirements
  • design a diagram
    • used to verify design satisfies requirements end to end
    • start with load balancer or API gateway
    • behind that are the services that satisfy the requirements
    • behind that is persistence, so introduce data storage
    • do this for each requirement
    • keep a list of conversation topics for later (scaling, concurrency, failure scenarios)
  • create data model and schema
    • data access patterns and read/write ratio
  • step back and review the design

design deep dive (15 min)

  • identify problematic areas and discuss trade-offs
  • determine with the interviewer what to discuss in depth
  • non-functional requirements go here
  • ask interviewer if they have any concerns about current design
  • for each area
    • articulate the problems
    • come up with 2 solutions
    • discuss tradeoffs of the solutions
    • pick a solution and discuss

wrap up (5 min)

  • summarize the design
  • note parts that are unique to this particular situation

How to Answer System Design Interview Questions

https://www.youtube.com/watch?v=L9TfZdODuFQ

define the problem space

  • define the scope
  • ask lots of questions to narrow the scope
  • clarify functional and non-functional requirements
  • functional
    • what's in and out of scope?
    • state assumptions
    • is this from scratch?
    • who are the clients or users?
    • do we need to talk to existing pieces of the system?
    • what are those pieces?
  • non-functional
    • business objectives
    • user experience
    • availability, consistency, speed, security, reliability
    • cost
  • focus on the ones you think are most critical
  • estimate the amount of data you're dealing with
    • storage size
    • bandwidth requirements
    • this can help you choose components and give an idea of what scaling might look like later
    • make some assumptions about user volume and typical behavior
    • check if these match interviewer's expectations
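The estimation steps above (storage size, bandwidth, assumptions about user volume and behavior) can be sketched as a small calculator. This is a minimal back-of-the-envelope sketch: the function name and every input (10M daily users, 2 writes per user per day, a 10:1 read/write ratio, 1 KB per write, one year of retention) are hypothetical assumptions for illustration, not figures from the videos.

```python
# Back-of-the-envelope capacity estimate.
# All inputs are hypothetical assumptions for illustration only.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_capacity(daily_active_users: int,
                      writes_per_user_per_day: float,
                      reads_per_write: float,
                      bytes_per_write: int,
                      retention_days: int) -> dict:
    """Return average QPS, total storage, and average bandwidth."""
    writes_per_day = daily_active_users * writes_per_user_per_day
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * reads_per_write  # read/write ratio drives read load
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        # Total bytes kept over the retention window (replication not counted).
        "storage_bytes": writes_per_day * bytes_per_write * retention_days,
        # Average bandwidth, assuming each read returns one stored object.
        "ingress_bytes_per_s": write_qps * bytes_per_write,
        "egress_bytes_per_s": read_qps * bytes_per_write,
    }


if __name__ == "__main__":
    est = estimate_capacity(daily_active_users=10_000_000,
                            writes_per_user_per_day=2,
                            reads_per_write=10,
                            bytes_per_write=1_000,
                            retention_days=365)
    print(f"write QPS ~{est['write_qps']:.0f}, read QPS ~{est['read_qps']:.0f}")
    print(f"storage ~{est['storage_bytes'] / 1e12:.1f} TB per year")
```

With these assumed inputs the sketch gives roughly 231 write QPS, 2,315 read QPS and 7.3 TB per year. These are averages; peak traffic is usually some multiple of them, which is worth confirming with the interviewer.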

design the system at a high level

  • lay out the most fundamental pieces of the system and illustrate how they work together to achieve desired functionality
  • no nitty-gritty
  • start by designing APIs
    • each system requirement should translate to one or more APIs
    • use REST, SOAP, GraphQL, etc.
    • consider request parameters and response type
    • these form the foundation of the architecture
  • client/web server communication
  • create a high level diagram
    • show how the data and control flow looks in this system
    • no scalability yet

deep dive into the design

  • examine system components and relationships in more detail
  • start by talking about how non-functional requirements impact your design choices
  • this is where you start adding load balancers, database partitioning

identify bottlenecks and scaling opportunities

  • examine whether the system can operate under various conditions and has room to support growth
  • is there a single point of failure? what can we do to support robustness and enhance the system's availability?
  • is the data valuable enough to require replication? how important is it to keep all versions consistent?
  • is the service global? do we need to provide multi-geo data centers to improve locality?
  • are there edge cases like peak time usage or hot users that have usage patterns that could deteriorate the performance?
  • how do we scale the system to support 10x more users?
  • this is where horizontal sharding, CDNs, caching, rate limiting knowledge is useful
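The replication and consistency questions above can be made concrete with a toy model of replicas that accept writes independently and later reconcile. This is a hypothetical sketch using last-write-wins (LWW), one common conflict-resolution strategy, not the method from the video. It assumes timestamps are plain integers supplied by the caller; with real wall clocks, skew between machines can pick the "wrong" winner, and LWW always discards the losing write.

```python
class Replica:
    """Key-value replica that resolves conflicts with last-write-wins (LWW)."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # key -> ((timestamp, writer), value)

    def write(self, key, value, timestamp: int):
        # The writer's name breaks timestamp ties so every replica picks the same winner.
        self._apply(key, (timestamp, self.name), value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def _apply(self, key, version, value):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def sync_from(self, other: "Replica"):
        """Anti-entropy pass: pull every newer version the other replica holds."""
        for key, (version, value) in other.store.items():
            self._apply(key, version, value)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.write("bio", "hello", timestamp=1)
    b.write("bio", "hi there", timestamp=2)
    print(a.read("bio"), "|", b.read("bio"))  # replicas disagree before syncing
    a.sync_from(b)
    b.sync_from(a)
    print(a.read("bio"), "|", b.read("bio"))  # both converge on the newer write
```

Until the sync runs, readers of different replicas can see different values; after it, both agree. That window is the cost of keeping writes available on every replica.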

review and wrap up

  • summarize major decisions with justifications
  • summarize any trade-offs in space, time, complexity
  • check that design satisfies all requirements
  • identify potential areas for improvement

how i mastered system design interviews

https://www.youtube.com/watch?v=l3X1t3kpmwY

what are the requirements of the system? who are the users, and how many are there? what components do we need in our system? how should these components be organized? how do we make the system scalable? how do we make it reliable? how do we make it easy to maintain?

key concepts

  • scalability
    • how well a system can handle more users or data without slowing down
    • vertical - adding resources to a single machine, like a bigger hard drive or more memory
    • horizontal - adding more machines to the system to handle the load
  • performance
    • how fast your system works
    • latency - time it takes for a single task
    • throughput - how many tasks your system can handle in a certain time
  • availability
    • making sure the system is up and running when users need it without significant downtime
  • reliability
    • system is doing what it's supposed to be doing even when things go wrong
    • replication
    • redundancy
    • failover mechanisms
  • consistency
    • all users see the same data at the same time no matter which part of the system they interact with
    • can slow down performance
    • eventual consistency - the data may not be up to date immediately, but all copies converge to the same value once updates stop arriving
  • CAP theorem
    • a distributed system can guarantee at most 2 of these 3 properties at the same time:
      • consistency
      • availability
      • partition tolerance
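    • you need to make trade-offs based on what the system needs most; since network partitions can't be ruled out, the real choice during a partition is between consistency and availability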
  • data storage and retrieval
    • choosing the database
    • designing the schema
    • partitioning, sharding, replication
    • for optimal storage and retrieval
  • ACID transactions
    • atomicity
    • consistency
    • isolation
    • durability
    • a way to make sure everything we do in a database is done right and reliably
  • consistent hashing
    • used to spread data across a group of servers
    • makes it easier to add and remove servers with minimal disruptions
    • load balancing, scalability
  • rate limiting
    • controls the rate clients can make requests to the system
    • prevent abuse
    • protect against DDoS attacks
    • ensure fair use of resources
  • networking and communication
    • how different parts of a system communicate
    • network protocols
    • APIs
    • message queues
    • event-driven architecture
  • security and privacy
    • putting in place methods to keep important data safe and stop unwanted access
    • authentication
    • authorization
    • encryption
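Consistent hashing, listed in the key concepts above, is easiest to see on a small hash ring. This is a minimal sketch, not a production implementation: servers and keys are hashed onto the same ring, and each key belongs to the first server position clockwise from it. The class name, the MD5 choice and the 100 virtual nodes per server are illustrative assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (sketch, not production)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes   # virtual points per server; smooths the spread
        self._hashes = []      # sorted positions on the ring
        self._owner = {}       # ring position -> server name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used only for an even spread here, not for security.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if h not in self._owner:
                bisect.insort(self._hashes, h)
                self._owner[h] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if self._owner.get(h) == node:
                del self._owner[h]
                self._hashes.pop(bisect.bisect_left(self._hashes, h))

    def get_node(self, key: str) -> str:
        if not self._hashes:
            raise LookupError("no servers on the ring")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]
```

When a server is added, the only keys that move are the ones the new server takes over, roughly 1/N of them. With naive `hash(key) % N` placement, changing N remaps most keys. That difference is the "minimal disruption" property.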

building blocks

  • application servers
    • computers that handle the business logic and processing required by the application
  • load balancers
    • distribute incoming requests across different servers so no single server gets overwhelmed
  • databases
    • data storage
    • there are different types to serve different needs
    • common: SQL and NoSQL
  • caching
    • store frequently accessed data in a fast access storage to reduce load on the primary data source and improve response times
  • message queues
    • enable asynchronous communication between system components
    • decouple sender and receiver and allow them to work independently at different rates
  • storage
    • store and retrieve data such as files, images or videos
    • local file systems
    • distributed file systems
    • object storage systems (S3)
  • proxy server
    • acts as an intermediary between client and server
    • can be used for things like load balancing, caching, security, or content filtering
  • CDN
    • content delivery network distributed globally
    • stores copies of website content
    • serves content from a location close to the user, reducing latency
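The message-queue building block above (decoupling a sender from a receiver so each works at its own rate) can be illustrated in-process with Python's standard `queue.Queue` and a consumer thread. This is only a stand-in to show the decoupling. Real brokers such as Kafka, RabbitMQ or Amazon SQS add durability and delivery guarantees across machines. The `run_pipeline` and `handle` names are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling the consumer to shut down


def run_pipeline(messages, handle):
    """Producer enqueues messages; a separate consumer thread processes them."""
    q = queue.Queue(maxsize=100)  # bounded buffer absorbs bursts between the two
    results = []

    def consumer():
        while True:
            item = q.get()
            if item is _STOP:
                break
            results.append(handle(item))  # consumer works at its own pace

    worker = threading.Thread(target=consumer)
    worker.start()
    for msg in messages:
        q.put(msg)  # producer only blocks if the buffer is full
    q.put(_STOP)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["order-1", "order-2"], str.upper))
```

The producer never calls the consumer directly. It only knows about the queue, so either side can be replaced, scaled or slowed down without changing the other.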

system design interview

  • clarify requirements
    • functional and non-functional
  • estimate the capacity the system is dealing with
  • choose the right database and define the schema
  • design APIs and request/response pattern
    • define endpoints and parameters
  • sketch out a high level block diagram
    • identify major components
  • deep dive into key components and discuss how components interact
    • Common Areas for Deep Dives:
      • Databases: How would you handle a massive increase in data volume? Discuss sharding (splitting data across multiple databases), replication (read/write replicas).
      • Web Servers/Application Servers: How do you add more servers behind the load balancer for increased traffic?
      • Load Balancers: Which load-balancing techniques and algorithms should you use (e.g., round-robin, least connections)?
      • Caching: Where do you add more cache layers (in front of web servers? in the application layer?), and how do you deal with cache invalidation?
      • Single Points of Failure: Identify components whose failure would take down the system and discuss how to address it.
      • Authentication/Authorization: How do you manage user access and permissions securely?
      • Rate Limiting: How do you prevent excessive use or abuse of your APIs?
  • discuss how system will scale under load
    • sharding
    • replication
    • partitioning
  • discuss tradeoffs
    • sql vs nosql
  • discuss caching strategies and where they can be added
  • discuss strategies for handling failures
    • replicas
    • fallbacks
    • retries
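The retries and fallbacks bullets above are often combined as "retry transient errors with exponential backoff, then fall back". This is a minimal sketch of that pattern under stated assumptions. The function name, the defaults, the choice of `ConnectionError` and `TimeoutError` as "transient", and the "full jitter" randomization are illustrative choices, not something the video specifies.

```python
import random
import time


def retry_with_backoff(operation, *, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       retry_on=(ConnectionError, TimeoutError), fallback=None,
                       sleep=time.sleep):
    """Call operation(); on transient errors wait longer each time, then fall back."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                if fallback is not None:
                    return fallback()  # e.g. serve cached or default data
                raise
            # Exponential backoff capped at max_delay, with random "full jitter"
            # so many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

Only errors listed in `retry_on` are retried. Anything else (a bad request, for example) fails immediately, because retrying it would just add load.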

System Design Concepts Course and Interview Prep

https://www.youtube.com/watch?v=F2FmTdLtb_4

Good design

  • scalability
  • maintainability
  • efficiency
  • reliability

key elements

  • moving data
    • ensure data can move between parts of the system
    • user requests or database transfers
    • optimize for speed and security
  • storing data
    • not just sql or nosql
    • access patterns
    • indexing strategies
    • backup solutions
    • optimize for security and availability
  • transforming data
    • turning data into meaningful information

CAP or Brewer's theorem

you can only have two of the three at the same time. Pick the best trade-off for the specific use case: where can we afford to compromise?

  • consistency
    • every user sees the same, most recent data
  • availability
    • the system is always available to requests
    • supported by
      • reliability
      • fault tolerance
      • redundancy
  • partition tolerance
    • the system keeps functioning even when a network partition (lost communication between nodes) occurs

speed

  • throughput
    • server - requests per second (RPS)
    • database - queries per second (QPS)
    • data - bytes per second (B/s)
  • latency
    • how long it takes to handle a single request
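  • lower latency generally allows higher throughput, but the two can trade off: batching requests, for example, raises throughput at the cost of per-request latency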
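Latency and throughput are linked by Little's law (requests in flight = throughput × latency). Rearranged, it gives the throughput ceiling for a fixed amount of concurrency. This is a minimal arithmetic sketch, assuming a hypothetical server that can keep 100 requests in flight at once.

```python
def throughput_rps(concurrency: int, latency_ms: float) -> float:
    """Little's law rearranged: throughput = requests in flight / time per request."""
    return concurrency * 1000 / latency_ms


if __name__ == "__main__":
    # Hypothetical server that can hold 100 requests in flight at once.
    print(throughput_rps(100, 50))  # 50 ms per request
    print(throughput_rps(100, 25))  # halving latency doubles the ceiling
```

With 100 requests in flight, 50 ms per request caps the server at 2,000 RPS, and cutting latency to 25 ms raises the cap to 4,000 RPS. Adding concurrency raises the cap too, until some shared resource saturates and latency starts to climb.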

API design

  • defining inputs (product details from a seller)
  • defining outputs (information when user queries for a product)
  • CRUD
    • create - post
    • read - get
    • update - put
    • delete - delete
  • paradigms
    • REST
      • stateless
      • can result in over- or under-fetching data
    • graphQL
      • strongly typed, get only what you need
      • queries can impact server performance
      • typically served from a single endpoint, usually over POST requests
    • gRPC
      • efficient
      • less human readable
  • ensure backwards compatibility
  • set up rate limiting and CORS
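Rate limiting, mentioned above, is often implemented with a token bucket. This is one common algorithm, swapped in here for illustration; the video does not prescribe it. Each client gets a bucket that refills at a fixed rate. A request spends a token, and requests arriving when the bucket is empty are rejected. The class names, the per-client keying and the injectable `clock` are hypothetical choices for this sketch.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then a sustained `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would typically answer HTTP 429 Too Many Requests


class RateLimiter:
    """One bucket per client id (e.g. API key or IP address)."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self._args = (capacity, refill_rate, clock)
        self._buckets = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self._buckets:
            self._buckets[client_id] = TokenBucket(*self._args)
        return self._buckets[client_id].allow()
```

Keeping a bucket per client gives each client a fair share, so one abusive client cannot use up another's budget. In a multi-server deployment the counters would need to live in shared storage, which this in-memory sketch does not handle.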

Caching

  • browser
  • server
  • database
  • CDN
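Server-side caching from the list above is commonly done as "cache-aside": read from the cache, and on a miss load from the database and store the result. This is a minimal in-process sketch with least-recently-used (LRU) eviction and explicit invalidation. The `get_product` and `load_from_db` names are hypothetical, and a shared cache such as Redis or Memcached would take the `LRUCache`'s place in a real deployment.

```python
from collections import OrderedDict

_MISS = object()  # distinguishes "not cached" from a cached None


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return _MISS
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

    def invalidate(self, key):
        self._data.pop(key, None)  # call after writes so readers don't see stale data


def get_product(product_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    value = cache.get(product_id)
    if value is _MISS:
        value = load_from_db(product_id)
        cache.put(product_id, value)
    return value
```

Invalidation is the hard part. If a write updates the database but skips `invalidate`, readers keep getting the old value until that entry is evicted.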

Databases

  • types
    • relational
      • tables
      • SQL query language
      • great for transactions, complex queries, integrity
      • ACID compliant
    • NoSQL
      • often relaxes ACID guarantees (especially strong consistency) in favor of availability and scale
      • flexible data model: fields can be added or removed per record
      • no fixed schema
      • ideal for scalability, quick iteration, simple queries
    • in memory
      • fast data retrieval
      • caching, session storage
  • scaling
    • vertical "scale up"
      • increase CPU power
      • add more RAM
      • add more disk storage
      • upgrade network
      • limited by how many resources you can add
    • horizontal "scale out"
      • add more machines
      • database sharding
        • vertical or horizontal
        • breaks data into smaller chunks and spreads it out across the system
        • range, directory, geographical
      • data replication
        • keep copies of data on multiple servers
        • master (read write) / slave (read only)
        • master master (both read and write)
  • performance
    • caching
      • cache data or frequent queries
    • indexing
      • boosts performance by indexing frequently accessed columns
    • query optimization
      • minimize joins
      • use analyzers to understand query performance
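Of the sharding strategies listed above, range-based sharding is the simplest to route: keep sorted range boundaries and binary-search them. This is a hypothetical sketch. The shard names and the alphabetical split on usernames are illustrative assumptions.

```python
import bisect


class RangeShardRouter:
    """Routes a key to a shard by comparing it against sorted range boundaries."""

    def __init__(self, boundaries, shards):
        # boundaries[i] is the exclusive upper bound of shards[i];
        # the last shard takes every key >= the last boundary.
        if len(shards) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self.boundaries = list(boundaries)
        self.shards = list(shards)

    def shard_for(self, key):
        return self.shards[bisect.bisect_right(self.boundaries, key)]


if __name__ == "__main__":
    # Hypothetical split of usernames: [..., "h"), ["h", "p"), ["p", ...]
    router = RangeShardRouter(["h", "p"], ["shard-0", "shard-1", "shard-2"])
    for user in ["alice", "harry", "zoe"]:
        print(user, "->", router.shard_for(user))
```

Range sharding keeps neighboring keys together, which makes range scans cheap. The downside is that popular ranges can turn into hot shards, and hash-based sharding spreads load more evenly at the cost of those scans.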

Google system design interview: Design Spotify

https://www.youtube.com/watch?v=_K-eupuDVEc

constrain the problem

  • reduce the scope to something that can be designed within the hour

calculate the amount of data you're dealing with

  • specify some key metrics to help high level decision making

lay out the basic components of the design

  • device -> load balancer -> webserver -> database & storage (metadata vs blobs)

do basic database design

  • split out DBs for different types of data

add a cache where appropriate

  • analytics to ensure popular content is readily available via CDN
  • how does caching apply at different levels?

load balancing

  • make sure no single server is overloaded
  • look at different metrics to apply the right approach
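Two of the load-balancing algorithms these notes mention, round-robin and least connections, differ in which metric they look at. This is a minimal sketch; the class and server names are hypothetical. Round-robin ignores load entirely, while least connections tracks how many requests each server is currently handling.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Ties go to the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1  # call when the request finishes
```

Round-robin works well when requests cost roughly the same. Least connections adapts better when some requests (a long stream, say) hold a server much longer than others.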

talk through use cases

  • check your design
  • where are there bottlenecks?

refine the design

  • scaling and replication
    • geolocated for more performant access
  • wrap up by outlining how it meets requirements
  • think big and add a new dimension

other resources