Software Engineer Hub

System Design for Software Engineers (2026)

In short

System design is the dominant senior+ SWE interview format and the rubric most engineers fail by sounding generic. The bar in 2026 is concrete: name the workload, do the back-of-envelope math, sketch the data model, pick a partitioning strategy with a reason, and articulate the trade-off you ruled out. This page walks three canonical problems end-to-end (URL shortener, Twitter timeline, ride-hailing dispatch) and names the Hello Interview, DDIA, and Spanner-paper sources behind each decision.

Key takeaways

  • Senior+ system design is judged on trade-off articulation, not architecture-diagram completeness — Hello Interview's 'Delivery' section explicitly weights the discussion of alternatives ruled out (hellointerview.com/learn/system-design/in-a-hurry/delivery).
  • Capacity estimation in interviews is roughly 5 minutes — practice peak QPS, storage in 5 years, and bandwidth math until they take 30 seconds each (Alex Xu, System Design Interview Vol 1, ch. 2).
  • SQL vs NoSQL in 2026 is mostly answered by access pattern, not scale — a single-leader Postgres handles ~50k read QPS on commodity hardware before sharding is needed (DDIA ch. 5, p. 152-156).
  • Leader-follower replication is the default; leaderless (Cassandra/DynamoDB style) only when last-write-wins or CRDTs match the workload (DDIA ch. 5, p. 177-184).
  • Sharding strategy is hash vs range vs directory — pick one and name the hot-spot mitigation (Twitter sharded by user_id with consistent hashing; Foursquare moved off range-based when MongoDB hot-spotted in 2010).
  • The most common senior+ failure mode is designing only the happy path. Failure-mode discussion — leader election during partition, partial writes, retry storms — is graded heavily.

Problem 1: Design a URL shortener (TinyURL, bit.ly)

This is the canonical 'first system design' interview because the functional surface is small but the design space is rich. Hello Interview rates it 'easy' on its difficulty curve (hellointerview.com/learn/system-design/problem-breakdowns/bitly) and uses it as the on-ramp problem in their senior track.

Functional requirements (clarify in 2 minutes):

  • POST /shorten { long_url, optional custom_alias, optional ttl } returns short_url.
  • GET /{short_code} 301-redirects to the long URL.
  • Analytics: per-link click count, geo, referrer (asynchronous).
  • Custom aliases must be globally unique; collisions return 409.

Capacity estimation (back-of-envelope, 5 min):

Writes:    100M new URLs/month = ~40 writes/sec average, ~400 peak
Reads:     100:1 read-to-write ratio (a standard interview assumption) = 4k QPS avg, 40k peak
Storage:   100M * 12 months * 5 years * 500 bytes/row = ~3 TB at the 5-year horizon
Bandwidth: 4k QPS * 500 bytes = ~2 MB/s read, negligible write
Cache:     Pareto says 20% of URLs serve 80% of clicks
           Hot set ~ 20% of the trailing month's 100M URLs = 20M * 500 bytes = 10 GB (fits one Redis box)
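
If you want to sanity-check these numbers, the arithmetic fits in a few lines of Python. A minimal sketch — the constants (100M URLs/month, 100:1 read ratio, 500 bytes/row) are the stated assumptions above, not measurements:

# Back-of-envelope sanity check for the URL-shortener estimate above.
SECONDS_PER_MONTH = 30 * 24 * 3600                      # ~2.6M

urls_per_month = 100_000_000
row_bytes = 500
read_write_ratio = 100                                  # standard interview assumption

write_qps = urls_per_month / SECONDS_PER_MONTH          # ~39/s average
read_qps = write_qps * read_write_ratio                 # ~3.9k/s average
storage_5y = urls_per_month * 12 * 5 * row_bytes        # ~3.0 TB
read_bandwidth = read_qps * row_bytes                   # ~1.9 MB/s
hot_set = int(0.2 * urls_per_month) * row_bytes         # ~10 GB cache

print(f"writes: {write_qps:.0f}/s  reads: {read_qps:.0f}/s")
print(f"storage@5y: {storage_5y / 1e12:.1f} TB  cache: {hot_set / 1e9:.0f} GB")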

Short-code generation — three real options, rank them:

  1. Hash the URL (MD5 / xxHash) and base62-encode. Deterministic, but collisions need handling. Reject: more complex than option 3 with no benefit at this scale.
  2. Random base62 string of length 7. 62^7 ≈ 3.5 trillion keys, so even at 150M stored URLs the chance a new random code collides is ~150M / 3.5T ≈ 0.004% — fine for the first year. Reject: probabilistic, still requires a retry-on-collision path.
  3. Counter + base62 encode. Centralized monotonic counter (Redis INCR or Snowflake/Sonyflake ID). Deterministic, no collisions, predictable URL length. Trade-off: leaks volume to competitors. Use this; mention the trade-off.
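
A minimal sketch of option 3's encoding step in Python — the counter here is a plain integer standing in for Redis INCR or a Snowflake-style ID service:

import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Encode a non-negative counter value as a base62 short code."""
    if n == 0:
        return BASE62[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(BASE62[rem])
    return "".join(reversed(out))

# In production the counter comes from Redis INCR or an ID service;
# a plain integer stands in here.
assert encode_base62(0) == "0"
assert encode_base62(125) == "21"   # 2*62 + 1
print(encode_base62(100_000_000))   # '6LAze' — the 100-millionth URL is 5 chars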

Data model (Postgres, single-leader at this scale):

CREATE TABLE links (
  short_code  VARCHAR(10) PRIMARY KEY,
  long_url    TEXT NOT NULL,
  user_id     BIGINT,
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  expires_at  TIMESTAMPTZ,
  click_count BIGINT DEFAULT 0  -- denormalized; updated async
);
CREATE INDEX idx_links_user ON links(user_id);
-- click events go to a separate write-optimized store (Kafka -> ClickHouse)

Architecture sketch:

[Client] -> [CDN/Edge] -> [API Gateway] -> [Shortener Service]
                                                  |
                            +---------------------+--------------------+
                            |                     |                    |
                     [Postgres primary]   [Redis cache LRU]   [Kafka click events]
                            |                                          |
                     [Postgres replica(s)]                  [ClickHouse analytics]

Scaling story (the part candidates skip): at 4k QPS read average, a single Postgres primary with two read replicas behind a load balancer is fine until ~50k peak QPS (DDIA ch. 5, p. 152-156, on single-leader limits). Past that, shard by short_code prefix or use Vitess. The 90% case never needs sharding; say that out loud in the interview. Interview rubrics reward candidates who right-size; over-engineering ('let's start with Cassandra') is a junior signal.

Trade-off table (force the interviewer to pick):

Choice             | SQL (Postgres)                             | NoSQL (DynamoDB)
Read latency p99   | ~5-15 ms (with cache miss to disk)         | Single-digit ms at any scale
Operational cost   | Lower at <1 TB; you manage primary/replica | Higher; pay per request, scales without ops
Schema flexibility | Migrations needed                          | Schemaless, easier to evolve
Strong consistency | Default                                    | Optional (consistent reads cost 2x)

For this problem, Postgres wins on cost and operational simplicity. Pick it; defend it.

Problem 2: Design a Twitter timeline (fanout-on-write vs fanout-on-read)

This is the senior+ canonical because the answer changes based on user distribution. Hello Interview's Twitter breakdown (hellointerview.com/learn/system-design/problem-breakdowns/tweet-search) and DDIA chapter 1, pages 11-15, both walk it; if you only read one, read DDIA — Kleppmann's celebrity-fanout worked example (a user with tens of millions of followers) is exactly what this problem tests.

Functional requirements: post a tweet (280 chars), follow/unfollow, view home timeline (tweets from people you follow, reverse chrono).

Capacity (rough, 2026 numbers from X transparency reports):

Users:        ~600M monthly active
Writes:       ~500M tweets/day = ~6k tweets/sec average, ~30k peak
Reads:        Home timeline ~10x writes = ~60k QPS average, ~300k peak
Fanout:       Average 200 followers; celebrities have 50M-100M
Storage:      500M tweets/day * 1 KB * 365 days * 5 years = ~900 TB

Two architectural extremes — both are wrong; the answer is both:

  1. Fanout-on-write (push): when user X tweets, push the tweet ID into the timeline cache of every follower. Read = O(1) cache lookup. Write = O(followers); breaks for celebrities (Lady Gaga has ~80M followers as of 2024 — single tweet writes 80M cache entries).
  2. Fanout-on-read (pull): on home-timeline read, query the most recent tweets from each followee, merge in time order. Write = O(1). Read = O(followees * tweets_per_followee); breaks for power users following thousands.

Production answer (Twitter's actual approach, per their 2010-2013 engineering blog posts and DDIA p. 12): hybrid. Fanout-on-write for the majority of users (median follower count ~200, cheap fanout); fanout-on-read for celebrities (their tweets are merged into the follower's timeline at read time). The threshold is empirical — Twitter used ~1M followers as the cutoff in their 2013 talk; today it's tracked dynamically.
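
A sketch of the hybrid read path in Python. The two stores are stand-ins — hardcoded dicts in place of the Redis timeline cache and the tweet store, with illustrative names:

import heapq
from itertools import islice

# Stub data stores — in production these are a Redis ZSET read and a
# tweets-table query; hardcoded lists stand in here. Both are newest-first:
# Snowflake IDs are time-sortable, so descending ID == reverse chrono.
PUSH_TIMELINES = {42: [907, 903, 899, 880]}   # pre-fanned-out tweet IDs
CELEB_TWEETS = {7: [905, 890], 9: [901]}      # celebrities' own recent tweets

def home_timeline(user_id, celebrity_followees, limit=50):
    """Hybrid read path: merge the precomputed push timeline with
    pull-at-read-time streams from followed celebrities."""
    streams = [PUSH_TIMELINES.get(user_id, [])]
    streams += [CELEB_TWEETS.get(c, []) for c in celebrity_followees]
    merged = heapq.merge(*streams, reverse=True)   # k-way merge, newest first
    return list(islice(merged, limit))

print(home_timeline(42, [7, 9]))   # [907, 905, 903, 901, 899, 890, 880]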

Data model sketch:

-- Tweets in a write-optimized store (Manhattan / Cassandra-like)
CREATE TABLE tweets (
  tweet_id    BIGINT,    -- Snowflake ID, time-sortable
  user_id     BIGINT,
  body        TEXT,
  created_at  TIMESTAMPTZ,
  PRIMARY KEY (user_id, tweet_id)  -- partitioned by user_id
);

-- Per-follower timeline cache (Redis-like)
-- Key: timeline:{user_id}, Value: ZSET of (tweet_id, created_at)
-- Capped at ~800 most recent (covers 95% of read patterns)
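
A sketch of the fanout-on-write worker against that cache, assuming redis-py and a reachable Redis; the timeline:{user_id} key convention follows the comment block above:

import redis

r = redis.Redis()   # assumes a local Redis; a cluster client in production

TIMELINE_CAP = 800  # keep only the ~800 newest entries per timeline

def fan_out(tweet_id: int, created_at: float, follower_ids: list[int]) -> None:
    """Push one tweet into each follower's timeline ZSET, capped at 800.
    In production this runs on queue workers, batched per Redis shard."""
    pipe = r.pipeline(transaction=False)
    for follower_id in follower_ids:
        key = f"timeline:{follower_id}"
        pipe.zadd(key, {tweet_id: created_at})
        # Drop everything below the top-800 scores (the oldest entries).
        pipe.zremrangebyrank(key, 0, -(TIMELINE_CAP + 1))
    pipe.execute()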

Sharding strategy: tweets sharded by user_id (consistent hashing — DDIA ch. 6, p. 203-207). Timeline cache sharded by user_id too (co-located with the user). Hot-spot risk: a viral tweet (Elon, Taylor Swift) gets fan-out written to millions of timelines in seconds. Mitigation: rate-limit fan-out workers, use a separate priority queue for celebrities, eventually-consistent reads (the reader sees the tweet within seconds, not milliseconds).

What gets you to the 'strong hire' bar: articulate the failure mode. 'When the celebrity fanout queue backs up, we degrade gracefully — followers see the tweet on the next pull-based merge, with a 10-30s delay. We do not drop tweets.' That sentence is what Hello Interview's senior rubric calls 'production-grade trade-off articulation.'

Problem 3: Design ride-hailing dispatch (Uber/Lyft)

The hardest of the three because the workload is geospatial and real-time. Hello Interview ranks this 'hard' on its senior track (hellointerview.com/learn/system-design/problem-breakdowns/uber); Alex Xu Vol 2 chapter 1 walks the matching algorithm in detail.

Functional requirements: rider requests ride (pickup + dropoff coords); system finds nearest 5 drivers within 5 km; driver accepts; tracking until completion.

Capacity (rough, from Uber engineering blog 2023):

Active drivers:  ~5M concurrent globally
Location pings:  every 4s per driver = ~1.25M writes/sec
Ride requests:   ~25 per sec average per major city; ~500 peak
Storage hot:     5M drivers * 200 bytes (lat/lng/state) = ~1 GB live state

The geospatial-index decision: three real options.

  1. Geohash strings. Encode lat/lng to a 12-char hash. Prefix-match for nearby boxes. Failure mode: cell boundaries — two drivers 10m apart can hash to different cells, missing matches. Mitigation: query the cell + 8 neighbors. Use this for the cache layer.
  2. Quadtree. Recursive 2D-space subdivision. Rebalances poorly on hotspots. Reject: the live workload has hotspots (downtown SF Friday night).
  3. S2 cells (Google library). Hilbert-curve based; balanced; used in production at Uber. Open-source: github.com/google/s2geometry. This is the production answer.
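
To make the 'cell + 8 neighbors' query concrete, here is a toy uniform-grid index in Python — a deliberately simplified stand-in for geohash/S2 (fixed-size cells, no Hilbert curve), not production code:

from collections import defaultdict

CELL_DEG = 0.01  # ~1.1 km of latitude per cell; toy stand-in for geohash/S2

def cell_of(lat: float, lng: float) -> tuple[int, int]:
    return (int(lat // CELL_DEG), int(lng // CELL_DEG))

class GridIndex:
    """Toy geospatial index: bucket drivers into fixed grid cells and
    query a cell plus its 8 neighbors to avoid boundary misses."""
    def __init__(self):
        self.cells: dict[tuple[int, int], set[int]] = defaultdict(set)
        self.driver_cell: dict[int, tuple[int, int]] = {}

    def update(self, driver_id: int, lat: float, lng: float) -> None:
        old = self.driver_cell.get(driver_id)
        if old is not None:
            self.cells[old].discard(driver_id)
        new = cell_of(lat, lng)
        self.cells[new].add(driver_id)
        self.driver_cell[driver_id] = new

    def nearby(self, lat: float, lng: float) -> set[int]:
        cx, cy = cell_of(lat, lng)
        found = set()
        for dx in (-1, 0, 1):          # cell + 8 neighbors
            for dy in (-1, 0, 1):
                found |= self.cells.get((cx + dx, cy + dy), set())
        return found

idx = GridIndex()
idx.update(1, 37.7749, -122.4194)      # downtown SF
idx.update(2, 37.7750, -122.4094)      # ~900 m east, adjacent cell
print(idx.nearby(37.7749, -122.4194))  # {1, 2} — neighbor query catches both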

Architecture:

[Driver app] -- 4s ping --> [WebSocket gateway] --> [Location Service]
                                                       |
                                              [Redis geo-index]
                                              (S2 cell -> driver_id list)
                                                       |
[Rider app] -> [Match Service] -> queries Redis for cell + neighbors
                       |
                       v
              [Match algorithm] -> [Notification Service]

The 'leaderless vs leader-follower' choice: the location index needs read-after-write within ~1s and high write throughput. Leaderless (Cassandra/Dynamo style) gives high availability under partition; leader-follower gives stronger consistency. Uber built Ringpop, an open-source consistent-hashing-with-gossip membership library, to shard the location index by S2 cell ID; writes within a cell are effectively leaderless. DDIA ch. 5, p. 177-184, walks the trade-off; for this interview, it's enough to say 'leaderless, eventually consistent within ~1 second is correct here because rider tolerance for staleness is high — they'd rather see any driver than wait for a strongly consistent read.'

Failure modes (the senior+ part):

  • Driver app loses network for 30s. Their last-known location is stale. Match service must check ping recency and exclude drivers with ping > 15s old.
  • Two riders match the same driver simultaneously. Solve with a distributed lock — Redis SET lock:driver:{id} {rider_id} NX EX {ttl} (plain SETNX cannot attach a TTL atomically); first writer wins. A sketch follows this list.
  • Surge in one cell. 100 ride requests hit one S2 cell at once. Solve with backpressure: queue requests, accept up to driver-count, return 'no drivers available' fast on overflow.
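
The driver-claim lock sketched in Python, assuming redis-py; the lock:driver:{id} key name is illustrative, and the Lua release guard avoids deleting a lock that expired and was re-acquired by another rider:

import redis

r = redis.Redis()   # assumes a reachable Redis

# Release only if we still own the lock.
RELEASE_LUA = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""
release = r.register_script(RELEASE_LUA)

def try_claim_driver(driver_id: int, rider_id: int, ttl_s: int = 10) -> bool:
    """First writer wins: SET ... NX EX sets value and TTL atomically."""
    return bool(r.set(f"lock:driver:{driver_id}", str(rider_id),
                      nx=True, ex=ttl_s))

def release_driver(driver_id: int, rider_id: int) -> None:
    release(keys=[f"lock:driver:{driver_id}"], args=[str(rider_id)])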

Trade-off framework: how to talk about decisions

The single signal that distinguishes a 'strong hire' from a 'lean hire' on system design is structured trade-off articulation. Hello Interview's senior rubric (hellointerview.com/blog/system-design-rubric) lists four dimensions you should hit on every major decision:

Dimension   | What to say
Latency     | Quote a target (e.g., 'p99 read < 50ms') and explain why your choice meets it.
Throughput  | Quote QPS at peak and headroom (e.g., 'Postgres handles ~50k QPS reads on db.r6g.4xlarge per AWS benchmarks').
Consistency | State strong / read-after-write / eventual and why the workload tolerates it.
Cost        | Approximate dollars or relative ('DynamoDB at 4k QPS costs ~$2k/mo on-demand; managed Postgres ~$500/mo').

SQL vs NoSQL — say this out loud: 'SQL is the default; I switch to NoSQL when (a) the access pattern is partition-key + sort-key with no ad-hoc queries, or (b) write throughput exceeds single-leader limits, or (c) the team needs schema-less evolution faster than migrations allow.' That framing — grounded in DDIA's data-model chapter (ch. 2) — is what passes the senior bar. The wrong answer: 'NoSQL because it scales.' Postgres scales fine for the workloads 95% of interview problems describe.

Leader-follower vs leaderless — say this: 'Leader-follower is correct when I need read-after-write consistency and writes are below ~50k QPS. Leaderless (Dynamo, Cassandra) wins when (a) writes exceed single-leader, (b) tolerance for stale reads is high, (c) availability under partition matters more than consistency.' DDIA ch. 5, p. 177-184, is the canonical reference; the Spanner paper (research.google/pubs/spanner) is what to cite when an interviewer pushes on global consistency.

Sharding strategy — three options to name:

  • Hash-based: uniform distribution; loses range queries. Use for user_id, tweet_id, link_id (a consistent-hash sketch follows this list).
  • Range-based: preserves range queries; risks hot-spots on monotonically increasing keys. Use for time-series with care (always salt the prefix).
  • Directory-based (lookup table): flexibility for migrations; adds a hop. Use when shards rebalance often (e.g., per-customer in B2B SaaS).
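
Hash-based sharding in practice usually means a consistent-hash ring. A minimal sketch with virtual nodes — not any specific library; Twitter's and Uber's production implementations differ in detail:

import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring with virtual nodes: adding or removing
    a shard remaps only ~1/N of keys, unlike modulo hashing."""
    def __init__(self, shards: list[str], vnodes: int = 100):
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrap at the end).
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["db-0", "db-1", "db-2"])
print(ring.shard_for("user:8675309"))   # stable shard for this user_id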

Foursquare's 2010 outage (foursquare.com/about/blog) is the canonical 'why range-based on monotonic keys is wrong' case study — name it in the interview if asked about hot-spots.

Preparation cadence: 8-12 weeks to senior-bar fluency

The candidates who clear FAANG senior system design rounds in 2026 do not start with Hello Interview's problem list. They start with DDIA, get the foundations cold, then drill problems.

  1. Weeks 1-3 — Foundations (DDIA ch. 1-7, p. 1-242). Read for understanding, not for memorization. Make notes on: replication trade-offs (ch. 5), partitioning strategies (ch. 6), transactions and isolation levels (ch. 7).
  2. Weeks 4-6 — Problem-pattern recognition. Hello Interview's Core 12 problems (URL shortener, Twitter, Uber, WhatsApp, YouTube, Dropbox, Tinder, Yelp, Stock Exchange, Web Crawler, Distributed Cache, Rate Limiter). For each, draw the architecture without aid, then watch the Hello Interview walkthrough and diff.
  3. Weeks 7-10 — Mock interviews. Hello Interview's mock pool, Pramp, or interviewing.io. Aim for 12+ mocks with senior+ FAANG interviewers. The pattern recognition only consolidates under time pressure.
  4. Weeks 11-12 — Specialty depth. Pick the company you're targeting and read their engineering blog. Stripe (stripe.com/blog/engineering), Uber (uber.com/blog/engineering), Meta (engineering.fb.com), Discord (discord.com/blog/engineering). One real production system from each, deeply.

The two highest-leverage hours of prep are: (1) reading DDIA chapter 5 twice; (2) doing one mock with a current FAANG senior+ as the interviewer. Everything else is supporting.

Frequently asked questions

Should I memorize architecture diagrams for the Core 12 problems?
No. Memorized diagrams collapse the moment the interviewer perturbs the requirements ('what if writes are 10x reads?'). Memorize the trade-off framework — latency / throughput / consistency / cost — and re-derive the architecture under their specific constraints. Hello Interview's senior rubric explicitly penalizes pattern-matched answers with no derivation.
When does an interviewer expect me to introduce a message queue (Kafka, SQS)?
Three triggers: (1) async work that doesn't need to block the user response (analytics, email, notifications); (2) decoupling fast producers from slow consumers (write-heavy ingestion); (3) replay/audit log for events. If your design has none of those, leave the queue out. Adding Kafka 'because scale' without naming a trigger is a junior signal.
Do I need to know specific database internals (B-trees, LSM trees) for senior interviews?
Not in the first 30 minutes. You need them when the interviewer pushes on write amplification or read-vs-write trade-offs. The 30-minute version: Postgres uses B-trees (read-optimized, writes amplify). Cassandra/RocksDB use LSM trees (write-optimized, reads merge). DDIA ch. 3, p. 78-86, is the reference.
How much do I need to know about consensus algorithms (Raft, Paxos)?
Conceptually, deeply. Implementation, no. You should be able to draw a 5-node Raft cluster, walk through leader election, and explain why an even number of nodes is dangerous (split-vote). The Raft paper itself (raft.github.io/raft.pdf) is more readable than DDIA's coverage; spend an hour on it.
When is it correct to say 'this needs Spanner'?
When you need strong external consistency across geographic regions and you're at scale where TrueTime's atomic clock infrastructure pays off. Almost never in an interview. The right answer for global consistency in a normal interview is 'I'd use single-region with cross-region replicas, accept the regional-failover blast radius, and not pretend to need Spanner-level guarantees unless the problem demands it.' Reference: Corbett et al, 'Spanner: Google's Globally Distributed Database', OSDI 2012.
How do I handle 'how would this evolve in 5 years' questions?
Two parts: (1) what about the workload would change (10x users, new feature like search) and (2) which architectural decision becomes wrong. Best answer: 'At 10x scale, the single Postgres primary I started with becomes the bottleneck — write QPS exceeds ~50k per single-leader limits. The migration path is Vitess for horizontal sharding by user_id, with the schema unchanged. The rewrite is operational, not application-layer.' Naming the migration path is what passes.
What's the difference between fanout-on-write and write-through cache?
Fanout-on-write replicates a single piece of data to many consumer-specific stores (Twitter timeline cache per follower). Write-through cache replicates the same data to a faster tier (Redis in front of Postgres). Different problems: fanout solves read amplification; write-through solves read latency. Conflating them is a common interview mistake.
Should I draw on a whiteboard or use Excalidraw / Hello Interview's tool?
Use whatever the interviewer offers; the diagram quality is not graded. What's graded: do you label data flow direction, do you name protocols (HTTP/gRPC/WebSocket), do you mark where state lives. Drawing speed matters more than aesthetics — practice until you can sketch the URL-shortener architecture in 90 seconds.

Sources

  1. Hello Interview — URL Shortener problem breakdown (canonical senior+ on-ramp problem).
  2. Hello Interview — Ride-hailing dispatch breakdown.
  3. Hello Interview — Delivery rubric (how senior+ system design is scored).
  4. Designing Data-Intensive Applications (Kleppmann), Chapter 5: Replication, p. 152-184.
  5. Designing Data-Intensive Applications, Chapter 6: Partitioning, p. 199-228.
  6. Corbett et al — 'Spanner: Google's Globally Distributed Database', OSDI 2012.
  7. Ongaro & Ousterhout — 'In Search of an Understandable Consensus Algorithm' (Raft), USENIX ATC 2014.
  8. Google S2 Geometry library — production geospatial indexing used by Uber and Foursquare.

About the author. Blake Crosley founded ResumeGeni and writes about product design, hiring technology, and ATS optimization. More writing at blakecrosley.com.