Backend Engineer Hub

Databases and Storage for Backend Engineers (2026)

In short

Databases and storage for backend engineers in 2026 means knowing four things deeply. First, the storage engine: B-tree (Postgres, MySQL, SQL Server) wins read-heavy OLTP; LSM-tree (Cassandra, RocksDB, ScyllaDB) wins write-heavy ingest. Second, indexing strategy: composite, partial, and covering indexes; left-prefix rules; index-only scans. Third, replication and partitioning: primary-replica vs multi-primary, range vs hash sharding, and the consistency trade-offs each topology imposes. Fourth, isolation levels and the OLTP vs OLAP boundary: when Postgres stops scaling and Snowflake or BigQuery takes over.

Key takeaways

  • B-tree storage engines (Postgres, MySQL InnoDB) optimize for read-heavy OLTP and arbitrary index lookups; LSM-tree engines (Cassandra, RocksDB, ScyllaDB) optimize for write-heavy workloads by buffering writes in memory and flushing as immutable SSTables. Designing Data-Intensive Applications Chapter 3 (dataintensive.net) is the canonical treatment.
  • Index design is the single highest-leverage backend skill. Composite indexes follow the left-prefix rule; partial indexes cover the slice you actually query; covering indexes (INCLUDE in Postgres) enable index-only scans by storing extra columns at the leaf level. Markus Winand's use-the-index-luke.com is the canonical reference.
  • EXPLAIN ANALYZE is the diagnostic loop: read the plan tree bottom-up, look for Seq Scan on large tables, check estimated rows vs actual rows (the planner lies when statistics are stale), and verify Index Cond / Filter rows go in the direction you expect. The Postgres docs (postgresql.org/docs/current/using-explain.html) are canonical.
  • Replication topologies trade availability for consistency. Single-primary asynchronous replication (the Postgres default) gives high write throughput with eventually consistent reads on replicas. Synchronous replication trades write latency for stronger guarantees. Multi-primary (Cassandra, DynamoDB) accepts conflict-resolution complexity in exchange for write availability across regions.
  • Partitioning by hash gives even distribution but makes range scans expensive; partitioning by range supports time-series queries naturally but creates hot partitions when the access pattern skews to recent data. The DynamoDB docs and DDIA Chapter 6 cover the trade-offs in production terms.
  • Transaction isolation levels are not abstract: read-committed (Postgres default) allows non-repeatable reads; repeatable-read prevents them but allows phantom reads; serializable (SSI in Postgres) prevents both at the cost of more aborts. Pick the weakest level that satisfies the invariant the code depends on.
  • OLTP and OLAP are different products. Postgres / MySQL handle transactional workloads well as long as the working set roughly fits in RAM; once analytical queries scan billions of rows, move to Snowflake, BigQuery, or Databricks. Don't run analytical queries on the production OLTP replica without an explicit read-replica boundary.

Storage engines: B-tree vs LSM-tree, and when to choose each

Every relational and key-value database is built on a storage engine, and the storage engine's data structure determines what the database is good at. The two dominant designs are B-trees (and B+ trees) and log-structured merge trees (LSM-trees).

B-tree engines — Postgres (heap + B-tree indexes), MySQL InnoDB (clustered B-tree), SQL Server, SQLite — store data in pages on disk, organized as a balanced tree. Reads are O(log n) lookups with predictable latency. Updates happen in place: the page is read, modified, and written back. The trade-off is write amplification on heavy ingest (every update touches multiple pages and the WAL) and the need for vacuum or compaction to reclaim space from deleted rows.
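
One practical consequence of in-place updates on a B-tree engine: deleted and updated rows leave dead tuples behind until vacuum reclaims them. A quick health check against Postgres's standard statistics view (the LIMIT is arbitrary):

-- Tables with the most dead tuples and when autovacuum last visited them
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 5;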

LSM-tree engines — Cassandra, RocksDB, ScyllaDB, LevelDB, the storage layer under Bigtable — buffer writes in an in-memory memtable, flush the memtable to immutable on-disk SSTables when it fills, and merge SSTables in the background (compaction). Writes are sequential and fast. Reads may need to consult multiple SSTables (mitigated by Bloom filters and tiered compaction), so read latency is less predictable than a B-tree.

The decision rule:

  • Read-heavy OLTP with arbitrary indexes (most web apps): B-tree wins. Postgres is the default in 2026 and the right answer for the vast majority of new backend projects.
  • Write-heavy ingest with predictable access patterns (time-series, event logs, IoT, telemetry): LSM-tree wins. Cassandra or ScyllaDB for wide-column; RocksDB as the embedded engine under your own service; ClickHouse for OLAP-style aggregations.
  • Massive scale with relaxed consistency (multi-region key-value): DynamoDB or Cassandra. The LSM design pairs naturally with eventual-consistency replication.

What this means for senior+ engineers in 2026: you can articulate which storage engine your database uses and why; you understand that "Postgres" means heap-storage + B-tree indexes, not a magical black box; you can read a query plan and recognize when the engine's constraints are showing up in a hot path. Designing Data-Intensive Applications Chapter 3 (dataintensive.net) and Alex Petrov's Database Internals Part I (databass.dev) are the canonical treatments.

Query optimization with EXPLAIN ANALYZE — a worked example

Query optimization in Postgres begins and ends with EXPLAIN ANALYZE. The command runs the query and returns the execution plan annotated with actual timings and row counts. The diagnostic loop: read the plan bottom-up, find the operations that dominate runtime, check whether the planner's estimated rows match actual rows (a large mismatch means stale statistics — run ANALYZE), and verify the index strategy.

Consider a real example: an orders table with 10 million rows, indexed on customer_id, and a query that fetches one customer's recent orders filtered by status.

-- Query: recent shipped orders for one customer
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total_cents, created_at
FROM orders
WHERE customer_id = 84219
  AND status = 'shipped'
  AND created_at >= NOW() - INTERVAL '30 days'
ORDER BY created_at DESC
LIMIT 50;

-- BEFORE: only an index on customer_id exists
-- Index Scan using idx_orders_customer on orders
--   (cost=0.43..18342.10 rows=2400 width=24)
--   (actual time=0.041..212.847 rows=147 loops=1)
--   Index Cond: (customer_id = 84219)
--   Filter: ((status = 'shipped') AND (created_at >= now() - '30 days'))
--   Rows Removed by Filter: 8431
--   Buffers: shared hit=2104 read=312
-- Planning Time: 0.184 ms
-- Execution Time: 213.022 ms

The plan reveals the problem: the engine reads 8,578 rows via the customer_id index, then filters 8,431 of them away in memory. The 2,416 buffer accesses are mostly wasted. The fix is a composite index on the columns the query actually filters and orders by:

-- Add a composite index matching the query's WHERE + ORDER BY
CREATE INDEX CONCURRENTLY idx_orders_customer_status_created
  ON orders (customer_id, status, created_at DESC)
  INCLUDE (total_cents);

-- AFTER: re-run the same query
-- Index Only Scan using idx_orders_customer_status_created on orders
--   (cost=0.43..4.62 rows=50 width=24)
--   (actual time=0.029..0.184 rows=50 loops=1)
--   Index Cond: ((customer_id = 84219) AND (status = 'shipped')
--               AND (created_at >= now() - '30 days'))
--   Heap Fetches: 0
--   Buffers: shared hit=8
-- Planning Time: 0.151 ms
-- Execution Time: 0.218 ms

The plan changed from Index Scan + Filter reading thousands of heap rows to Index Only Scan reading eight buffer pages. Two patterns made the difference: the composite index puts customer_id first (left-prefix rule), then status for filtering, then created_at DESC so the ORDER BY + LIMIT is satisfied directly from the index. The INCLUDE (total_cents) clause stores the projected column at the leaf level, enabling a true index-only scan with zero heap fetches.

What this code gets right: CREATE INDEX CONCURRENTLY avoids locking the table during the build; column order matches the query's filter + sort; INCLUDE covers the projection without bloating the B-tree key. What to watch for: index-only scans require the visibility map to be up-to-date (VACUUM maintains it); on a write-heavy table, frequent vacuums are needed for the optimization to actually fire. The Postgres docs (postgresql.org/docs/current/using-explain.html) and Markus Winand's use-the-index-luke.com (use-the-index-luke.com) are the canonical references.
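
A practical follow-up to the visibility-map caveat: vacuum the table and confirm the plan still reports zero heap fetches. A minimal sketch against the orders table from the example above:

-- Refresh statistics and the visibility map so index-only scans can skip the heap
VACUUM (ANALYZE) orders;

-- Re-run the EXPLAIN (ANALYZE, BUFFERS) query above and check the line:
--   Heap Fetches: 0      -- true index-only scan
-- A large Heap Fetches count means the visibility map is stale and the
-- "index only" scan is quietly paying for heap visits anyway.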

Replication topologies and consistency trade-offs

Replication exists for two reasons: durability (a copy survives the loss of one node) and read scaling (multiple replicas serve reads in parallel). Every topology trades availability for consistency, and the right choice depends on the invariant the application enforces.

Single-primary asynchronous replication is the default Postgres and MySQL setup. The primary accepts all writes and streams the WAL to one or more replicas. Replicas serve reads with a small lag (typically tens of milliseconds, sometimes seconds under load). The trade-off: a write that committed on the primary may not yet be visible on a replica, so a read-after-write on a replica can return stale data. Mitigations include routing a user's reads to the primary for a short window after they write, or setting synchronous_commit = remote_apply (with a named synchronous standby) so commits wait until the standby has applied the WAL and subsequent reads there see the write.
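
Replication lag is directly observable, and measuring it is the first step in deciding how to route reads. Two standard queries, assuming streaming replication on Postgres 10 or newer:

-- On the primary: per-standby lag as seen by the WAL sender
SELECT application_name, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

-- On a replica: how far behind the last replayed transaction is
SELECT now() - pg_last_xact_replay_timestamp() AS approximate_lag;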

Single-primary synchronous replication trades write latency for stronger durability guarantees: the primary waits for at least one replica to acknowledge before confirming a commit. Postgres supports this via synchronous_commit = on with synchronous_standby_names. The cost is write latency that includes the network round-trip to the standby; the benefit is that a primary failure cannot lose acknowledged writes.
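
Enabling it is a configuration change. A minimal sketch, assuming one standby that connects with application_name = 'replica_a' (the name is illustrative; many teams set these values in postgresql.conf rather than via ALTER SYSTEM):

-- Wait for one named standby to confirm each commit before reporting success
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (replica_a)';
ALTER SYSTEM SET synchronous_commit = 'on';  -- 'remote_apply' if replicas must see the write immediately
SELECT pg_reload_conf();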

Multi-primary replication (Cassandra, DynamoDB global tables, CockroachDB, Spanner-style architectures) accepts writes at more than one node and must reconcile concurrent writes. Cassandra resolves conflicts with last-write-wins timestamps and tunable consistency (read and write quorums); DynamoDB global tables also use last-writer-wins; Spanner and CockroachDB avoid conflicts by running consensus per shard, with Spanner's TrueTime providing external consistency across regions. The trade-off: write availability across regions in exchange for conflict-resolution complexity (Cassandra, DynamoDB) or cross-region commit latency (Spanner, CockroachDB). Spanner's globally-distributed architecture (research.google/pubs/spanner-googles-globally-distributed-database) is the canonical treatment.

The decision rule for new backend systems in 2026:

  • Single region, < 100k writes/s: Postgres with one or two async replicas. Add synchronous replication only on critical financial paths.
  • Multi-region with eventual consistency tolerable: Cassandra or DynamoDB. Design the data model around the access patterns; eventual consistency is acceptable if the application reads its own writes from the same node.
  • Multi-region with strong consistency required: Spanner or CockroachDB. Pay the latency cost in exchange for serializable isolation across regions.

Designing Data-Intensive Applications Chapter 5 (Replication) is the canonical treatment of the trade-offs. The AWS Builders' Library (aws.amazon.com/builders-library/) contains production essays on consistency models in DynamoDB and Aurora.

Partitioning strategies for horizontal scale

Partitioning (also called sharding) splits a large dataset across multiple nodes or storage units so each piece fits within a single node's capacity. Two strategies dominate, and the choice determines what queries are cheap and what queries are expensive.

Hash partitioning distributes rows evenly across partitions by hashing the partition key. It eliminates hotspots — if the key is well-chosen, every partition gets roughly the same load. The cost is range scans: a query like WHERE created_at BETWEEN x AND y must consult every partition because the rows for any given range are scattered across all of them.
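
Postgres supports hash partitioning declaratively (PARTITION BY HASH, available since Postgres 11). A minimal sketch with illustrative table and column names:

-- Rows are spread evenly across four partitions by hashing user_id
CREATE TABLE sessions (
    user_id    BIGINT NOT NULL,
    started_at TIMESTAMPTZ NOT NULL,
    payload    JSONB
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- A lookup on user_id touches one partition; a range scan on started_at touches all four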

Range partitioning assigns a contiguous key range to each partition (e.g., orders from 2024 in one partition, 2025 in another). Range scans are cheap because the rows live together. The cost is hotspots when the access pattern skews to recent data — the partition holding "today" gets all the writes while older partitions sit cold.

A canonical comparison:

Strategy           | Even distribution | Range scans | Hot-partition risk
-------------------|-------------------|-------------|--------------------
Hash               | Yes               | Expensive   | Low
Range (time)       | No (skewed)       | Cheap       | High (recent data)
Range (composite)  | Sometimes         | Cheap       | Mitigated
Directory / lookup | Operator-managed  | Either      | Operator-managed

Postgres supports declarative partitioning natively. A typical time-series setup partitions by month for an events table:

-- Parent table — partitioned by RANGE on created_at
CREATE TABLE events (
    id          BIGSERIAL,
    user_id     BIGINT NOT NULL,
    event_type  TEXT NOT NULL,
    payload     JSONB NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE events_2026_04 PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
CREATE TABLE events_2026_05 PARTITION OF events
    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

-- Indexes are defined on the parent and propagate to partitions
CREATE INDEX ON events (user_id, created_at DESC);

-- Drop old partitions in O(1) — no DELETE, no vacuum churn
DROP TABLE events_2025_01;

What this DDL gets right: PARTITION BY RANGE on a monotonic timestamp lets the planner do partition pruning (only scan the partition containing the queried range); per-partition indexes are smaller and faster to maintain than one giant index; old data is dropped by detaching or dropping a partition, which is O(1) and avoids the vacuum churn of DELETE.
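
To confirm pruning is actually happening, constrain a query to one month and check which partitions appear in the plan. A sketch against the events table above (plan output omitted; the thing to look at is the list of scanned partitions):

-- Only events_2026_05 should appear in this plan. If every monthly partition
-- shows up, the predicate is not prunable (for example, created_at is wrapped
-- in a function or compared to a volatile expression).
EXPLAIN (ANALYZE)
SELECT count(*)
FROM events
WHERE created_at >= '2026-05-01' AND created_at < '2026-05-08';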

For globally distributed key-value stores, DynamoDB's partition model is canonical reading. The DynamoDB partitioning docs (docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html) describe how partition keys map to physical partitions and why a high-cardinality partition key is essential to avoid hot partitions. DDIA Chapter 6 (Partitioning) treats the design space in full.

Isolation levels, ACID vs BASE, and the OLTP / OLAP boundary

Transaction isolation levels are not theoretical — they are the contract between the database and the code that depends on it. Senior+ backend engineers can name the four SQL standard levels, the anomalies each prevents, and which one their production database actually uses by default.

  • Read uncommitted — the weakest level. Reads can see uncommitted data from other transactions (dirty reads). No production system should run here; Postgres treats this level as identical to read-committed.
  • Read committed — the Postgres and Oracle default. A statement sees only data committed before the statement started. Prevents dirty reads but allows non-repeatable reads (the same row can change between two reads in the same transaction) and phantom reads (new rows can appear in a re-executed range query).
  • Repeatable read — the MySQL InnoDB default. A transaction sees a consistent snapshot taken at the start; rows read once will not change during the transaction. Postgres's repeatable-read uses snapshot isolation; it prevents non-repeatable reads but does not prevent write skew.
  • Serializable — the strongest level. Postgres implements this as Serializable Snapshot Isolation (SSI), which detects and aborts transactions that would produce a non-serializable history. Write skew and phantom reads are prevented; the cost is more aborted transactions under contention.

The decision rule: pick the weakest level that satisfies the invariant the code depends on. Read-committed is correct for the vast majority of CRUD endpoints. Serializable is required when the correctness of the application depends on a multi-row constraint that can't be expressed as a database constraint (e.g., "at most three doctors on call per shift").
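
The on-call invariant is the textbook write-skew case, and it reproduces in a few lines. A minimal sketch with a hypothetical on_call(shift_id, doctor_id, active) table; two sessions run the same transaction concurrently:

-- Session A (Session B runs the identical pattern at the same time)
BEGIN ISOLATION LEVEL SERIALIZABLE;

-- Check the invariant: at most three doctors may be on call for shift 17
SELECT count(*) FROM on_call WHERE shift_id = 17 AND active;
-- Both sessions see 2, so both decide it is safe to add another doctor.

INSERT INTO on_call (shift_id, doctor_id, active) VALUES (17, 4, true);
-- (Session B inserts doctor_id = 9 instead.)

COMMIT;
-- Under REPEATABLE READ both commits succeed and the shift ends up with four
-- doctors on call, violating the invariant (write skew). Under SERIALIZABLE,
-- SSI aborts one of the two transactions with ERROR 40001
-- (serialization_failure) and the application retries it.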

ACID vs BASE. ACID (Atomicity, Consistency, Isolation, Durability) describes the transactional guarantees of relational databases. BASE (Basically Available, Soft state, Eventual consistency) describes the relaxed-consistency model adopted by Cassandra, DynamoDB, and many distributed key-value stores. The trade-off is the CAP theorem in operational form: ACID systems sacrifice availability under partition; BASE systems sacrifice consistency to remain available. Most modern systems are tunable — DynamoDB and Cassandra offer per-request consistency; Spanner offers ACID across regions at the cost of latency.

OLTP vs OLAP. The other major architectural boundary is between transactional (OLTP) and analytical (OLAP) workloads. OLTP databases (Postgres, MySQL, SQL Server, DynamoDB) handle high-frequency small transactions: insert one row, update one row, fetch one customer's recent orders. OLAP systems (Snowflake, BigQuery, Databricks, ClickHouse, DuckDB) handle low-frequency analytical queries that scan billions of rows: weekly revenue by product category, cohort retention by signup month, top-100 sessions by event count.

Don't run analytical queries on the production OLTP replica. The right architecture in 2026:

  • OLTP layer: Postgres or MySQL, partitioned and replicated, serving the live application.
  • CDC pipeline: Debezium, Fivetran, or AWS DMS streams change events from the OLTP database to an OLAP warehouse.
  • OLAP layer: Snowflake, BigQuery, or Databricks, with dbt or SQLMesh for transformations and a BI tool (Looker, Hex, Metabase) on top.

Designing Data-Intensive Applications Chapter 7 (Transactions) covers isolation in full; Chapters 10 and 11 (Batch Processing, Stream Processing) cover the analytical side. The senior+ bar in 2026: you can name the isolation level your production database uses, you understand which anomalies it prevents, and you have an explicit boundary between your OLTP and OLAP systems.

Frequently asked questions

When should I pick Postgres over MySQL in 2026?
Postgres is the default for new projects in 2026. It has stronger SQL compliance, better JSON support (JSONB with GIN indexes), richer indexing (partial, expression, GIN, GiST, BRIN), a true SERIALIZABLE level implemented as Serializable Snapshot Isolation, and a more mature ecosystem of extensions (pgvector, PostGIS, TimescaleDB). MySQL still wins for simple workloads on managed services and for teams with deep MySQL operational experience, but the gap closed years ago and Postgres is the conservative choice today.
When does an LSM-tree database beat Postgres?
When write throughput is the bottleneck and the access pattern is well-defined. Time-series ingest at 100k+ writes/second per node (Cassandra, ScyllaDB, ClickHouse), embedded key-value storage under your own service (RocksDB), or globally distributed key-value at managed scale (DynamoDB). If your queries are arbitrary ad-hoc SQL with multiple JOINs, the LSM advantage evaporates and Postgres wins.
What's the difference between an Index Scan and an Index Only Scan?
An Index Scan reads the index to find matching row pointers, then fetches the rows from the heap. An Index Only Scan reads the index alone and returns results directly from it, with no heap access. Index Only Scans require all selected columns to be present in the index (use INCLUDE in Postgres) and a sufficiently up-to-date visibility map (maintained by VACUUM). When a plan switches from an Index Scan to an Index Only Scan, you have usually gained a large performance improvement.
How do I avoid hot partitions in DynamoDB or Cassandra?
Pick a high-cardinality partition key that distributes load evenly. A user-id partition key works when traffic is distributed across users; a date partition key creates a hot partition for today's writes. Composite partition keys (user-id + bucket-suffix) help when a single hot key dominates. The DynamoDB partitioning docs cover the design space; the rule of thumb is that no single partition should receive more than ~10 percent of total writes.
What isolation level should I use for a new application?
Start with the database default — read-committed in Postgres, repeatable-read in MySQL InnoDB. Move to Serializable only when application correctness depends on a multi-row invariant that can't be expressed as a database constraint. The cost of Serializable is more transaction aborts under contention; the cost of getting isolation wrong is silent data corruption. Pick deliberately, document the choice in the codebase, and write a regression test that exercises the concurrent path.
When does Postgres stop scaling and I need a warehouse?
Two thresholds: when the working set no longer fits in RAM (analytical scans go to disk and dominate the I/O budget), and when analytical queries start affecting OLTP latency. The right answer is to push analytics off the OLTP path entirely — set up a CDC pipeline (Debezium, Fivetran) to a warehouse (Snowflake, BigQuery, Databricks) and run analytical work there. ClickHouse or DuckDB are good lighter-weight options for teams that want to avoid the warehouse cost.
Should I use a NoSQL database for a new web app in 2026?
Usually no. Postgres handles JSON natively (JSONB with GIN indexes), supports schemaless workflows when needed, and gives you ACID transactions for free. NoSQL wins specific shapes: globally distributed key-value with managed scale (DynamoDB), wide-column at petabyte scale (Cassandra, Bigtable), document storage with rich aggregation (MongoDB). For a typical SaaS web app, Postgres is the default and NoSQL is a deliberate exception requiring justification.
What's the right way to do schema migrations in production?
Backward-compatible migrations only. Every migration must be safe under a rolling deploy where old and new code run simultaneously. Add columns nullable; backfill in a separate migration; drop columns in a third migration after the code that referenced them is fully removed. Use CREATE INDEX CONCURRENTLY to avoid locking. Tools like Sqitch, Flyway, or Alembic enforce ordered migrations; the AWS Builders' Library has production essays on safe schema evolution.
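
A minimal sketch of that sequence, with hypothetical table and column names; each step ships as its own migration and deploy:

-- Migration 1: add the new column nullable; old code ignores it, new code writes it
ALTER TABLE users ADD COLUMN phone_e164 TEXT;

-- Migration 2 (after deploy): backfill in batches to avoid long locks and bloat
UPDATE users SET phone_e164 = phone WHERE phone_e164 IS NULL AND id <= 100000;
-- ...repeat for the next id range until the backfill is complete

-- Migration 3 (after no running code reads the old column): clean up
ALTER TABLE users DROP COLUMN phone;

-- Indexes are always built without blocking writes
CREATE INDEX CONCURRENTLY idx_users_phone_e164 ON users (phone_e164);
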
How do I read a Postgres EXPLAIN plan effectively?
Read bottom-up: the deepest nodes execute first. Look for Seq Scan on large tables (often a missing index), check estimated rows vs actual rows (a 10x mismatch means stale statistics — run ANALYZE), watch for Sort or Hash Aggregate spilling to disk (work_mem too low), and prefer Index Only Scan over Index Scan over Bitmap Heap Scan over Seq Scan in that order. The Postgres docs (postgresql.org/docs/current/using-explain.html) and Markus Winand's use-the-index-luke.com are the canonical references.

Sources

  1. Designing Data-Intensive Applications (Martin Kleppmann) — Chapters 3 (Storage and Retrieval), 5 (Replication), 6 (Partitioning), 7 (Transactions). The canonical reference for backend storage systems.
  2. Database Internals (Alex Petrov) — Part I covers B-tree and LSM-tree storage engines in depth. Companion to DDIA at the implementation level.
  3. PostgreSQL Documentation — Using EXPLAIN. Canonical reference for query plan analysis.
  4. Use the Index, Luke! (Markus Winand) — definitive guide to SQL indexing across Postgres, MySQL, Oracle, and SQL Server.
  5. AWS DynamoDB Developer Guide — Partitions and Data Distribution. Canonical reference for hash partitioning in a managed key-value store.
  6. Spanner: Google's Globally-Distributed Database (Google Research). Canonical paper on globally consistent multi-primary replication with TrueTime.
  7. AWS Builders' Library — production engineering essays on consistency models, replication, and operational practices for DynamoDB, Aurora, and other managed databases.

About the author. Blake Crosley founded ResumeGeni and writes about backend engineering, hiring technology, and ATS optimization. More writing at blakecrosley.com.