
API Design and Microservices for Backend Engineers (2026)

In short

API design at tech-company backend teams in 2026 means contract-first OpenAPI specs, idempotency keys on every state-changing endpoint, explicit versioning policy, and graceful deprecation. Microservices are extracted only when the team boundary, scaling profile, or data ownership justifies the operational tax. Resilience is non-negotiable: circuit breakers, bulkheads, retry-with-exponential-backoff-and-jitter, and timeouts on every call. The senior bar: you design APIs that are idempotent by construction, versioned by contract, observable by default, and fail in predictable ways under partial outage.

Key takeaways

  • Idempotency keys are the single most-cited API-design pattern: a client-supplied key that lets the server safely de-duplicate retried writes. Stripe's pattern (stripe.com/blog/idempotency) is the canonical reference: store the key plus the response for a fixed window (24 hours is typical), return the cached response on retry, and store a hash of the request body alongside the key so mismatched retries are detected.
  • API versioning is a product decision, not a technical one. The two dominant strategies are URL versioning (/v1/, /v2/) and date-based versioning (Stripe's 2020-08-27 header). URL versioning is simpler; date-based versioning lets each customer pin to a specific behavior. Stripe's case study (stripe.com/blog/api-versioning) is the canonical write-up.
  • gRPC, REST, and GraphQL each have a sweet spot: gRPC for service-to-service calls inside the cluster (HTTP/2, protobuf, streaming, ~7-10x lower serialization cost than JSON); REST/OpenAPI for public APIs and partner integrations; GraphQL for client-driven aggregation across many backend services (mobile apps, complex frontends). Picking the wrong one is one of the most common architecture mistakes.
  • OpenAPI 3.1 is the contract-first specification format. Modern teams generate server stubs, client SDKs, mock servers, and request validators from a single spec. The spec is checked into git as a first-class artifact and reviewed in PRs alongside code.
  • Microservices extraction is justified by team boundaries (Conway's law), scaling profiles (read-heavy vs write-heavy), or data-ownership boundaries — not by 'we want microservices'. Sam Newman's Building Microservices 2nd ed. (samnewman.io/books/building_microservices_2nd_edition) is the canonical reference for when to extract and when to consolidate.
  • Resilience patterns at the API layer: circuit breaker (Hystrix-canonical) to fail fast when a downstream is unhealthy; bulkhead to isolate failure domains; retry with exponential backoff and jitter (AWS Builders' Library is canonical) to avoid thundering herds; timeouts on every call. These are not optional in any production system.
  • Backwards-compat and deprecation discipline: never break existing clients; introduce new fields additively; deprecate via HTTP Sunset header (RFC 8594) plus a documented timeline; instrument deprecated-endpoint usage so you know who's still on the old version before you remove it.

API design principles for tech-company backend teams in 2026

The senior backend engineer's API-design checklist in 2026 is a small set of non-negotiables. Every state-changing endpoint accepts an idempotency key. Every endpoint is documented in an OpenAPI 3.1 spec checked into git. Every response includes structured error codes (not just HTTP status). Every API has an explicit versioning policy and a deprecation policy. Every service-to-service call has a timeout, a retry policy with backoff and jitter, and a circuit breaker.

Beyond the checklist, the cultural pattern is contract-first design: the API spec is written and reviewed before any implementation. The spec generates server stubs, client SDKs, mock servers for frontend teams to develop against, and request/response validators for runtime enforcement. This shifts API-design conversations left, before code is written, when changes are cheap.

The protocol choice — REST, gRPC, or GraphQL — follows the use case:

  • REST + OpenAPI for public APIs, partner integrations, and any surface that needs broad ecosystem support (curl, Postman, generic clients).
  • gRPC for service-to-service calls inside the cluster. HTTP/2 multiplexing, protobuf binary encoding, and streaming primitives make it ~7-10x more efficient than JSON-over-HTTP for high-volume internal traffic. The cost: tooling complexity and weaker browser support.
  • GraphQL for client-driven aggregation across many backend services. The classic use case: a mobile app that needs data from 5 services in a single round-trip and lets the client pick fields. The cost: query-complexity attack surface and N+1 patterns if resolvers aren't designed with DataLoader-style batching (a minimal batching sketch follows this list).
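
To make the batching point concrete, here is a minimal sketch of a DataLoader-style loader in Python, assuming asyncio and a hypothetical batch_fn that maps a list of keys to a list of values. It coalesces loads issued in the same event-loop tick into a single backend call instead of N:

import asyncio

class DataLoader:
    """Minimal DataLoader-style batcher (illustrative sketch, not a library API)."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # async fn: list of keys -> list of values
        self._queue = []          # pending (key, future) pairs

    async def load(self, key):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self._queue.append((key, fut))
        if len(self._queue) == 1:
            # Flush after the current tick so sibling loads can batch up
            loop.call_soon(lambda: asyncio.ensure_future(self._flush()))
        return await fut

    async def _flush(self):
        pending, self._queue = self._queue, []
        values = await self.batch_fn([k for k, _ in pending])
        for (_, fut), value in zip(pending, values):
            fut.set_result(value)

With resolvers that call loader.load(user_id) per item, an asyncio.gather over 50 items produces one batched backend query rather than 50, which is exactly the N+1 fix the GraphQL bullet refers to.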

The wrong choice creates years of pain. A common 2026 anti-pattern: a startup builds a public REST API, adds gRPC for internal services, and then layers GraphQL on top of REST for the mobile client. The result is three protocols, three sets of tooling, three sets of error handling. Pick the smallest set that solves the actual problem.

Idempotency: the single most-cited API-design pattern

Idempotency is the property that retrying a request produces the same effect as a single request. For state-changing endpoints (POST, PATCH, DELETE), the canonical pattern is the client-supplied idempotency key: a unique string the client sends in a header (typically Idempotency-Key), and the server uses to de-duplicate retried writes.

The Stripe pattern (stripe.com/blog/idempotency) is the canonical reference and works as follows:

  1. Client generates a UUID and sends it in the Idempotency-Key header.
  2. Server checks if the key already exists in storage. If yes, return the cached response (status code + body) verbatim.
  3. If no, process the request, store (key, request_hash, response) tuple with a 24-hour TTL, and return the response.
  4. If the same key arrives with a different request body (request_hash mismatch), return 422 — the client is misusing the key.

A minimal Flask handler illustrating the pattern:

from flask import Flask, request, jsonify
import hashlib, json, redis

app = Flask(__name__)
r = redis.Redis()
TTL_SECONDS = 24 * 60 * 60  # 24h, per Stripe

@app.post("/v1/charges")
def create_charge():
    key = request.headers.get("Idempotency-Key")
    if not key:
        return jsonify(error="Idempotency-Key header required"), 400

    body_hash = hashlib.sha256(request.get_data()).hexdigest()
    cache_key = f"idem:{key}"
    cached = r.get(cache_key)

    if cached:
        stored = json.loads(cached)
        if stored["body_hash"] != body_hash:
            # Same key, different body = client bug
            return jsonify(error="Idempotency key reuse with different body"), 422
        return jsonify(stored["response"]), stored["status"]

    # Process the charge — must be transactional with the cache write.
    # process_charge is the domain logic, assumed defined elsewhere.
    response, status = process_charge(request.get_json())
    r.setex(cache_key, TTL_SECONDS, json.dumps({
        "body_hash": body_hash,
        "response": response,
        "status": status,
    }))
    return jsonify(response), status

What this handler gets right: required header (no silent fallback to non-idempotent behavior); body-hash check so reused keys with mismatched bodies fail loudly; 24-hour TTL matching Stripe's documented window; cached response returned verbatim with original status code.

What this handler does NOT get right (production gaps): the charge processing and the cache write are not transactional — if the process crashes between charging and writing the cache, the next retry will charge again. Stripe handles this with a write-ahead log: write the idempotency key with status='pending' before processing, then update with status='complete' and the response after. The senior+ implementation uses this pattern.
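
A minimal sketch of that pending-record variant, reusing the Redis setup from the handler above (process_charge remains a hypothetical domain function, and a real implementation also needs a recovery path for records stuck in 'pending' after a crash):

def create_charge_with_wal(key, body_hash, payload):
    cache_key = f"idem:{key}"
    pending = json.dumps({"status": "pending", "body_hash": body_hash})
    # Atomically claim the key (SET NX) before doing any work
    if not r.set(cache_key, pending, nx=True, ex=TTL_SECONDS):
        stored = json.loads(r.get(cache_key))
        if stored["status"] == "pending":
            # An earlier attempt is in flight or crashed mid-charge:
            # fail safe with a retryable conflict instead of charging again
            return {"error": "request already in progress"}, 409
        return stored["response"], stored["http_status"]
    response, status = process_charge(payload)
    r.setex(cache_key, TTL_SECONDS, json.dumps({
        "status": "complete", "body_hash": body_hash,
        "response": response, "http_status": status,
    }))
    return response, status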

Idempotency keys are required for any endpoint that touches money, inventory, or external state. They are strongly recommended for any endpoint with side effects.

Microservice patterns: when to extract, when to consolidate

The 2026 industry consensus, reflected in Sam Newman's Building Microservices 2nd ed. (samnewman.io/books/building_microservices_2nd_edition), is that microservices are an organizational tool, not a technical default. Extract a service when one of three conditions is true:

  • Team boundary (Conway's law). Two teams can't iterate independently because they share a deployment unit. Splitting the deployment unit unblocks both teams.
  • Scaling profile divergence. One workload is read-heavy (10k QPS, mostly cached) and another is write-heavy (low QPS, transactional). They have incompatible scaling needs and noisy-neighbor patterns.
  • Data-ownership boundary. Two domains have genuinely different data models, consistency requirements, or compliance constraints (e.g., PII vs. non-PII).

If none of these apply, the modular monolith is the right answer. Splitting prematurely buys you distributed-systems problems (partial failures, eventual consistency, distributed transactions) without the corresponding organizational benefit.

Contract-first OpenAPI is how teams coordinate across service boundaries. A canonical payments-endpoint spec snippet:

openapi: 3.1.0
info:
  title: Payments API
  version: 2026-01-15
paths:
  /v1/charges:
    post:
      summary: Create a charge
      parameters:
        - in: header
          name: Idempotency-Key
          required: true
          schema: { type: string, format: uuid }
      requestBody:
        required: true
        content:
          application/json:
            schema: { $ref: '#/components/schemas/ChargeRequest' }
      responses:
        '200': { description: Charge created }
        '422': { description: Validation error }

This spec is checked into git, reviewed in PRs, and used to generate the server stub, the client SDK, a mock server for frontend development, and request validators for runtime enforcement. When the spec changes, every consumer picks up the change through regenerated SDKs.
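
As one illustration of the runtime-enforcement half, a request validator can be sketched with the generic jsonschema library; the schema below is a hand-written stand-in for the ChargeRequest schema a real pipeline would extract from the spec:

from jsonschema import ValidationError, validate

# Stand-in for the ChargeRequest schema referenced in the spec above
CHARGE_REQUEST_SCHEMA = {
    "type": "object",
    "required": ["amount", "currency"],
    "properties": {
        "amount": {"type": "integer", "minimum": 1},       # minor units
        "currency": {"type": "string", "pattern": "^[a-z]{3}$"},
    },
    "additionalProperties": False,
}

def validate_charge_request(payload):
    """Return an error tuple on contract violation, None when valid."""
    try:
        validate(instance=payload, schema=CHARGE_REQUEST_SCHEMA)
    except ValidationError as exc:
        return {"error": "validation_failed", "detail": exc.message}, 422
    return None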

Service mesh (Envoy, Istio, Linkerd) is the modern infrastructure for microservices. The mesh moves cross-cutting concerns — mTLS, retries, circuit breakers, observability, traffic shaping — out of application code and into a sidecar proxy. The senior+ bar: you understand what the mesh provides versus what your application code still owns (idempotency, business-level validation, domain error handling).

Resilience patterns: circuit breaker, bulkhead, retry-with-backoff

Every production service-to-service call needs four things: a timeout, a retry policy with exponential backoff and jitter, a circuit breaker, and a bulkhead. None of these are optional. Their absence is how partial outages turn into full outages.

Circuit breaker (Hystrix-canonical). Netflix's Hystrix (github.com/Netflix/Hystrix/wiki) defined the canonical pattern. The breaker has three states: closed (calls pass through), open (calls fail fast without hitting the downstream), half-open (a small number of test calls determine if the downstream has recovered). The breaker opens when the failure rate exceeds a threshold (typically 50% over a 10-second rolling window). Hystrix is in maintenance mode but the pattern is implemented in every modern resilience library (Resilience4j, Polly, gobreaker).
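
A minimal Python sketch of the three-state machine, assuming a 50% failure rate over a 10-second rolling window and a fixed cooldown before probing (illustrative and not thread-safe; production systems should use a maintained library such as pybreaker):

import time
from collections import deque

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    """Illustrative breaker: closed -> open -> half-open -> closed."""

    def __init__(self, threshold=0.5, window=10.0, cooldown=30.0, min_calls=10):
        self.threshold, self.window = threshold, window
        self.cooldown, self.min_calls = cooldown, min_calls
        self.calls = deque()  # (timestamp, succeeded) pairs within the window
        self.state, self.opened_at = "closed", 0.0

    def call(self, fn, *args, **kwargs):
        now = time.monotonic()
        if self.state == "open":
            if now - self.opened_at < self.cooldown:
                raise CircuitOpenError("failing fast: downstream unhealthy")
            self.state = "half_open"  # let one probe call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.calls.append((now, False))
            if self.state == "half_open" or self._failure_rate(now) >= self.threshold:
                self.state, self.opened_at = "open", now
            raise
        self.calls.append((now, True))
        if self.state == "half_open":
            self.state = "closed"  # probe succeeded: downstream recovered
            self.calls.clear()
        return result

    def _failure_rate(self, now):
        while self.calls and now - self.calls[0][0] > self.window:
            self.calls.popleft()
        if len(self.calls) < self.min_calls:
            return 0.0  # not enough data to judge the downstream
        return sum(1 for _, ok in self.calls if not ok) / len(self.calls)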

Bulkhead. Isolate failure domains by allocating separate thread pools, connection pools, or concurrency limits per downstream. If service A is slow, calls to A queue up in A's pool; calls to B keep flowing through B's pool. Without bulkheads, a slow downstream eats all your threads and your service hangs entirely.
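
A minimal sketch of a semaphore-based bulkhead in Python; one instance per downstream keeps a slow dependency from consuming every worker:

import threading

class BulkheadFullError(Exception):
    pass

class Bulkhead:
    """Cap concurrent in-flight calls to one downstream (illustrative sketch)."""

    def __init__(self, max_concurrent, acquire_timeout=0.05):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._acquire_timeout = acquire_timeout  # reject fast rather than queue

    def call(self, fn, *args, **kwargs):
        if not self._slots.acquire(timeout=self._acquire_timeout):
            raise BulkheadFullError("bulkhead full: shedding load")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()

# Separate pools per downstream = isolated failure domains
payments_pool = Bulkhead(max_concurrent=20)
search_pool = Bulkhead(max_concurrent=50)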

Retry with exponential backoff and jitter. The AWS Builders' Library (aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter) is the canonical reference. The key insight: synchronized retries from many clients create thundering herds that worsen the outage. Adding jitter (randomization) decorrelates the retries and lets the downstream recover.

A canonical Go implementation of full-jitter exponential backoff:

package retry

import (
    "context"
    "errors"
    "math"
    "math/rand"
    "time"
)

type Config struct {
    MaxAttempts int
    Base        time.Duration // initial delay, e.g. 100ms
    Cap         time.Duration // max delay, e.g. 30s
}

// Do retries fn with full-jitter exponential backoff per AWS Builders' Library.
// sleep = random_between(0, min(cap, base * 2^attempt))
func Do(ctx context.Context, cfg Config, fn func() error) error {
    var err error
    for attempt := 0; attempt < cfg.MaxAttempts; attempt++ {
        if err = fn(); err == nil {
            return nil
        }
        if !isRetryable(err) {
            return err
        }
        if attempt == cfg.MaxAttempts-1 {
            break // don't sleep after the final attempt
        }
        exp := float64(cfg.Base) * math.Pow(2, float64(attempt))
        capped := math.Min(exp, float64(cfg.Cap))
        sleep := time.Duration(rand.Int63n(int64(capped)))
        select {
        case <-time.After(sleep):
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return err
}

func isRetryable(err error) bool {
    var re interface{ Retryable() bool }
    return errors.As(err, &re) && re.Retryable()
}

What this code gets right: full jitter (random between 0 and the capped exponential delay) per the AWS Builders' Library recommendation; context cancellation respected during sleeps; bounded attempts; explicit retryable-error interface so non-retryable errors fail fast.

What this code does NOT include: a circuit breaker (should wrap this call), a per-call timeout (should be set on the underlying client), or correlation-id propagation (needed for distributed tracing). The senior+ implementation composes all four.

The AWS Builders' Library article on avoiding fallback (aws.amazon.com/builders-library/avoiding-fallback-in-distributed-systems) makes the related point: fallback logic itself is a source of complexity and bugs. Often the right behavior is to fail fast and surface the error, not to silently degrade.

Backwards-compatibility and deprecation discipline

The senior+ bar for API changes: never break existing clients without a documented deprecation timeline. Concretely:

  • Additive changes only on existing versions. Add new optional fields. Add new endpoints. Never rename an existing field, never tighten a validation rule, never change a status code.
  • Breaking changes require a new version. URL-versioned APIs introduce /v2/; date-versioned APIs introduce a new date. Both versions run in parallel during the migration window.
  • Deprecate via the HTTP Sunset header (RFC 8594). Set Sunset: Wed, 01 Apr 2026 00:00:00 GMT on responses from deprecated endpoints. Set Deprecation: true. Document the replacement in a Link header.
  • Instrument before you remove. Log every call to a deprecated endpoint with the API key or client identifier. Reach out to remaining users with concrete migration guidance before the sunset date, and check the usage analytics before removing anything (a minimal sketch of the header and logging mechanics follows this list).
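
A minimal Flask sketch of the Sunset/Deprecation headers plus usage logging, assuming a hypothetical /v1/legacy-report endpoint, an X-API-Key client identifier, and a build_report domain function:

from flask import Flask, jsonify, make_response, request

app = Flask(__name__)

@app.get("/v1/legacy-report")
def legacy_report():
    # Instrument: log every call with a client identifier
    app.logger.info("deprecated endpoint hit: client=%s",
                    request.headers.get("X-API-Key", "unknown"))
    resp = make_response(jsonify(data=build_report()))  # hypothetical domain fn
    # Signal deprecation on every response
    resp.headers["Deprecation"] = "true"
    resp.headers["Sunset"] = "Wed, 01 Apr 2026 00:00:00 GMT"  # RFC 8594
    resp.headers["Link"] = '</v2/reports>; rel="successor-version"'
    return resp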

Stripe's API versioning case study (stripe.com/blog/api-versioning) describes a date-based scheme where each request is processed against the API version pinned to the calling account (or overridden by a Stripe-Version header). Stripe runs many versions in parallel for years, with translation layers between them. This is the gold standard but is operationally expensive; smaller teams typically use URL versioning with a 12-month deprecation window.
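
A minimal sketch of how such a translation layer can work: resolve the version per request, then apply a chain of downgrade functions for every breaking change the caller predates. The field rename and dates here are hypothetical, not Stripe's actual change log:

# Hypothetical change: on 2025-06-01, 'amount_cents' was renamed to 'amount'
def downgrade_amount_rename(resp):
    resp = dict(resp)
    resp["amount_cents"] = resp.pop("amount")
    return resp

# (version a breaking change shipped in, downgrade newest-shape -> older-shape)
VERSION_GATES = [
    ("2025-06-01", downgrade_amount_rename),
]

def resolve_api_version(headers, account_pinned_version):
    # Per-request header override wins; otherwise use the account's pin
    return headers.get("Stripe-Version", account_pinned_version)

def render_for_version(resp, version):
    # ISO dates compare correctly as strings; apply every gate the caller predates
    for gate_version, downgrade in sorted(VERSION_GATES, reverse=True):
        if version < gate_version:
            resp = downgrade(resp)
    return resp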

Observability is the other half of deprecation discipline. OpenTelemetry (opentelemetry.io/docs/concepts/observability-primer) is the modern standard for distributed tracing, metrics, and structured logs. Every API call is instrumented with a trace, every endpoint exposes RED metrics (rate, errors, duration), and every log line carries a correlation ID. Without observability, you cannot answer 'who is still calling the deprecated endpoint?' and the deprecation never finishes.
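
A minimal tracing sketch using the opentelemetry-python API (exporter and SDK wiring omitted; charge() is a hypothetical domain function):

from opentelemetry import trace
from opentelemetry.trace import StatusCode

tracer = trace.get_tracer("payments-api")

def handle_create_charge(payload):
    # One span per request; the trace context carries the correlation ID
    with tracer.start_as_current_span("create_charge") as span:
        span.set_attribute("http.route", "/v1/charges")
        try:
            result = charge(payload)  # hypothetical business logic
            span.set_attribute("charge.id", result["id"])
            return result
        except Exception as exc:
            span.record_exception(exc)       # feeds the 'errors' in RED
            span.set_status(StatusCode.ERROR)
            raise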

Frequently asked questions

When should I use gRPC instead of REST?
gRPC for service-to-service calls inside the cluster where throughput, latency, and streaming matter. The HTTP/2 + protobuf combination is ~7-10x more efficient than JSON over HTTP/1.1 for high-volume internal traffic. REST for public APIs, partner integrations, or any surface where ecosystem support (curl, Postman, generic clients) matters more than raw throughput. The senior pattern in 2026 is gRPC inside the cluster, REST + OpenAPI at the edge.
What's the right idempotency-key TTL?
Stripe uses 24 hours and that's the canonical default. The TTL needs to outlast the longest plausible client retry window: mobile clients with offline queues, scheduled jobs that retry across days, and humans clicking 'retry' after lunch. Going below 1 hour is risky; going above 7 days creates storage cost and key-collision risk. 24 hours is the right answer for almost every production system.
URL versioning or header versioning?
URL versioning (/v1/, /v2/) is simpler, more cacheable, more debuggable, and the right default for most teams. Header versioning (Stripe's 2020-08-27 Stripe-Version header) lets each customer pin to a specific behavior and lets you ship breaking changes without a URL change, but requires significant tooling investment to run many versions in parallel. Pick URL versioning unless you're operating at Stripe-scale platform integrations.
When should I extract a microservice from my monolith?
When team boundary, scaling profile, or data-ownership boundary justifies the operational tax. Concretely: two teams blocked on the same deployment unit; one workload that needs to scale independently; a domain that needs different consistency or compliance properties. Without one of these conditions, the modular monolith is the right answer. Extracting prematurely buys you distributed-systems problems without the corresponding organizational benefit. Sam Newman's Building Microservices 2nd ed. is the canonical reference.
What does a service mesh actually give me?
A service mesh (Envoy, Istio, Linkerd) moves mTLS, retries, circuit breakers, traffic shaping, and observability out of application code and into a sidecar proxy. You get those concerns implemented once, consistently, across every service, in every language. The cost: an extra hop per call (~0.5-2ms), a non-trivial control plane to operate, and a debugging surface that's harder to reason about than in-process libraries. Adopt it when you have 10+ services across 2+ languages.
How do I implement a circuit breaker in 2026?
Use a battle-tested library: Resilience4j (Java/Kotlin), Polly (.NET), sony/gobreaker (Go), pybreaker (Python). The pattern: count failures over a rolling window, open the breaker when the failure rate exceeds a threshold (typically 50% over 10s), reject calls fast while open (don't hit the downstream), and test recovery via a half-open state. Hystrix is in maintenance mode but the pattern is identical across all the modern libraries.
What's full jitter and why does it matter?
Full jitter is the AWS-recommended retry-backoff strategy: sleep a random duration between 0 and min(cap, base * 2^attempt). Without jitter, all retrying clients wake up simultaneously and create a thundering herd that worsens the outage. With full jitter, retries are decorrelated across clients and the downstream gets a chance to recover. The AWS Builders' Library article on timeouts and backoff is the canonical reference; every retry library should implement this by default.
Should every state-changing endpoint require an idempotency key?
For any endpoint that touches money, inventory, external state, or has user-visible side effects: yes, required. For endpoints that are naturally idempotent (PUT to a known resource, DELETE), the key is optional but still recommended for retry safety. The senior+ bar: you have a default policy ('all POST/PATCH require Idempotency-Key') and explicit exceptions, not the other way around.
How do I deprecate an API endpoint without breaking clients?
Five steps: (1) instrument the endpoint to log every call with a client identifier; (2) set the Sunset header (RFC 8594) and Deprecation: true on responses; (3) document the replacement and migration guide; (4) reach out directly to remaining heavy users; (5) remove only after the sunset date with zero recent calls. The minimum migration window is 6 months for partner APIs, 12 months for public APIs. Removing earlier is hostile to your users; removing without instrumentation is reckless.
When should I use GraphQL?
When you have many backend services and a client (typically mobile or a complex web app) that needs to aggregate data from several of them in a single round-trip with client-controlled field selection. GraphQL pays off when you'd otherwise build a bespoke BFF (Backend-For-Frontend). It does NOT pay off as a general-purpose API protocol — query-complexity attacks, N+1 patterns, and cache-invalidation complexity are real costs. Use REST or gRPC by default; reach for GraphQL when the aggregation use case is real.

Sources

  1. Stripe Engineering — Designing robust and predictable APIs with idempotency. Canonical reference for the idempotency-key pattern.
  2. Stripe Engineering — APIs as infrastructure: future-proofing with versioning. Canonical case study on date-based API versioning.
  3. Sam Newman — Building Microservices, 2nd Edition (O'Reilly). Canonical reference on when to extract services and when to consolidate.
  4. AWS Builders' Library — Timeouts, retries, and backoff with jitter. Canonical reference for full-jitter exponential backoff.
  5. AWS Builders' Library — Avoiding fallback in distributed systems. Canonical reference on why fallback often makes failures worse.
  6. OpenTelemetry — Observability primer. Canonical primer on traces, metrics, and structured logs for distributed systems.
  7. Netflix Hystrix Wiki. Canonical reference for the circuit-breaker pattern (library is in maintenance, pattern is universal).

About the author. Blake Crosley founded ResumeGeni and writes about backend engineering, hiring technology, and ATS optimization. More writing at blakecrosley.com.