AI Tools in the Software Engineering Workflow (2026)
In short
AI-augmented development is interview table-stakes for SWEs at most large tech companies in 2026, and the differentiator is no longer 'do you use it' but 'do you use it well.' This page shows three real workflow captures: a Cursor multi-file refactor, a Claude Code plan-mode session for non-trivial work, and a test-generation prompt with the exact input and output. Sources include Anthropic's published Claude Code docs (docs.claude.com/en/docs/claude-code), Cursor's documentation (cursor.com/docs), and Will Larson's StaffEng essays on AI in engineering practice.
Key takeaways
- Cursor's Cmd+K (inline edit) and @-context (file/symbol/folder references) are the primary primitives — knowing them by feature name is the senior signal (cursor.com/docs/get-started/migrate-from-vs-code).
- Claude Code's plan mode (Shift+Tab to toggle) is the right tool for non-trivial work where you want a written plan before any file edits — published in Anthropic's Claude Code docs (docs.claude.com/en/docs/claude-code/overview).
- Test generation works best with a specific contract: describe the function, name 3-5 edge cases yourself, ask the AI to extend with edge cases you missed, then review every assertion before committing.
- AI-generated code has a specific failure mode interviewers probe: confident wrong code that passes the happy path. The senior practice is to verify with property-based tests or fuzz testing, not just unit tests.
- Refusing AI tools is a screen-killer at most modern tech companies in 2026 — Stripe, Anthropic, Vercel, Cursor, and Linear explicitly weight AI fluency. Anthropic's own postings note they 'use Claude in our daily workflow'.
- The wrong way to use AI: 'write me a function that does X' with no spec. The right way: write the test first, paste it with the function signature, ask AI to make the test pass.
Cursor multi-file refactor: a worked example
Cursor's killer feature is multi-file context plus inline edit. The pattern: @-reference the files that matter, then describe the change. Specific commands and what they do (per cursor.com/docs):
- Cmd+K — inline edit on current selection. Opens a prompt above the selection; output replaces it.
- Cmd+L — open chat sidebar. Ask questions, attach files via @.
- Cmd+I — Composer / agent mode. AI can edit multiple files autonomously.
- @filename — pull a file into context. Use for stable dependencies.
- @symbol — pull a specific function or class.
- @folder — pull every file in a folder (token-expensive; use sparingly).
- @web — perform a web search and add results to context.
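For example, a chat prompt that leans on @-context (file names and the exception name here are illustrative, not from a real repo):
@app/services/user_service.py @tests/test_user_service.py — the service currently returns None for missing users. Change it to raise UserNotFoundError instead, and update the tests to match.
Attaching the service and its test file, and nothing else, gives the model exactly the context the change needs without the token cost of @folder.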
Worked example: rename a function across a Python codebase.
Task: rename get_user_by_id to fetch_user across 47 files in a FastAPI codebase, updating tests and docstrings.
Wrong way (what junior engineers do): typing 'rename get_user_by_id to fetch_user everywhere' into chat. Cursor sees one file at a time, may miss test files or import sites, and returns a half-completed change.
Right way (what senior engineers do):
- Open Cursor's Composer (Cmd+I).
- Toggle to Agent mode.
- Prompt:
Rename the function `get_user_by_id` to `fetch_user`. Update all call sites, all tests under tests/, and update the docstring in app/services/user_service.py to reflect the new name. List the files you'll change before editing.
- Review the agent's plan (it lists files; you confirm).
- Agent edits files in sequence; review each diff in the Cursor diff viewer (Cmd+Shift+D).
- Run tests: pytest tests/ -v. Fix anything the agent missed.
The senior+ trick: ask for a plan first. Adding 'list the files you'll change before editing' to the prompt forces a checkpoint. You catch mistakes before they hit disk.
The other senior+ trick: use git to constrain blast radius. Make a commit before invoking the agent; if the agent goes wrong, git reset --hard is one command away. Cursor's autosave can compound a bad agent run if you're not branched.
Claude Code plan-mode session: when to use it and a real transcript
Claude Code (docs.claude.com/en/docs/claude-code/overview) is Anthropic's terminal-native CLI. It's structurally different from Cursor: instead of an IDE plugin, it runs as a shell session that can read, edit, and execute code with explicit permission gates.
Plan mode is the senior-bar feature. Toggle with Shift+Tab from the prompt; Claude writes a plan, you approve it, then it executes. The plan is the artifact that distinguishes good AI work from bad: it shows what the model thinks the work entails, lets you correct misunderstandings before any code changes, and produces a written record of what shipped.
Real plan-mode session example: 'Add rate limiting to the FastAPI feedback endpoint.'
# User prompt (with Shift+Tab pressed to enter plan mode):
> Add Redis-backed rate limiting to /api/feedback. Limit anonymous users
> to 5 requests per hour per IP. Return 429 with a Retry-After header on
> rejection. Use the existing Redis client at app/services/redis_client.py.
# Claude's plan (excerpt):
Plan:
1. Read app/services/redis_client.py to confirm the client interface.
2. Read app/routers/feedback.py to see the current endpoint signature.
3. Read app/middleware/rate_limit.py to check whether existing rate-limit
middleware can be reused, or if a per-endpoint decorator is preferred.
4. Implement: a FastAPI dependency function `rate_limit_anonymous(request)`
that uses Redis INCR with a 1-hour TTL keyed on client IP. Returns
HTTPException(429, retry_after=<seconds>) when exceeded.
5. Add the dependency to the /api/feedback endpoint via Depends().
6. Write tests in tests/test_feedback_rate_limit.py covering:
- first request: success
- 5th request: success
- 6th request: 429 with Retry-After
- request after window expires: success
7. Run pytest to verify.
Files I will modify:
- app/routers/feedback.py (add dependency import + Depends)
- app/dependencies/rate_limit.py (new file)
- tests/test_feedback_rate_limit.py (new file)
Approve? [y/N/edit]

You review the plan, approve, and Claude executes. If step 3 reveals existing middleware that already handles this, the plan adapts (Claude prompts before continuing).
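For illustration, here's a minimal sketch of what the plan's step 4 could look like, assuming the project's Redis client (app/services/redis_client.py) exports a redis-py-style object named redis_client (that export name is an assumption, not from the plan):

from fastapi import HTTPException, Request
from app.services.redis_client import redis_client  # assumed export name

RATE_LIMIT = 5          # requests allowed per window
WINDOW_SECONDS = 3600   # one hour

def rate_limit_anonymous(request: Request) -> None:
    # Key the counter on client IP; INCR is atomic, so concurrent requests are safe.
    key = f"ratelimit:feedback:{request.client.host}"
    count = redis_client.incr(key)
    if count == 1:
        # First hit starts the window. (Production code would set the TTL
        # atomically with the increment, e.g. via a Lua script.)
        redis_client.expire(key, WINDOW_SECONDS)
    if count > RATE_LIMIT:
        retry_after = max(redis_client.ttl(key), 0)
        raise HTTPException(
            status_code=429,
            detail="rate limit exceeded",
            headers={"Retry-After": str(retry_after)},
        )

Wiring it in is one line on the route: add dependencies=[Depends(rate_limit_anonymous)] to the /api/feedback endpoint decorator.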
When plan mode is the wrong tool:
- One-line edits where the plan-write overhead exceeds the edit time.
- Exploration / 'what does this code do' questions — use the regular ask mode.
- Bug-hunt sessions where you don't know what the change is until you find the bug.
When plan mode is exactly the right tool:
- Multi-file changes (refactors, feature additions touching 3+ files).
- Anything that touches production-critical code (auth, payments, data migrations).
- When you want a written record of what was done — the plan transcript is documentation.
The /init feature creates a CLAUDE.md with project context (commands, architecture, conventions). Run it once per repo; commit the file. Subsequent sessions read it automatically. Reference: docs.claude.com/en/docs/claude-code/memory.
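For reference, a CLAUDE.md for the FastAPI project above might look like this (contents illustrative, not Anthropic's template):

# CLAUDE.md
## Commands
- Run tests: pytest tests/ -v
## Architecture
- FastAPI app under app/: routers in app/routers/, shared services in app/services/.
## Conventions
- Type hints on all public functions; new endpoints ship with tests under tests/.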
Test generation: the prompt pattern that actually works
'Write me unit tests for this function' produces tests that pass the happy path and miss every interesting edge case. The pattern that produces useful tests: write your spec, generate, then review.
Worked example: testing a discount calculator.
# The function under test
def apply_discount(price: float, discount_pct: float) -> float:
    """Apply a percentage discount to a price.

    price: original price in dollars (must be > 0)
    discount_pct: percentage off (0-100, inclusive)
    Returns: discounted price, rounded to 2 decimal places.
    Raises ValueError on invalid inputs.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return round(price * (1 - discount_pct / 100), 2)

The wrong prompt: 'Write tests for apply_discount.'
The right prompt (the senior pattern):
Write pytest tests for apply_discount. Cover:
- Happy path: $100 with 20% off = $80.
- Boundary: 0% discount returns the original price.
- Boundary: 100% discount returns $0.
- Invalid: negative price raises ValueError.
- Invalid: negative discount raises ValueError.
- Invalid: discount > 100 raises ValueError.
- Floating-point: $0.01 with 50% off rounds to $0.00 or $0.01? Verify the
round-half-to-even behavior of round().
- Add 3 more edge cases I haven't named that you think matter.
Return the test file with one test function per case, descriptively named.

Claude's output (excerpt):
import pytest

def test_happy_path_simple_discount():
    assert apply_discount(100, 20) == 80.0

def test_zero_discount_returns_original():
    assert apply_discount(100, 0) == 100.0

def test_full_discount_returns_zero():
    assert apply_discount(50, 100) == 0.0

def test_negative_price_raises():
    with pytest.raises(ValueError, match="positive"):
        apply_discount(-1, 10)

def test_negative_discount_raises():
    with pytest.raises(ValueError, match="between"):
        apply_discount(100, -5)

def test_discount_over_100_raises():
    with pytest.raises(ValueError, match="between"):
        apply_discount(100, 101)

def test_floating_point_rounding():
    # round() uses round-half-to-even, but float representation decides these:
    # 0.01 * 0.5 is stored slightly above 0.005, so it rounds up to 0.01;
    # 0.03 * 0.5 is stored slightly below 0.015, so it rounds down to 0.01.
    assert apply_discount(0.01, 50) == 0.01
    assert apply_discount(0.03, 50) == 0.01

# Edge cases I added:
def test_very_large_price():
    assert apply_discount(1_000_000, 25) == 750_000.0

def test_zero_price_raises():
    with pytest.raises(ValueError, match="positive"):
        apply_discount(0, 10)

def test_string_inputs_raise_typeerror():
    with pytest.raises(TypeError):
        apply_discount("100", 10)

The review step (the senior bar):
- Verify each assertion is right. Did the AI compute 100 * 0.8 correctly? (Yes.) Did it understand banker's rounding? (Verify with Python REPL.)
- Add property-based tests with Hypothesis for the invariants the AI didn't articulate: discounted price <= original, discounted price >= 0, discount of 0 returns the original. A sketch follows this list.
- Run mutation testing (mutmut, cosmic-ray) on critical paths — AI-generated tests often pass mutation testing for happy paths but miss boundary mutations.
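A minimal Hypothesis sketch of those invariants (assuming hypothesis is installed and apply_discount is importable):

from hypothesis import given, strategies as st

prices = st.floats(min_value=0.01, max_value=1_000_000)
discounts = st.floats(min_value=0, max_value=100)

@given(price=prices, discount_pct=discounts)
def test_discount_bounded(price, discount_pct):
    result = apply_discount(price, discount_pct)
    # Compare against the rounded original: apply_discount rounds its output,
    # so round(price, 2) is the correct upper bound, not price itself.
    assert 0 <= result <= round(price, 2)

@given(price=prices)
def test_zero_discount_is_identity(price):
    assert apply_discount(price, 0) == round(price, 2)

Hypothesis shrinks any failing case to a minimal counterexample, which is exactly the boundary-hunting that hand-written unit tests tend to miss.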
The pattern: AI extends your test suite faster than you can write it; you remain the verifier of correctness. Never commit AI-generated tests without reading every assertion.
AI for code review and pair programming
The two workflows where AI integrates most cleanly into senior+ practice.
Pre-review pass (catch obvious issues before human review):
# In Claude Code or Cursor:
> Review the diff in the current branch (git diff main..HEAD). Look for:
> - Unhandled error cases (especially around external service calls)
> - Security issues (SQL injection, XSS, missing auth checks)
> - Test coverage gaps
> - Inconsistencies with the existing codebase patterns
> - Hardcoded values that should be config
> Format as a list of file:line references with a one-sentence finding each.

The output is a checklist. You triage: address the real issues; ignore the false positives. The bar to publish a PR is: AI pre-review run, all real issues addressed, then human review.
Pair programming on greenfield: the right pattern is 'AI as junior collaborator, human as senior'. AI proposes; human evaluates and corrects. The wrong pattern: AI as senior, human as 'rubber stamp'. The latter ships bugs.
# Greenfield pair pattern:
1. Human: 'I'm building a feature that does X. Brainstorm 3 approaches.'
2. AI: lists 3 approaches with trade-offs.
3. Human: 'Approach 2 is right; the trade-off you missed is Y. Implement.'
4. AI: writes implementation.
5. Human: reviews; calls out concerns; AI iterates.

The single highest-leverage AI workflow at senior+: using AI to read unfamiliar code. Open a 5,000-line legacy file and ask 'walk me through the data flow when a user submits the form.' The AI's narration is faster than reading line by line and roughly 80% accurate; you fill in the remaining 20% by reading the parts the AI got wrong. This compresses 'understand legacy code' from hours to minutes.
What the senior+ FAANG bar looks like for AI fluency
From job postings at AI-forward companies (Anthropic, Cursor, Vercel, Stripe) and Will Larson's StaffEng essays (staffeng.com), the bar in 2026 looks like:
- You name specific tools by feature, not brand. 'Cursor's @-symbol context for grabbing related types' beats 'I use Cursor.'
- You can articulate where AI is wrong. 'AI-generated code passes the happy path but misses edge cases unless I explicitly enumerate them' is a senior signal.
- You have shipped AI-augmented code that you can describe. 'I refactored the auth middleware using Cursor's agent mode; here's what I had to fix in review.'
- You've written prompts that codify your team's patterns. A CLAUDE.md or .cursorrules file in your repo, with the conventions you want enforced.
- You have an opinion about what AI is bad at. 'Schema migrations — I write those by hand because AI hallucinates indexes.'
What kills the senior+ signal:
- 'I prefer to write code myself.' Reads as inflexibility. The right framing is 'I use AI for X but write Y by hand because Z.'
- 'Cursor / Claude Code' as keywords on a resume with no specific feature use. The interviewer will ask 'what's your favorite Cursor feature?' — and 'I like it' is the wrong answer.
- Confidently wrong claims about AI. 'Claude is better at Python than Go' — defend that with a specific case where you tested. Otherwise, it reads as marketing repetition.
Reference reading:
- Will Larson, 'Engineering Strategy in the Age of AI' (lethain.com) — staff-engineer-level framing.
- Anthropic, 'Claude Code best practices' (anthropic.com/engineering/claude-code-best-practices) — published 2025, regularly updated.
- Simon Willison's blog (simonwillison.net) — practitioner-level posts on prompt patterns and tool integration; the canonical practitioner voice on AI engineering tooling.
- Andrej Karpathy's talks on the AI engineer role (youtube.com/@AndrejKarpathy) — broader framing.
Frequently asked questions
- Is Cursor or Claude Code the better tool to invest in?
- Different tools for different jobs. Cursor is an IDE; better for visual diff review, cursor-position-aware edits, and click-to-jump navigation. Claude Code is a terminal CLI; better for shell-driven workflows, headless automation, and remote-server work. Strong senior+ engineers use both: Cursor for daily IDE work, Claude Code for batch operations, deployment scripts, and tasks where the IDE overhead doesn't pay off. Anthropic's own engineers use Claude Code for most workflows per their published best-practices doc.
- How do I avoid AI shipping bugs that pass tests?
- Three layers. (1) Write the test first; have AI implement to pass. The test is your spec; you wrote it; you trust it. (2) Use property-based testing (Hypothesis in Python, fast-check in JS) for invariants — these catch edge cases unit tests miss. (3) Mutation testing (mutmut, cosmic-ray) on critical paths — verifies your tests actually fail when the implementation is wrong. The litmus test: if you let mutmut delete a line of your AI-generated code and the tests still pass, the tests are weak.
- Should I use AI for production code, just internal tools, or both?
- Both, but with different review bars. Internal tools / scripts: AI ships with light review (you catch bugs at use time). Production code: AI ships with the same review bar as human-written code — multi-eye human review, full test coverage, mutation testing on critical paths, security review for anything touching auth or data access. The AI-vs-human distinction is irrelevant at the code-review gate; the safety bar is the same.
- How do I prepare for AI-related questions at FAANG interviews?
- Have one or two stories where AI saved meaningful time, and one story where AI led you wrong and you had to debug. Concrete numbers help: 'Cursor's agent refactored a 12-file rename in 4 minutes; I spent 20 minutes verifying tests' beats 'AI is fast.' For the wrong-direction story: 'AI generated a Postgres migration that was wrong — the index it suggested would have caused a 30-minute lock; I caught it in review by checking the EXPLAIN.' The wrong-direction story is what passes the senior bar.
- Is using AI in coding interviews allowed?
- Increasingly yes; sometimes mandated. Stripe and Anthropic include AI-tool fluency rounds. Most other FAANG-tier interviews still test pure coding without AI in the algorithm round, then test AI fluency separately at senior+. Verify with the recruiter for the specific round. The wrong move: assume AI is banned and don't ask. The other wrong move: use AI in a round where the interviewer didn't enable it.
- What's the right amount of context to give an AI prompt?
- Enough to disambiguate, not so much that the AI loses the thread. Empirical pattern: 50-200 lines of relevant code, plus 2-3 sentences of intent, plus 1-2 examples of the desired output style if it's a style-sensitive task. Beyond ~10k tokens of context, model performance starts to degrade on most tasks. Cursor's @-folder is dangerous because it can blow past optimal context size; @-file or @-symbol is usually right.
- How do AI tools handle proprietary code and IP concerns?
- Anthropic, OpenAI, and Cursor all offer enterprise tiers with no-training guarantees on customer data (anthropic.com/legal/data-usage; cursor.com/security). Default consumer tiers vary. For employer-owned code: use the enterprise tier your company has approved; never paste production secrets, customer data, or confidential business logic into a consumer-tier tool. Check your company's AI tool policy before using AI on company code; this is a real fireable offense at some companies in 2026.
- Are there workflows where AI hurts more than it helps?
- Yes. (1) Schema migrations on production databases — AI hallucinates index types and can suggest schema changes that lock for hours. (2) Performance-critical hot paths in C++/Rust — AI generates working but slow code; the senior practice is to benchmark before accepting. (3) Cryptography — never write your own crypto, never let AI write it either. (4) Code in a domain you don't understand — if you can't review the output, you shouldn't ship it. The general rule: AI accelerates work in domains where you can verify; it amplifies mistakes in domains where you can't.
Sources
- Anthropic — Claude Code overview (commands, plan mode, MCP integration).
- Anthropic — 'Claude Code Best Practices' (engineering blog, regularly updated).
- Cursor — official docs (commands, agent mode, multi-file context).
- Will Larson — StaffEng essays (the canonical staff-engineer practitioner voice).
- Simon Willison's blog — practitioner-level prompt patterns and tool integration.
- Anthropic — data usage policy (enterprise no-training guarantees).
- GitHub Copilot — official documentation.
About the author. Blake Crosley founded ResumeGeni and writes about product design, hiring technology, and ATS optimization. More writing at blakecrosley.com.