AI Tools in the Security Engineering Workflow in 2026: Cursor, Claude Code, LLM-Assisted Code Review, AI Threat Hunting
In short
AI tools have become a senior Security Engineer force-multiplier in 2026: Cursor and Claude Code for runbook authoring, detection-rule scaffolding, and threat-model drafting; LLM-assisted code review for OWASP Top 10 spotting; natural-language SIEM querying through Datadog Bits AI, Honeycomb Query Assistant, and Microsoft Security Copilot. The senior craft anchors on NIST AI RMF, OWASP LLM Top 10, MITRE ATLAS, and Anthropic's Responsible Scaling Policy. The bar is force-multiplier judgment, not vibe-coded security tooling.
Key takeaways
- AI tools meaningfully changed Security Engineering between 2023 and 2026 (Cursor, Claude Code, LLM-assisted code review, natural-language SIEM querying, AI-augmented incident response), and the 2026 senior bar is using them as a force-multiplier on human SecEng judgment, not as a replacement for it.
- The canonical 2026 governance frameworks are the NIST AI Risk Management Framework (AI RMF 1.0, csrc.nist.gov/Projects/ai-risk-management) with its Govern / Map / Measure / Manage functions, the OWASP Top 10 for Large Language Model Applications (owasp.org/www-project-top-10-for-large-language-model-applications), and MITRE ATLAS as the adversarial-threat-space catalog for AI systems.
- IDE-integrated AI is the highest-apply Security Engineering use case in 2026: Cursor (cursor.com) and Claude Code (docs.claude.com/en/docs/claude-code) for runbook authoring, detection-rule scaffolding, threat-modeling document drafting, and postmortem editing. Senior SecEng output is faster and more thorough; junior output is faster and less correct without supervisor review.
- LLM-assisted code review for security is a force-multiplier on human PR review, not a replacement. Claude, GPT-4, and Gemini surface OWASP Top 10 issue candidates that humans triage; GitHub Copilot security review and CodeQL Copilot integrations are the canonical 2026 IDE references. Reviewing the AI's review is the senior workflow.
- Natural-language SIEM and observability querying is the second-largest 2026 SecEng AI surface: Datadog Bits AI for security ops (docs.datadoghq.com), Honeycomb Query Assistant (honeycomb.io), and Microsoft Security Copilot. Threat hunters now write English questions and read SQL / KQL responses; the senior signal is reading the generated query critically, not trusting it.
- Threat modeling for AI systems themselves is now table-stakes SecEng work. The OWASP LLM Top 10 catalogs prompt injection (LLM01), insecure output handling (LLM02), training-data poisoning (LLM03), sensitive-information disclosure (LLM06), insecure plugin design (LLM07), and excessive agency (LLM08). MITRE ATLAS gives the adversarial-tradecraft mapping; senior SecEng cite both fluently.
- Frontier-AI-lab security roles at Anthropic, OpenAI, Google DeepMind, and the Microsoft Security Copilot team are some of the highest-apply 2026 security roles; compensation is best checked on levels.fyi/t/security-engineer using the per-company filters. Anthropic's Responsible Scaling Policy (anthropic.com/responsible-scaling) is the canonical public commitment to evaluate-before-deploy at frontier-AI scale.
What AI in the Security Engineering workflow actually means in 2026: five surfaces, force-multiplier framing
Between 2023 and 2026 AI tooling moved from "experimental novelty" to "part of the senior Security Engineer's daily craft". The 2026 reality is not "AI replaces Security Engineers"; it is "Security Engineers who use AI well accelerate routine tasks meaningfully (specific multipliers depend on the task; benchmark on yours) and write better deliverables, and Security Engineers who refuse to use AI at all are operating in a 2022 vocabulary". The five concrete surfaces where AI tooling matters in 2026 SecEng workflows:
- IDE-integrated AI for SecEng deliverables. Cursor and Claude Code are the 2026 reference set. Senior SecEng use them for runbook authoring, detection-rule scaffolding (Sigma, KQL, Splunk SPL, YARA, Suricata), threat-modeling document drafting (STRIDE templates, attack trees, ASVS-mapped checklists), and postmortem editing. The AI handles the structural-prose work; the human handles the domain judgment, the trust-boundary calls, and the residual-risk decisions.
- LLM-assisted code review for security. Claude, GPT-4, and Gemini-class models review PRs as a force-multiplier on human review, surfacing candidate OWASP Top 10 findings that humans then triage. GitHub Copilot security-review features and CodeQL-Copilot integrations are the canonical IDE references. The senior pattern is reviewing the AI's review: treating it as a tireless junior that surfaces candidates, not as a final-decision oracle.
- Natural-language SIEM and threat hunting. Datadog Bits AI for security ops, Honeycomb Query Assistant, and Microsoft Security Copilot let analysts write English questions ("show me unusual S3 GetObject patterns from non-corporate IPs in the last 24 hours") and get back a SIEM query plus initial result interpretation. The 2026 senior signal is reading the generated query critically before trusting the answer; the model can hallucinate field names, miss WHERE clauses, or use the wrong time window.
- AI-augmented incident response. Faster log analysis (LLM-summarized timelines from raw logs), faster postmortem drafting (the model produces a structured five-whys skeleton from the incident channel transcript), faster comms-template generation during major incidents (status-page updates, customer notifications, executive summaries). The discipline is not "let the model run the incident"; it is "let the model handle the prose-shaping work so the human can focus on the technical containment".
- Threat modeling for AI systems themselves. The new SecEng surface that did not exist in 2022. AI-integrated applications have new vulnerability classes (prompt injection, training-data poisoning, model-output handling at the application boundary, plugin-and-tool-call abuse) catalogued in the OWASP LLM Top 10. Senior SecEng must threat-model AI systems with the same rigor they apply to traditional web applications.
Three properties separate the 2026 AI-in-SecEng craft from the 2023 hype cycle:
- The framing is force-multiplier, not replacement. Cursor and Claude Code drafting a Sigma rule from an English description does not replace the SecEng who chose the detection logic, validated it against the threat-model, and tuned the false-positive rate against the SOC's alert-fatigue budget. The human did the judgment work; the model did the typing work. The 2026 senior pattern is making this distinction explicitly in calibration cycles and in artifact attribution.
- The governance frameworks are real and enforceable. The NIST AI Risk Management Framework (AI RMF 1.0) is not vendor marketing; it is the canonical 2026 risk taxonomy with its Govern / Map / Measure / Manage functions, used by US federal contractors and increasingly by FAANG-tier AI deployments. Anthropic's Responsible Scaling Policy is the canonical public evaluate-before-deploy commitment at frontier-AI scale. SecEng deploying AI systems must speak both vocabularies.
- The failure modes are predictable. They include hallucinated APIs in AI-generated security code, missing input validation in vibe-coded detection rules, sensitive data leakage through unsanctioned model usage, and over-reliance on AI for novel threat-modeling (LLMs are strong at known patterns, weak at novel architectures). Senior SecEng catalog these failure modes and design workflows that prevent them, not workflows that hope they do not occur.
Cursor and Claude Code as the 2026 SecEng IDE: runbooks, detection rules, threat models, postmortems
The single highest-apply AI surface in 2026 Security Engineering is the IDE-integrated coding assistant: Cursor as the AI-native fork of VS Code, Claude Code as Anthropic's terminal-native and IDE-integrated agentic coding interface. SecEng work is largely structured-document work (runbooks, detection rules, threat models, postmortems, ASVS checklists), and structured-document work is what these tools accelerate hardest.
- Runbook authoring. Senior SecEng use Cursor and Claude Code to scaffold incident-response runbooks from a one-paragraph description and an attached threat model. The AI produces a draft with the standard sections (detection signals, escalation chain, containment steps, eradication steps, recovery, post-incident actions), references the org's existing ATT&CK-mapped detection set, and surfaces the open trust-boundary questions. The human then audits the runbook against reality, fills in the org-specific routing, and tunes the language. Time-to-first-draft drops from hours to a meaningfully shorter window (specific savings vary by task); quality of the final artifact stays at senior bar because the human did the judgment work.
- Detection-rule scaffolding. Sigma, KQL, Splunk SPL, YARA, Suricata, Snort. The AI takes "I want to detect Kerberoasting where a non-admin account requests TGS tickets for SPNs at high volume from a single workstation in a 30-minute window" and produces a Sigma rule draft mapped to MITRE ATT&CK T1558.003. The human reviews field names against the org's log schema, validates the time-window and threshold against historical baselines, and tunes false-positive rates against the SOC's alert-fatigue budget. The model writes the syntax; the human owns the semantics and the operational tuning.
- Threat-modeling document drafting. STRIDE templates filled in from a system architecture description; PASTA stage-by-stage analysis structure; attack-tree skeletons from a stated adversary objective; ASVS-mapped verification checklists for a given application class. The AI handles the boilerplate; the human handles the trust-boundary diagram, the residual-risk acceptance, and the engineering-trade-off conversations. The 2026 senior pattern is using AI to compress the document-writing time so more of the SecEng's calendar goes to the live whiteboard threat-modeling sessions where judgment happens.
- Postmortem editing. Raw incident-channel transcripts, on-call timeline notes, log snippets, and Slack screenshots become a structured five-whys postmortem draft with timeline, contributing factors, and action items. Honest postmortems are calendar-blocked work that on-call SecEng often defer; AI-assisted drafting shrinks the gap between "incident resolved" and "postmortem published" from days to a meaningfully shorter window (specific savings vary by task).
- Auditing the AI's output. The single load-bearing senior-versus-junior signal in 2026 IDE-AI workflow is the audit step. Senior SecEng read every line of AI-generated code, every clause of AI-generated detection logic, every claim in an AI-generated runbook, and challenge each one. Junior SecEng paste the AI output into the codebase or runbook directory and ship. The junior workflow ships exploitable code, false-positive-prone detection rules, and runbooks that reference fields that do not exist.
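The threshold-validation step in the detection-rule bullet above can be sketched as a replay against historical counts. This is a minimal sketch, assuming historical TGS-request events have already been exported as (workstation, timestamp) pairs; the function names, the fixed 30-minute buckets, and the sample data are all hypothetical, and a real SIEM rule would typically use a sliding window instead.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW_MINUTES = 30  # matches the 30-minute window in the example rule


def windows_over_threshold(events, threshold):
    """Return the (host, bucket_start) buckets whose count meets the threshold.

    events: iterable of (workstation, datetime) pairs from historical logs.
    Buckets are fixed and non-overlapping, a simplification of a sliding window.
    """
    buckets = Counter()
    for host, ts in events:
        # Floor the timestamp to the start of its bucket.
        bucket_start = ts - timedelta(
            minutes=ts.minute % WINDOW_MINUTES,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        buckets[(host, bucket_start)] += 1
    return {key: n for key, n in buckets.items() if n >= threshold}


def baseline(events, thresholds):
    """Map each candidate threshold to the number of buckets that would alert."""
    events = list(events)
    return {t: len(windows_over_threshold(events, t)) for t in thresholds}


# Hypothetical history: one noisy workstation, one quiet one.
start = datetime(2026, 1, 5, 9, 0)
history = [("ws-042", start + timedelta(minutes=i)) for i in range(25)]
history += [("ws-007", start + timedelta(minutes=10 * i)) for i in range(6)]

report = baseline(history, thresholds=[3, 10, 30])
print(report)
```

Reading the report against the SOC's alert budget is still the human's call: a threshold that would have fired on every bucket in the history is noise, and one that never fires may be blind.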
Two anti-patterns are common enough in 2026 to be recognized by senior calibration committees:
- Vibe-coded security tooling. The SecEng asks Claude Code or Cursor to "write me a credential-scanning tool", and the AI emits 200 lines of plausible Python with hallucinated regex patterns, undefined exception handling, and a hard-coded list of credential prefixes that misses half the real cases. The SecEng commits it. Six weeks later a real credential leak slips through because the regex did not match the new GitHub-token format. The anti-pattern is recognizable by lack of test coverage on AI-generated security code, lack of explicit threat-model documentation for the AI-generated artifact, and lack of human author-of-record commitment to the code's correctness.
- AI as threat-model oracle. Asking Cursor or Claude Code to "threat-model this design" and accepting the output without doing the live whiteboard session. LLMs are strong at recognizing known patterns (OAuth flows, CSRF defenses, common authorization mistakes) and weak at novel architectures (your specific multi-tenant boundary, your specific data-flow across business units, your specific compliance constraints). The model produces a generic threat list; the actual high-impact threats live in the org-specific design choices the model has no context on. Senior SecEng use AI to scaffold threat models, not to author them.
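The missing test coverage named in the vibe-coded-tooling anti-pattern can be illustrated with a scanner that refuses to carry a pattern without a fixture. This is a sketch under stated assumptions: the pattern names, the fixtures, and especially the exact token lengths are illustrative guesses about the formats, not authoritative definitions, and the fixtures are synthetic strings rather than real secrets.

```python
import re

# Illustrative patterns only. Token formats change over time, which is the
# reason to keep one test fixture per format. Lengths are assumptions.
PATTERNS = {
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "github_fine_grained_pat": re.compile(r"\bgithub_pat_[A-Za-z0-9_]{82}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan(text):
    """Return the sorted names of patterns that match anywhere in text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


# Synthetic fixtures built to the assumed shapes above.
FIXTURES = {
    "github_classic_pat": "token=ghp_" + "a1B2" * 9,
    "github_fine_grained_pat": "token=github_pat_" + "x" * 82,
    "aws_access_key_id": "key=AKIA" + "ABCD1234" * 2,
}
NEGATIVES = ["ghp_tooshort", "AKIA123", "just a normal log line"]


def run_self_test():
    """Every pattern needs a positive fixture; negatives must stay silent."""
    missing = [name for name in PATTERNS if name not in FIXTURES]
    failures = [n for n, sample in FIXTURES.items() if n not in scan(sample)]
    false_hits = [s for s in NEGATIVES if scan(s)]
    return {"missing": missing, "failures": failures, "false_hits": false_hits}


result = run_self_test()
print(result)
```

When a provider ships a new token format, the workflow is to add the fixture first, watch the self-test fail, and only then extend the pattern, so the gap is visible before a leak exposes it.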
LLM-assisted code review and natural-language SIEM querying: force-multiplier patterns
Two 2026 SecEng surfaces sit one layer above IDE-integrated coding: PR-time security code review and threat-hunting / SIEM querying. Both are dominated by AI-assist patterns that compress analyst time without removing analyst judgment.
- LLM-assisted code review for security. Modern frontier models (Claude, GPT-4, Gemini) can read a PR diff and surface candidate OWASP Top 10 findings: A01 Broken Access Control where an authorization check is missing, A03 Injection where a query is constructed by string concatenation, A07 Identification and Authentication Failures where a session-management primitive is implemented unsafely. The 2026 reference IDE features are GitHub Copilot security-review and CodeQL-Copilot integrations.
- The senior pattern: review the AI's review. The model is a tireless junior reviewer that surfaces candidates. The human SecEng triages: confirming true positives, dismissing hallucinated APIs (the model invented a function name that does not exist), validating that flagged authorization gaps are real (the model can miss that a required check happens upstream in middleware). The signal of senior judgment is the human review of the AI review, not the volume of AI-surfaced findings.
- Failure mode: rubber-stamping AI security review. The anti-pattern is "the AI did not flag anything in this PR, so we are good". Modern LLMs can miss design-flaw issues (A04 Insecure Design) entirely because those issues require understanding the trust model across components, not the local code semantics. They can miss novel injection primitives. They can miss multi-step authorization-bypass chains where each step looks innocuous in isolation. AI security review supplements human review; it does not replace it.
- Natural-language SIEM and threat hunting. Datadog Bits AI for security ops, Honeycomb Query Assistant, Microsoft Security Copilot, Splunk AI Assistant. Threat hunters write English ("find anomalous AssumeRole patterns where the source IP changed between authentication and the role-assumption call") and the model returns the SQL / KQL / Splunk SPL query plus an initial summary of the result rows. The 2026 senior workflow:
- Read the generated query critically: does it select the right time window, does it use the right field name, does it correctly correlate across two log sources, is the WHERE clause restrictive enough to surface the actual hypothesis?
- Run the query and read the raw rows. Do not trust the model's interpretive summary; look at the data.
- Iterate on the query in plain English ("narrow that to non-corporate IPs only"), watching the model adjust the WHERE clause.
- Capture the final query and result in the threat-hunt journal as the artifact, with explicit ATT&CK technique mapping.
- The senior signal: the threat-hunt journal artifact. Junior threat hunters use natural-language SIEM tools and produce no durable artifact; they ask questions, get answers, move on. Senior threat hunters use the same tools and produce a journal of formalized queries, ATT&CK-mapped findings, and detection-engineering handoffs to the SOC. The AI accelerates the velocity; the artifact discipline is what makes the velocity matter.
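The "read the generated query critically" step in the workflow above can be partly mechanized as a pre-flight check before the query runs. This is a rough sketch, assuming a SQL-style query and a known field list for the target table; the schema, the keyword list, and the table name are hypothetical, and the token matching is deliberately crude (it is not a SQL parser and would flag column aliases).

```python
import re

# Hypothetical log schema for the table the hunter is querying.
KNOWN_FIELDS = {"event_time", "event_name", "source_ip", "user_arn", "bucket"}
SQL_KEYWORDS = {
    "select", "from", "where", "and", "or", "not", "in", "interval",
    "now", "count", "group", "by", "order", "limit", "hour",
}


def preflight(query, table, time_field="event_time"):
    """Return a list of human-readable problems found in a generated query."""
    problems = []
    lowered = query.lower()
    parts = re.split(r"\bwhere\b", lowered, maxsplit=1)
    if len(parts) == 1:
        problems.append("no WHERE clause")
    elif time_field not in parts[1]:
        problems.append(f"WHERE clause does not bound {time_field}")
    # Drop quoted literals so their contents are not read as identifiers.
    stripped = re.sub(r"'[^']*'", "", lowered)
    identifiers = set(re.findall(r"\b[a-z_][a-z0-9_]*\b", stripped))
    unknown = identifiers - KNOWN_FIELDS - SQL_KEYWORDS - {table.lower()}
    if unknown:
        problems.append("unknown identifiers: " + ", ".join(sorted(unknown)))
    return problems


good = (
    "SELECT source_ip, COUNT(*) FROM s3_access "
    "WHERE event_name = 'GetObject' "
    "AND event_time > NOW() - INTERVAL 24 HOUR GROUP BY source_ip"
)
bad = "SELECT src_ip FROM s3_access WHERE event_name = 'GetObject'"

print(preflight(good, "s3_access"))
print(preflight(bad, "s3_access"))
```

A check like this catches only the mechanical failures (invented field names, a missing time bound); whether the query tests the actual hypothesis remains a judgment call for the hunter.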
The 2026 SecEng calibration view is consistent: AI tools improve top-of-funnel signal generation and bottom-of-funnel artifact production, but the middle (judgment, prioritization, residual-risk decisions, threat-model coherence) remains the senior human craft. The engineers who clear the senior bar are the ones who explicitly architect their workflow to keep the judgment work in human hands and route the prose-shaping and pattern-matching work to AI.
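To make the A03 Injection candidate from the code-review discussion concrete, here is the pattern a reviewer (human or model) would flag, next to the parameterized form. This is a minimal sketch using Python's standard-library sqlite3 module with an in-memory database; the table, function names, and payload are hypothetical.

```python
import sqlite3


def find_user_unsafe(conn, username):
    # Candidate A03 finding: attacker-controlled text is spliced into the SQL.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameter binding keeps the input as data, never as SQL syntax.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(conn, payload)  # the OR clause matches every row
blocked = find_user_safe(conn, payload)   # no user has this literal name

print(len(leaked), len(blocked))
```

The triage question for the human is whether `username` is actually attacker-reachable on this code path, which is exactly the cross-component context a diff-only review can lack.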
Threat modeling AI systems: OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF, Anthropic RSP
The other half of "AI in SecEng workflow" is the inverse problem: SecEng must threat-model the AI systems their own org is shipping. AI-integrated applications have new vulnerability classes that did not exist in pre-2023 AppSec curricula, and the canonical 2026 reference set is concrete.
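- OWASP Top 10 for Large Language Model Applications. The OWASP LLM Top 10 is the canonical 2026 vulnerability catalog for LLM-integrated systems. The senior SecEng vocabulary (category numbers below follow the version 1.1 list; the 2025 revision renumbers and renames several entries):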
- LLM01 Prompt Injection. Direct prompt injection (the user supplies adversarial input) and indirect prompt injection (the model retrieves attacker-controlled content from a tool, web page, or document). LLM01 is the most-cited LLM vulnerability class in 2026 because tool-using and agentic LLM applications turn prompt-injection into a remote-code-execution-adjacent primitive.
- LLM02 Insecure Output Handling. The application trusts model output as if it were structured data, executing it as code or rendering it as HTML without sanitization. This is the bridge from LLM02 to the traditional A03 Injection category in the OWASP Top 10.
- LLM03 Training Data Poisoning. Adversarial influence on the training corpus. Most relevant for fine-tuning workflows and for models that learn from production user data.
- LLM06 Sensitive Information Disclosure. The model reveals training data, system-prompt content, or upstream-tool outputs to the user. The mitigations include output filtering, guardrail layers, and design-time data-flow analysis.
- LLM07 Insecure Plugin Design. Tool-calling and plugin systems where the model can invoke arbitrary external actions without sufficient authorization. The 2026 strong pattern is least-privilege per tool, allowlist-based capability granting, and explicit human-in-the-loop gating for high-impact actions.
- LLM08 Excessive Agency. Agent-style applications granted authorities (file write, email send, code execution, payment processing) that exceed what the use case actually requires. Mitigations include capability-scoping, dry-run modes, confirmation gates, and audit-logging of every tool call.
- LLM09 Overreliance. Users trusting model output as authoritative when it is not. Both a UX and a security concern.
- LLM10 Model Theft. Model-weight exfiltration and model-inversion attacks.
- MITRE ATLAS. MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is the ATT&CK-style adversary-tradecraft catalog for AI systems. ATLAS techniques cover reconnaissance against ML systems, model evasion, model extraction, training-data poisoning, and inference-attack chains. The 2026 senior pattern: SecEng working on AI applications cite ATLAS technique IDs in threat models the same way SecEng working on traditional applications cite ATT&CK technique IDs.
- NIST AI Risk Management Framework. The NIST AI RMF 1.0 structures AI risk management around four functions: Govern (organizational risk management), Map (context and risk identification), Measure (assessment and metrics), Manage (response and prioritization). Federal contractors and increasingly FAANG-tier AI deployments map their AI risk programs to AI RMF practices explicitly. The senior SecEng vocabulary includes "this surface is at AI RMF Manage maturity but Map coverage is incomplete", the same way AppSec vocabulary includes "this program is at SAMM Level 2 in Verification but Level 1 in Operations".
- Anthropic Responsible Scaling Policy. Anthropic's RSP is the canonical public commitment by a frontier-AI lab to evaluate-before-deploy at increasing capability thresholds, built on the AI Safety Levels (ASL) framework with explicit red-line evaluations for catastrophic-misuse and autonomy capabilities. RSP is the public-policy artifact senior SecEng cite when the org is making frontier-AI deployment decisions or vendor-evaluating frontier-AI suppliers. Constitutional AI is the related Anthropic research line on alignment and harmlessness training.
- The senior SecEng AI threat-model. A 2026 threat-model for an LLM-integrated application maps every system component to OWASP LLM Top 10 categories, every adversary technique to ATLAS technique IDs, every governance commitment to AI RMF functions, and every frontier-model dependency to the supplier's public RSP-equivalent. Without all four references the threat model is missing senior-bar vocabulary.
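The mitigations listed for Insecure Plugin Design and Excessive Agency (allowlist-based capability granting, human-in-the-loop gating for high-impact actions, audit-logging of every tool call) can be sketched as a single authorization gate. This is an illustrative sketch, not a reference design: the class name, tool names, and decision strings are hypothetical, and a production gate would also validate arguments and authenticate the confirming human.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Allowlist plus confirmation gate for model-requested tool calls."""

    allowed: set              # tools this agent may call at all
    needs_confirmation: set   # high-impact tools that require a human yes
    audit_log: list = field(default_factory=list)

    def authorize(self, tool, args, confirmed_by=None):
        if tool not in self.allowed:
            decision = "deny:not_allowlisted"
        elif tool in self.needs_confirmation and confirmed_by is None:
            decision = "deny:needs_human_confirmation"
        else:
            decision = "allow"
        # Every request is logged, including the denied ones.
        self.audit_log.append(
            {"tool": tool, "args": args, "decision": decision,
             "confirmed_by": confirmed_by}
        )
        return decision == "allow"


gate = ToolGate(
    allowed={"search_docs", "send_email"},
    needs_confirmation={"send_email"},
)

r1 = gate.authorize("search_docs", {"q": "vpn outage"})
r2 = gate.authorize("send_email", {"to": "all@example.com"})
r3 = gate.authorize("send_email", {"to": "all@example.com"},
                    confirmed_by="analyst@example.com")
r4 = gate.authorize("run_shell", {"cmd": "id"})

print(r1, r2, r3, r4, len(gate.audit_log))
```

The design choice worth noting is deny-by-default: a tool the model names but the gate has never heard of is refused and logged, which keeps a successful prompt injection from expanding the agent's authority.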
Career economics: frontier-AI-lab security roles, Microsoft Security Copilot team, and the SecEng+AI specialization
AI Tools in the Security Workflow as a SecEng specialization sits at one of the highest-apply career intersections in 2026. Three career paths anchor the economics:
- Frontier-AI-lab security and AI-safety roles. Anthropic, OpenAI, Google DeepMind safety, the Microsoft Security Copilot team, and the broader frontier-AI-lab security organizations. These roles combine traditional Security Engineering (production security, threat modeling, incident response) with AI-specific surfaces (model-weight protection, prompt-injection-resistant agent design, evaluations of frontier-model misuse, RSP-level capability evaluations). Compensation belongs on levels.fyi/t/security-engineer with the per-company filter applied; frontier-AI-lab compensation is heavily equity-driven, with substantial stock or token-equivalent grants on top of the base salary that the BLS wage statistic does not capture.
- Security-product company AI specialization. The teams shipping the AI-augmented security tooling: Microsoft Security Copilot, CrowdStrike Charlotte AI, SentinelOne Purple AI, Datadog Bits AI security ops, Snyk AI security review. The role is a hybrid of SecEng craft and product engineering: shipping the AI-augmented detection pipeline, the LLM-powered triage workflow, the prompt-injection-resistant agent. Compensation also anchors on levels.fyi per-company filters; the public-research and conference-publication expectations are typically higher than at FAANG-tier internal SecEng because the work is product-marketing-adjacent.
- FAANG-tier internal SecEng with AI specialization. The traditional SecEng path with explicit AI-specialization scope: owning the AI-systems-security threat model for the org's ML-integrated products, and owning the AI-tool-usage policy for the engineering organization (which models are sanctioned for which data classes, what prompt-injection mitigations are required for tool-calling deployments). Compensation is the standard FAANG-tier SecEng band; the AI specialization is a credible path to staff and principal advancement within the SecEng ladder.
The broader US occupational baseline for Information Security Analysts (the BLS bucket that contains most SecEng work) per the BLS Occupational Outlook Handbook:
- SOC 15-1212 May 2024 median annual wage: $124,910. The BLS median substantially under-counts frontier-AI-lab and FAANG-tier total compensation, which is dominated by equity not captured in wage statistics.
- Employment growth 2024-2034: 29 percent. Much faster than the average for all occupations.
- Annual openings: about 16,000 per year on average across the decade. AI-systems-security and AI-augmented-SecEng tooling are two of the highest-concentration sub-buckets of this growth.
The dominant 2026 anti-pattern in this specialization is AI as vibe-coding security tools: SecEng asking Cursor or Claude Code to "write me a security tool", accepting the output without test coverage or threat-model review, shipping it, and discovering at audit time that the tool has hallucinated APIs, missing input validation, or sensitive-data leakage to unsanctioned models. The anti-pattern is recognizable by three signals: AI-generated security code without test coverage, AI-generated detection rules without false-positive baselining, and AI-tool usage policies that do not enumerate which model providers are sanctioned for which data classes.
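The third signal, a usage policy that enumerates sanctioned providers per data class, is straightforward to express as data that tooling can enforce. This is a minimal sketch with entirely hypothetical data classes and endpoint names; a real policy would live in version-controlled config and be owned by the security team.

```python
# Hypothetical policy table: data class -> set of sanctioned model endpoints.
POLICY = {
    "public": {"vendor-a-hosted", "vendor-b-hosted", "internal-model"},
    "internal": {"vendor-a-enterprise", "internal-model"},
    "customer_pii": {"internal-model"},
    "secrets": set(),  # never sent to any model
}


def is_sanctioned(data_class, endpoint):
    """Deny by default: unknown classes and unlisted endpoints are refused."""
    return endpoint in POLICY.get(data_class, set())


def unlisted_classes(inventory):
    """Return data classes from the inventory that the policy does not cover."""
    return sorted(c for c in inventory if c not in POLICY)


checks = {
    "pii_to_hosted": is_sanctioned("customer_pii", "vendor-a-hosted"),
    "internal_to_internal": is_sanctioned("internal", "internal-model"),
    "unknown_class": is_sanctioned("source_code", "internal-model"),
}
gaps = unlisted_classes(["public", "source_code", "customer_pii"])

print(checks, gaps)
```

The `unlisted_classes` helper surfaces the audit-time gap directly: any data class the organization handles but the policy does not name is a decision nobody has made yet.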
The 2026 strong pattern is AI as force-multiplier on human SecEng judgment: AI tools accelerate runbook authoring, detection-rule scaffolding, threat-model drafting, and SIEM querying, but every artifact has a human author of record, a test-coverage commitment, and an explicit threat-model citation. The senior SecEng+AI signal in 2026 calibration cycles is the engineer who shipped the workflow architecture that captures the AI velocity gains without surrendering the judgment work, and who can recite the OWASP LLM Top 10, MITRE ATLAS techniques, NIST AI RMF functions, and Anthropic RSP commitments fluently in the senior interview round.
Frequently asked questions
- What does AI tooling actually change about the Security Engineering workflow in 2026?
- Five surfaces concretely: IDE-integrated AI (Cursor, Claude Code) for runbook authoring, detection-rule scaffolding, threat-model drafting, and postmortem editing; LLM-assisted code review surfacing OWASP Top 10 candidates that humans triage; natural-language SIEM querying through Datadog Bits AI, Honeycomb Query Assistant, and Microsoft Security Copilot; AI-augmented incident response for log analysis and comms drafting; threat modeling AI systems themselves against the OWASP LLM Top 10 and MITRE ATLAS. The framing is force-multiplier; AI accelerates structured-document work and pattern-matching, while humans retain judgment, prioritization, and residual-risk decisions. The senior bar is using AI well, not refusing to use it.
- What is the OWASP LLM Top 10 and how does it relate to the regular OWASP Top 10?
- The OWASP Top 10 for Large Language Model Applications (owasp.org/www-project-top-10-for-large-language-model-applications) is a companion list, not a replacement, for the core OWASP Top 10. It catalogs vulnerability classes specific to LLM-integrated systems: LLM01 Prompt Injection (direct and indirect), LLM02 Insecure Output Handling, LLM03 Training Data Poisoning, LLM06 Sensitive Information Disclosure, LLM07 Insecure Plugin Design, LLM08 Excessive Agency, LLM09 Overreliance, LLM10 Model Theft. Senior SecEng cite LLM categories alongside the core A0x categories; LLM01 Prompt Injection chained to A03 Injection through model-mediated tool calls is a 2026 native vocabulary.
- What is MITRE ATLAS and how is it different from MITRE ATT&CK?
- MITRE ATLAS (atlas.mitre.org) is the Adversarial Threat Landscape for Artificial-Intelligence Systems: the ATT&CK-style adversary-tradecraft catalog applied to AI and ML systems specifically. ATLAS techniques cover reconnaissance against ML systems, model evasion, model extraction, training-data poisoning, and inference attacks. ATT&CK is the canonical adversary-tradecraft taxonomy for traditional IT systems; ATLAS is the AI-systems extension that fills the vocabulary gap for ML-specific attacks. Senior SecEng working on AI applications cite ATLAS technique IDs in threat models the same way they cite ATT&CK technique IDs for traditional surfaces.
- What is the NIST AI Risk Management Framework?
- The NIST AI Risk Management Framework (AI RMF 1.0, csrc.nist.gov/Projects/ai-risk-management) is the canonical 2026 AI risk taxonomy from NIST. It structures AI risk management around four functions: Govern (organizational risk management practices), Map (context and risk identification), Measure (assessment, metrics, and tracking), Manage (response and prioritization). Federal contractors and increasingly FAANG-tier AI deployments map their AI risk programs to AI RMF practices explicitly. Senior SecEng cite AI RMF maturity per function the same way AppSec engineers cite OWASP SAMM maturity per practice; it is the program-level scoring framework for AI risk.
- What is Anthropic's Responsible Scaling Policy and why does it matter for SecEng?
- Anthropic's Responsible Scaling Policy (anthropic.com/responsible-scaling) is the canonical public commitment by a frontier-AI lab to evaluate-before-deploy at increasing model-capability thresholds. The policy defines AI Safety Levels (ASL) with explicit red-line evaluations for catastrophic-misuse and autonomy capabilities; capability evaluations must pass before a model crosses the threshold to deployment. RSP matters for SecEng because it is the public-policy artifact senior engineers cite when the org is making frontier-AI deployment decisions, vendor-evaluating frontier-AI suppliers, or designing internal AI-tool-usage policies. The Constitutional AI research line is the related Anthropic technical work on alignment and harmlessness training.
- How should I use Cursor and Claude Code for SecEng work without shipping vibe-coded security tools?
- Three discipline rules. First, every AI-generated artifact has a human author of record who has read every line and accepts responsibility for correctness; no rubber-stamping, no auto-merge on AI-generated code in security-sensitive paths. Second, AI-generated security code must have test coverage written by a human, including failure-mode tests that exercise the hallucinated-API and missing-input-validation cases. Third, AI-generated detection rules must be false-positive-baselined against historical data before deployment; the model can produce syntactically-correct rules with operationally-broken thresholds. The framing is AI as a tireless junior whose work the senior reviews; not AI as oracle whose output ships unreviewed.
- Is LLM-assisted code review for security a replacement for human PR review?
- No; it is a force-multiplier on human review. Modern frontier models (Claude, GPT-4, Gemini) can surface candidate OWASP Top 10 findings in PRs that humans then triage, but they miss design-flaw issues (A04 Insecure Design) that require cross-component trust-model reasoning, they miss novel injection primitives, and they miss multi-step authorization-bypass chains where each step looks innocuous in isolation. The 2026 senior pattern is reviewing the AI's review; confirming true positives, dismissing hallucinated APIs, validating that flagged authorization gaps are real. AI security review supplements human review; it does not replace it. GitHub Copilot security review and CodeQL-Copilot integrations are the canonical IDE references.
- What is the failure mode for natural-language SIEM querying through tools like Datadog Bits AI or Microsoft Security Copilot?
- The model can produce a syntactically-correct query that does not answer the actual hypothesis. Common failure modes: hallucinated field names that do not exist in the org's log schema, wrong time-window selection, insufficiently-restrictive WHERE clauses that drown the analyst in noise, missing JOINs across log sources that hide the correlation the hunter actually wanted, and interpretive summaries that overstate the certainty of the findings. The 2026 senior threat-hunter workflow is to read the generated query critically before running it, run it and read the raw rows directly rather than trusting the model's summary, iterate in plain English to refine, and capture the final query plus result in the threat-hunt journal as the durable artifact.
- What career path makes sense for a SecEng who wants to specialize in AI tools and AI safety?
- Three credible 2026 paths. First, frontier-AI-lab security or AI-safety roles at Anthropic, OpenAI, Google DeepMind, or the Microsoft Security Copilot team; combining traditional SecEng craft with AI-specific surfaces (model-weight protection, prompt-injection-resistant agent design, RSP-level capability evaluations). Compensation on levels.fyi per-company, equity-heavy. Second, security-product company AI roles at Microsoft Security Copilot, CrowdStrike Charlotte AI, SentinelOne Purple AI, Datadog Bits AI security ops, Snyk AI; shipping AI-augmented security tooling. Third, FAANG-tier internal SecEng with explicit AI-specialization scope owning AI-systems-security threat models. The shared preparation: fluent OWASP LLM Top 10 vocabulary, MITRE ATLAS technique IDs, NIST AI RMF functions, and Anthropic RSP commitments; plus a public artifact in the AI-security space.
- What is the senior interview signal for AI Tools in Security Workflow specialization?
- Three load-bearing signals. First, fluent vocabulary; recite the OWASP LLM Top 10 categories with one example each, name MITRE ATLAS techniques relevant to the role's products, cite NIST AI RMF functions, and discuss Anthropic RSP commitments without notes. Second, workflow architecture; describe an explicit AI-tool-usage policy (which models for which data classes), the audit step for AI-generated security code, the false-positive baselining for AI-scaffolded detection rules. Third, threat-modeling depth on a real LLM-integrated system; walk through prompt injection chains into tool-calling primitives, plugin-design authorization boundaries, and sensitive-information disclosure mitigations on a 60-90 minute whiteboard session. Public artifacts (a published threat model for an LLM application, a CodeQL or Semgrep rule for an LLM-specific bug class, a conference talk) outweigh certifications.
Sources
- NIST AI Risk Management Framework (AI RMF 1.0); Govern, Map, Measure, Manage
- OWASP Top 10 for Large Language Model Applications
- MITRE ATLAS; Adversarial Threat Landscape for Artificial-Intelligence Systems
- Anthropic Responsible Scaling Policy; frontier-AI evaluate-before-deploy commitment
- Anthropic Research; Constitutional AI and alignment publications
- Claude Code documentation; Anthropic's agentic coding interface
- Cursor; AI-native code editor
- Datadog Bits AI for security operations; natural-language SIEM querying
- Honeycomb Query Assistant; natural-language observability querying
- GitHub Copilot; AI code review and security review
- MITRE ATT&CK; adversary tactics and techniques (parent taxonomy to ATLAS)
- levels.fyi; Security Engineer compensation track (per-company filters)
- BLS Occupational Outlook Handbook; Information Security Analysts (SOC 15-1212)
About the author. Blake Crosley founded ResumeGeni and writes about security engineering, hiring technology, and ATS optimization. More writing at blakecrosley.com.