How many controls should an AI agent security review cover?

Forty-seven, spread across ten categories: identity, secrets, tool scoping, prompt injection, data egress, audit, human-in-the-loop, rollback, output validation, third-party trust. Each control has an action, a verification step, and a default failure mode.

What is the #1 risk in the OWASP LLM Top 10 for 2025?

Prompt injection (LLM01). Untrusted input that the model treats as instruction. The defense is layered: input filtering, output validation, tool scoping, and human gates on high-risk actions. No single layer is sufficient.

What is excessive agency?

OWASP LLM07. An agent permitted to do more than it needs to. The defense is least-privilege capability scoping at the platform layer, never in the prompt. See the blast-radius post for the bounding levers.

How often should I rotate agent API keys?

Every 90 days minimum for production credentials, every 30 days for high-privilege keys. Use short-lived tokens minted per agent run where the provider supports it. Long-lived shared keys are the worst pattern.

How do I make audit logs tamper-evident?

Hash-chain every entry: each entry includes the hash of the previous entry plus its own content. An attacker editing a past entry must rewrite the chain forward, which is detectable if the tip hash is stored elsewhere (a third-party witness or a write-once store).

The AI Agent Security Checklist for 2026: 47 Controls Every Team Should Verify

Prompt injection is the #1 risk on the OWASP LLM Top 10 (OWASP, 2025). Agents amplify every LLM risk by adding tools, persistence, and autonomy. This checklist gives you 47 controls across 10 categories. Each control has an action, a verification step, and a default failure mode. Fork it into your wiki. Pair it with the blast radius post for the deep dive on bounding levers, and the monitoring playbook for how to see what the agent did.

Identity (5 controls)

Per-agent service identity. Verify: every agent run has a unique principal in the IAM system, not a shared user. Default failure: shared service account.
Short-lived tokens, not long-lived keys. Verify: tokens minted per run, TTL ≤ run duration plus 5 minutes. Default failure: an API key in an env var that never rotates.
Per-run audit attribution. Verify: trace logs show the agent identity that performed each action. Default failure: shared identity, no attribution.
End-user identity propagation. Verify: downstream services see the original user identity, not the agent's. Default failure: agent acts as god-user.
Federation with the existing IdP. Verify: agents authenticate through the company's SSO, not a parallel identity store. Default failure: a separate "AI users" directory.

Secrets and credentials (5 controls)

Secrets in a vault, not in env vars. Verify: code search finds no secrets in source. Default failure: GitHub Actions secret leaking via echo.
Rotate keys on a schedule. 90 days for production, 30 for high-privilege. Verify: rotation log shows last rotation date per key. Default failure: keys older than the engineer who set them.
Per-tenant secret scopes. Verify: tenant A's vault path differs from tenant B's. Default failure: shared umbrella key.
Just-in-time secret access. Verify: agent fetches secret at run start, not at boot. Default failure: secret in memory for the life of the process.
Secret scanner in CI. Verify: CI fails on detected secrets. Default failure: detect-secrets disabled "for now".

Tool scoping and blast radius (6 controls)

Per-tool capability scope. Verify: tool registry shows explicit scope per tool. Default failure: tool accepts arbitrary SQL.
Default read-only. Verify: a fresh agent has no write capability. Default failure: admin defaults applied at provisioning.
Capability TTL. Verify: grants expire. Default failure: forever grants.
Per-tool rate limits. Verify: rate limit fires platform-side. Default failure: limit in the prompt only.
Spend cap per tool. Verify: a transaction above cap is rejected before tool fires. Default failure: cap tracked, not enforced.
Reversibility tier per tool. Verify: registry export shows tier. See the blast radius post for the tier definitions. Default failure: untagged tools.

Prompt injection defense (6 controls)

Treat all retrieved content as untrusted. Verify: retrieved chunks are clearly delimited and labeled in the prompt. Default failure: retrieved content concatenated as if it were instruction.
Input filtering for known injection patterns. Verify: known-bad patterns blocked. Default failure: no filter, no detection.
Output validation against schema. Verify: outputs that fail schema are rejected, not passed downstream. Default failure: free-text outputs trusted.
Tool-call validation. Verify: tool args validated against schema before tool fires. Default failure: tool fires on any args.
Out-of-band confirmation for high-risk actions. Verify: actions above a risk threshold require external confirmation (SMS, email). Default failure: agent confirms itself.
Red-team test cases in CI. Verify: a labeled prompt-injection test set runs on every model or prompt change. Default failure: red-team done once, never re-run.

Data access and egress (5 controls)

Per-agent data scope. Verify: agent can read only its scope. Default failure: full DB access.
Outbound destinations allow-listed. Verify: random domain blocked. Default failure: any URL allowed.
Outbound payload size capped. Verify: 100MB POST rejected. Default failure: unbounded payload.
Outbound DLP scan. Verify: fake credit card blocked. Default failure: no outbound DLP.
Per-call destination logged. Verify: replay shows every URL hit. Default failure: error logs only.

Audit logging (5 controls)

One trace per run. Verify: trace ID resolves to a full reasoning path.
Tamper-evident audit chain. Hash-chained entries. Tip hash stored externally. Default failure: writable audit log.
Retention policy enforced. Verify: old traces aged out per policy. Default failure: traces retained forever (cost) or deleted nightly (compliance).
PII redacted before storage. Verify: a sample trace shows no PII. Default failure: full PII in cold storage.
External witness for tip hash. Verify: tip hash anchored to a write-once store or a third party. Default failure: same admin can edit log and tip.

Human-in-the-loop (4 controls)

HIL gate on Tier 3+ actions. Verify: a Tier 4 call requires a human approve. Default failure: silent execution.
Skimmable approval payload. Reasoning trace, proposed action, single clear question. Default failure: a wall of text, no human reads it.
Default behavior on timeout. Verify: timeout fires the safe default, not the proposed action. Default failure: action fires on timeout.
Approval log. Verify: who approved what, when. Default failure: anonymous approvals.

Rollback and kill switch (4 controls)

Kill switch latency. Verify: drill shows kill-to-halt under 30 seconds.
Per-tenant kill. Verify: stopping tenant A's agents does not stop tenant B's.
Compensating actions documented. Verify: every Tier 2+ tool has a documented reversal.
Rollback drill quarterly. Verify: dated drill report. Default failure: kill switch built, never tested.

Model output validation (4 controls)

Schema-typed outputs. Verify: outputs conform to JSON schema. Default failure: free-text downstream.
Refusal on confidence below threshold. Verify: low-confidence outputs escalate, not act. Default failure: act anyway.
Content policy validation. Verify: outputs scanned for policy violations before send. Default failure: unscanned outbound text.
Hallucination check on factual outputs. Verify: cited claims resolved against the source. Default failure: model hallucinates, user trusts.

How to use this checklist

Two passes. First pass: walk every control with the team that owns the agent. For each one, record state as one of pass, fail, or N/A. Time-box to 90 minutes; if a control needs research, mark it as "unverified" and move on. Second pass: triage the fails and unverifieds by blast radius (use the blast radius worksheet) and remediate in that order. Identity, secrets, and tool scoping usually top the priority list; the others compound after those are solid.

Quarterly re-review. Models update, dependencies update, the team rotates. A checklist passed in March can fail in June without any code change because a library shipped a new behavior or a vendor changed an API default. Set a calendar reminder. Pass results into the SOC 2 evidence pack or the equivalent your auditor accepts.

Mapping controls to real incidents

The controls earn their keep against named failure modes. A 2024 advisory on plugin compromise mapped directly to control #11 (Tool scoping per capability) and #45 (Third-party tools sandboxed). Prompt-injection incidents documented across the industry in 2024 and 2025 map to controls #16-21 (Prompt injection defense). Cross-tenant leak incidents map to #1 (Per-agent service identity) and #3 (Per-tenant secret scopes). The pattern: incidents that look novel almost always trace back to a checklist item that was passed quickly or never verified. The cheapest defense is the next-quarter re-audit that re-tests every "verified" control.

Third-party tool trust (3 controls)

Tool provenance recorded. Verify: every registered tool has author, version, hash. Default failure: anonymous tools.
Third-party tools sandboxed. Verify: they cannot read the agent's secrets. Default failure: shared scope.
SBOM and dependency scan. Verify: third-party tools scanned for known CVEs in CI. Default failure: rolling Russian roulette on transitive deps.

FAQ

How many controls should a review cover?: Forty-seven, across ten categories. Each control has an action, a verification step, and a default failure mode.
What is the #1 OWASP LLM risk?: Prompt injection (LLM01). Defense is layered: input filter, output validation, tool scoping, human gates.
What is excessive agency?: OWASP LLM07. Agent permitted more than needed. Defense is least-privilege scoping at the platform layer.
How often should I rotate API keys?: 90 days production, 30 days high-privilege. Use short-lived per-run tokens where supported.
How do I make audit logs tamper-evident?: Hash-chain entries; anchor the tip externally. Editing past entries rewrites the chain forward and is detectable.

Mapping the checklist to common compliance frameworks

The 47 controls map cleanly to most security frameworks teams are already audited against. SOC 2 Common Criteria CC6 (Logical Access) covers identity and secrets. CC7 (System Operations) covers audit logging and incident response. ISO 27001 Annex A.9 covers access control; A.12 covers operations security; A.16 covers incident management. NIST 800-53 Rev 5 mapping is direct: AC-2/3/6 (access control) covers identity and scoping; AU-2 through AU-12 covers audit logging; IR-4 through IR-8 covers incident response and rollback. HIPAA Security Rule §164.312 covers technical safeguards for any health-data agent.

Practical advice: do not invent a new evidence pack. Take the controls from this checklist, tag each one with the SOC 2, ISO 27001, and NIST 800-53 sections it maps to, and hand the same artifact to every auditor. The work is done once; the format flexes per audit.

Closing the loop

Forty-seven controls in ten categories. Audit them once, fix the defaults, then re-audit quarterly. The list is short enough to fit on one wiki page and long enough to catch the bugs that get teams in the press. Related: blast radius control, monitoring playbook, and the broader security playbook.

Sources

OWASP, "OWASP Top 10 for LLM Applications", 2025, genai.owasp.org
NIST, "AI Risk Management Framework", 2023, nist.gov
NIST, "SP 800-53 Rev 5", 2020, csrc.nist.gov
MITRE, "ATLAS", 2025, atlas.mitre.org
CISA, "Secure Our World", 2024, cisa.gov
IBM, "Cost of a Data Breach 2024", ibm.com