AI went from assistant to autonomous actor and security never caught up

Enterprise AI deployments have shifted from pilot programs to production systems handling customer data, executing business transactions, and integrating with core infrastructure. That shift has exposed a significant gap between what AI agents can do and what security teams can observe or control.

A briefing published by the AIUC-1 Consortium, developed with input from Stanford’s Trustworthy AI Research Lab and more than 40 security executives, documents the security conditions that emerged in 2025 and projects the risks most likely to affect organizations in 2026. The contributors include CISOs from Confluent, Elastic, UiPath, and Deutsche Börse, along with security researchers and advisors from MIT Sloan, Scale AI, and Databricks.

According to an EY survey cited in the briefing, 64% of companies with annual turnover above $1 billion have lost more than $1 million to AI failures. One in five organizations reported a breach linked to unauthorized AI use, commonly described as shadow AI.

Three security problems are dominating the field

The briefing identifies three categories of risk that security practitioners are dealing with now.

The first is the agent challenge. AI systems have moved past assistants that respond to queries and into autonomous agents that execute multi-step tasks, call external tools, and make decisions without per-action human approval. This creates failure conditions that exist without any external attacker. An agent with overprivileged access and poor containment boundaries can cause damage through ordinary operation. Eighty percent of organizations surveyed reported risky agent behaviors, including unauthorized system access and improper data exposure. Only 21% of executives reported complete visibility into agent permissions, tool usage, or data access patterns.

Omar Khawaja, VP and Field CISO at Databricks, noted that AI components change constantly across the supply chain and that existing security controls assume static assets, creating blind spots when behavior shifts.

The second category is the visibility challenge. Sixty-three percent of employees who used AI tools in 2025 pasted sensitive company data, including source code and customer records, into personal chatbot accounts. The average enterprise has an estimated 1,200 unofficial AI applications in use, with 86% of organizations reporting no visibility into their AI data flows. Shadow AI breaches cost an average of $670,000 more than standard security incidents, driven by delayed detection and difficulty determining the scope of exposure.

The third is the trust challenge. Prompt injection moved from academic research into recurring production incidents in 2025. OWASP’s 2025 LLM Top 10 list ranked prompt injection at the top. The vulnerability exists because LLMs cannot reliably separate instructions from data input. Fifty-three percent of companies now use retrieval-augmented generation or agentic pipelines, each of which introduces new injection surfaces.

Existing frameworks are not sufficient for agent-specific risks

Frameworks such as NIST AI RMF and ISO 42001 provide organizational governance structures, including risk committees and documentation requirements. They do not address the specific technical controls that CISOs need for agentic deployments, such as tool call parameter validation, prompt injection logging, or containment testing for multi-agent systems.
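To make one of those controls concrete, tool call parameter validation can be as simple as checking every agent-issued call against an allowlist before it runs. The following is a minimal sketch, not something taken from the briefing: the tool names, the schema layout, and the refund limit are all hypothetical.

```python
# Hypothetical allowlist: tool name -> {parameter name: expected type}.
ALLOWED_TOOLS = {
    "lookup_order": {"order_id": str},
    "send_refund": {"order_id": str, "amount_cents": int},
}

# Illustrative business limit; not a figure from the briefing.
MAX_REFUND_CENTS = 50_000


def validate_tool_call(tool: str, params: dict) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    schema = ALLOWED_TOOLS.get(tool)
    if schema is None:
        return [f"tool not allowlisted: {tool}"]

    errors = []
    # Reject parameters the schema does not know about.
    for name in params:
        if name not in schema:
            errors.append(f"unexpected parameter: {name}")
    # Require every declared parameter, with an exact type match
    # (so True is not silently accepted where an int is expected).
    for name, expected in schema.items():
        if name not in params:
            errors.append(f"missing parameter: {name}")
        elif type(params[name]) is not expected:
            errors.append(f"wrong type for {name}")
    # Range check runs only once the structure is known to be valid.
    if tool == "send_refund" and not errors:
        if not 0 < params["amount_cents"] <= MAX_REFUND_CENTS:
            errors.append("amount_cents outside permitted range")
    return errors
```

A gateway sitting between the agent and its tools would call `validate_tool_call` on each request and log any non-empty result, which also gives auditors a record of rejected calls.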

Sanmi Koyejo, who leads Stanford’s Trustworthy AI Research Lab, acknowledged that large-scale longitudinal studies comparing incident rates between organizations using technically specific frameworks and those relying on broader governance do not yet exist. “AIUC-1 is still in its early adoption phase, and the field of AI agent security is too nascent for that kind of controlled comparison,” he told Help Net Security. His lab’s research found that model-level guardrails alone are insufficient: fine-tuning attacks bypassed Claude Haiku in 72% of cases and GPT-4o in 57%. Technically specific controls add input validation, action-level guardrails, and reasoning chain visibility that model-level safety misses. Koyejo drew an analogy to MFA adoption in conventional cybersecurity, noting that specific, auditable technical controls reduced breach risk in ways that high-level policy commitments could not.

Early adopters of technically grounded AI security standards report faster procurement cycles, clearer audit readiness, and reduced friction when deploying agents in regulated environments, according to Koyejo. A case study on applying structured AI security controls at AllianceBernstein, a financial services firm, has been published by Virtue AI, which Koyejo co-founded.

Resourcing continuous adversarial testing

The briefing recommends that organizations make continuous red-teaming a standing part of agent operations. Nancy Wang, CTO of 1Password, said the operating model for enterprises that lack in-house AI security expertise should combine platform defaults, automation, and targeted expertise rather than rely on large specialized teams.

“Baseline guardrails must be built into the platforms themselves,” Wang said. “Sandboxed tool execution, scoped and short-lived credentials, runtime policy enforcement, and comprehensive audit logging should not require custom engineering.” Adversarial testing, she said, should be integrated into CI and release workflows so that model updates, prompt changes, or agent reconfigurations automatically trigger predefined attack suites. Human experts then investigate meaningful deltas rather than rerunning entire playbooks manually.
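One way a release pipeline could decide when to trigger those attack suites is to fingerprint everything that defines an agent's behavior and retest whenever the fingerprint changes. This is a sketch under assumptions: the configuration keys and the idea of storing the last tested fingerprint are illustrative, not part of Wang's description.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash an agent configuration in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_retest(config: dict, last_tested_fingerprint: str) -> bool:
    """True when the model, prompt, or tool set differs from what was last tested."""
    return fingerprint(config) != last_tested_fingerprint


# Hypothetical agent definition checked into the repository.
agent_config = {
    "model": "example-model-v1",
    "system_prompt": "You are a billing assistant.",
    "tools": ["lookup_order", "send_refund"],
}
```

A CI job would store the fingerprint after each successful adversarial run and call `needs_retest` on every commit, so only real changes to the agent reach the attack suite and the human reviewers behind it.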

Wang recommended tiering agents by risk level. Agents with access to sensitive data or production systems warrant continuous adversarial testing and stronger review gates. Lower-risk agents can rely on standardized controls and periodic sampling. “The goal is to make continuous validation part of the engineering lifecycle,” she said.
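The tiering rule Wang describes is simple enough to encode directly in an agent inventory. The sketch below assumes each agent record carries two hypothetical boolean flags; real inventories would likely need finer-grained attributes.

```python
from enum import Enum


class Tier(Enum):
    HIGH = "continuous adversarial testing and stronger review gates"
    LOW = "standardized controls and periodic sampling"


def tier_for(agent: dict) -> Tier:
    """Assign the high tier to any agent touching sensitive data or production."""
    if agent.get("touches_sensitive_data") or agent.get("touches_production"):
        return Tier.HIGH
    return Tier.LOW
```

Missing flags default to the low tier here, which is a design choice an organization may want to invert so that unclassified agents are treated as high risk until reviewed.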

Koyejo’s lab has pursued the automation question directly. Research on what the lab calls AutoRedTeamer demonstrated that automated attack selection can reduce computational costs by 42 to 58% compared to naive approaches, with broader vulnerability coverage. He recommended that resource-constrained organizations start with automated continuous testing tied to deployment pipelines, implement runtime guardrails before any agent with access to sensitive data or real-world tools goes to production, and use human red-teaming selectively for high-stakes deployments.

In identity and cloud security, Wang noted, the shift from high-level policy statements to enforceable controls such as least privilege, short-lived credentials, and scoped tokens materially reduced lateral movement and constrained impact when incidents occurred. “Agents with tightly scoped capabilities and time-bound credentials simply cannot access what they were never granted,” she said. “That is a concrete and observable difference.”
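The property Wang points to, that an agent cannot use what it was never granted, can be illustrated with a credential that carries an explicit scope set and an expiry. This is a minimal in-memory sketch with hypothetical names and scope strings; a production system would rely on a real token service and signed tokens.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    agent_id: str
    scopes: frozenset
    expires_at: float  # Unix timestamp after which the credential is invalid


def issue(agent_id, scopes, ttl_seconds, now=None):
    """Issue a credential limited to the given scopes and lifetime."""
    now = time.time() if now is None else now
    return ScopedCredential(agent_id, frozenset(scopes), now + ttl_seconds)


def is_allowed(cred, scope, now=None):
    """Allow an action only if the credential is unexpired and holds the scope."""
    now = time.time() if now is None else now
    return now < cred.expires_at and scope in cred.scopes
```

For example, a credential issued with only `orders:read` for 300 seconds fails the check for `orders:write` immediately and fails every check once the 300 seconds have passed, regardless of what the agent was instructed to do.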
