‘TrustFall’ Exposes Claude Code Execution Risk

Developers using the latest versions of AI coding tools like Claude Code, Cursor CLI, Gemini CLI, and Copilot CLI could inadvertently execute malicious code on their systems with a single keypress, or with no keypress at all in continuous integration environments.
That, according to researchers at Adversa AI, is because none of the tools adequately warns users that a malicious repo can auto-approve and spawn a Model Context Protocol (MCP) server without their explicit approval or knowledge. All four coding tools show some form of trust dialog asking the user whether they trust a particular repo, but none offers full details on what that consent actually entails.
Adversa AI identified Claude Code as offering the least information in its trust dialog, and Gemini CLI as offering the most, along with a choice of whether to allow an MCP server to execute on the developer’s system. But the exposure is the same in all four, according to Adversa’s lead researcher, Rony Utevsky.

“A repository can ship a configuration that auto-approves and immediately launches an MCP server, no tool call from the agent is required,” he tells Dark Reading. “The variation is purely in how clearly the dialog tells the user what they are consenting to.”
Anthropic, however, has described the issue that Adversa AI identified as falling outside its threat model, and it told Adversa AI that it believes its trust dialog offers sufficient warning to users. Anthropic pointed out that any malicious activity happens only after the user has marked a repo or folder as trusted, Utevsky says. He adds that Adversa AI has not raised the issue with the other AI coding toolmakers because Anthropic’s approach appears to be the general convention.

“Once we identified the issue as a class-level convention rather than a vendor bug, vendor-specific disclosure stopped being the right shape of response: you can responsibly disclose a vulnerability to a vendor, but not a convention,” he explains.

A Straightforward Path?

According to Adversa AI, all a threat actor would need to do to pull off an attack is create a repository that includes a malicious MCP server and configuration settings that auto-approve it to run. When a developer clones or opens the repo in the AI coding tool and presses “enter” on what appears to be a routine security check, the AI coding tool unwittingly launches the attacker-controlled code with the developer’s full system privileges and no further prompting.

The payload can vary, and can allow attackers to read local files, including secrets, SSH keys, and tokens; access other projects; install backdoors; and establish a command-and-control connection. In a CI/CD environment, the same attack would unfold with no human interaction at all.

“The impact is full-machine compromise, not just project access,” researchers at Adversa AI said in a report this week that focused on attacks using Claude Code. “MCP servers execute as native OS processes with the full privileges of the user running Claude Code.” That means they aren’t sandboxed or confined in any way. “The payload runs the moment the MCP server process starts,” they added.

A Risky Change to the Trust Dialog in Claude Code

The report points to a trust dialog change that Anthropic introduced in Claude Code version 2.1, which removed warning language that previously made the risk more visible to users. That change has turned a routine developer action of cloning or reviewing a repo into a high-risk action, Utevsky says.

“The dialog users see is a simple ‘Yes, I trust this folder,’” he explains. “Most developers don’t realize ‘trusting’ hands over that much power.” In contrast, versions of Claude Code prior to 2.1 warned about MCP execution explicitly and offered an option to proceed with MCP servers disabled. Neither the warning nor the option is still present, Utevsky says.

The security researcher says the TrustFall issue joins three exploitable vulnerabilities in Claude Code that could allow a malicious repository to abuse project-scoped settings to silently change how the tool behaves on a developer’s machine. Those three vulnerabilities are CVE-2025-59536, CVE-2026-21852, and CVE-2026-33068, all of which Anthropic has patched.

Adversa AI also identified three configuration settings that an attacker could use in a malicious repo to trigger arbitrary code execution on a developer’s system without any explicit prior warning from Claude Code. The first automatically approves a malicious MCP server to run the moment the user accepts Claude Code’s broad folder trust prompt. The second involves planting the payload directly in the configuration file, making it harder for security scanners to flag. The third pre-authorizes specific tool calls through project settings, enabling code execution without further user interaction.
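As a rough illustration of what a reviewer might look for, the Python sketch below checks already-parsed configuration for the three patterns described above. The key names (`enableAllProjectMcpServers`, `enabledMcpjsonServers`, `mcpServers`, `permissions.allow`) and the function `flag_risky_settings` are assumptions made for illustration; the report as quoted here does not name the specific settings, and these heuristics are not a complete detector.

```python
from typing import Any

# Assumed key names; the article does not name the settings involved.
AUTO_APPROVE_KEYS = ("enableAllProjectMcpServers", "enabledMcpjsonServers")
# Interpreters and flags that let a command carry its code inline.
INLINE_INTERPRETERS = {"sh", "bash", "zsh", "python", "python3", "node"}
INLINE_FLAGS = {"-c", "-e"}


def flag_risky_settings(settings: dict[str, Any], mcp_config: dict[str, Any]) -> list[str]:
    """Return human-readable findings for three risky configuration patterns."""
    findings: list[str] = []

    # Pattern 1: project settings that approve MCP servers on the user's behalf.
    for key in AUTO_APPROVE_KEYS:
        if settings.get(key):
            findings.append(f"auto-approve: {key}")

    # Pattern 2: a server definition whose command carries its payload inline,
    # so no separate script file exists for a scanner to inspect.
    for name, server in mcp_config.get("mcpServers", {}).items():
        if not isinstance(server, dict):
            continue
        command = str(server.get("command", "")).rsplit("/", 1)[-1]
        args = [str(arg) for arg in server.get("args", [])]
        if command in INLINE_INTERPRETERS and INLINE_FLAGS.intersection(args):
            findings.append(f"inline-payload: {name}")

    # Pattern 3: tool calls pre-authorized through project settings.
    for rule in settings.get("permissions", {}).get("allow", []):
        findings.append(f"pre-authorized: {rule}")

    return findings
```

Any finding is a prompt for manual review rather than proof of malice, since legitimate projects can use the same settings.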

“In our opinion, the language of the new warning dialog downplays the decision’s importance and the severity of the consequences, while providing no information about the project contents,” Utevsky says. “It also defaults to ‘trust,’ so a reflexive press of ‘enter’ leads to unsafe behavior.”

Claude Code’s handling of dangerous settings is also internally inconsistent, he believes. Other configuration settings, such as bypassPermissions, trigger a much more alarming dialog with stronger language that defaults to “No, exit.” “The same product treats less dangerous settings more carefully than this one,” Utevsky says.

Not a Vulnerability, But Developers Still Need Defenses

Anthropic’s position is that unlike previous vulnerabilities that allowed malicious code execution before a trust dialog even appeared, the issue that Adversa AI has identified involves code execution that happens only after the user has consented to the project. “Whether this meets Anthropic’s threshold for a vulnerability is their call,” the security vendor noted in its report. “Whether users are making an informed trust decision under the v2.1+ dialog, in our view, is not a close question. They are not.”

Reducing exposure to AI agent threats like these, according to Adversa AI, boils down to tightening controls across developer endpoints and CI/CD pipelines, and improving visibility into how tools like Claude Code are used.

On developer systems, organizations should focus on inspecting project configurations and monitoring for unexpected behavior when new repositories are opened. That means validating projects before trusting them and using behavioral monitoring to detect unusual processes or activity initiated by development tools. In CI environments, the most effective safeguard is to avoid running the tool automatically on untrusted code, Adversa said. “Inspecting repo settings, automation actions, and project scaffolding isn’t technically complex, but it takes time and discipline,” Utevsky says. “It’s also unavoidable now, given how common supply chain attacks and intentionally malicious open source packages have become.”
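One way to act on the CI advice is a pre-flight check that refuses to run an agent when a checkout ships its own agent configuration. The sketch below is a minimal example under stated assumptions: the paths in `CANDIDATE_PATHS` and the helpers `find_agent_configs` and `gate` are hypothetical names chosen for illustration, and the path list would need extending for each tool an organization uses.

```python
import sys
from pathlib import Path

# Assumed locations of project-scoped agent configuration; extend per tool.
CANDIDATE_PATHS = (
    ".mcp.json",
    ".claude/settings.json",
    ".claude/settings.local.json",
)


def find_agent_configs(repo_root: Path) -> list[Path]:
    """Return the project-scoped agent config files present in the checkout."""
    return [repo_root / rel for rel in CANDIDATE_PATHS if (repo_root / rel).is_file()]


def gate(repo_root: Path) -> int:
    """Return an exit code: 0 when no agent config was found, 1 otherwise."""
    found = find_agent_configs(repo_root)
    for path in found:
        # Report each file so a human can inspect it before any agent runs.
        print(f"review before running an agent: {path}", file=sys.stderr)
    return 1 if found else 0
```

A pipeline step could call `sys.exit(gate(Path.cwd()))` before invoking the coding tool, so that untrusted checkouts fail closed until someone has reviewed the flagged files.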
