Cisco research finds standard AI safety benchmarks miss the real threat

The report pairs single-turn and multi-turn adversarial evaluation across 15 proprietary frontier models from OpenAI, Anthropic, Google, Amazon and xAI. Running 30,090 single-turn prompts and 6,986 multi-turn attacks, the team found that the two evaluation regimes produce different model rankings, different failure maps and different risk profiles. Every model tested failed a non-trivial share of multi-turn attacks.

Key findings from the research:

Multi-turn attack success rate (ASR) ranged from 7.89% to 88.30% across all 15 models, against a single-turn range of 2.19% to 64.91%.
Eight of 15 models showed an absolute gap greater than 15 percentage points between the two regimes.
Anthropic’s Claude family, which posted the lowest single-turn ASR in the cohort at 2.19% to 3.64%, still reached 11.16% to 16.20% under iterative attack.
Single-turn failures concentrated in three procedures: Imposter AI at 37.50% weighted ASR, Soft Paraphrase at 29.21% and System Prompts at 27.69%.

The findings challenge a common assumption in enterprise AI procurement.

“The surprising thing here is really that a lot of people accept and kind of understand these frontier labs as being state of the art, but they don’t necessarily think through the security and safety implications of that,” Amy Chang, head of AI threat and security research at Cisco, told Network World. “What this research does is kind of showcase that there is still variance across the different models, and how strong they are with the internal guardrails that are built within the model against these types of attacks.”
