
On February 23, Summer Yue, Director of AI Alignment at Meta, shared a thread on X that quickly went viral, drawing nearly 10 million views. She had been testing an AI agent called OpenClaw on a separate toy inbox for weeks and it handled every scenario as expected.
Confident in its performance, she connected it to her primary inbox with a simple brief: review the inbox, suggest what to archive or delete, and do nothing until she approves. Instead, the agent went on a rampage, deleting and archiving over 200 emails while she desperately typed stop commands from her phone.
Chief Evangelist at Kore.ai
The natural assumption is that the agent went rogue. It had not. It had simply forgotten the instruction. Her real inbox was significantly larger than the toy account, and that triggered context window compaction, where older context is compressed to make room for new information. Her safety instruction was in that older context.
Once it was gone, the agent did exactly what it thought it was supposed to do: clean the inbox. And that is the uncomfortable truth for every enterprise deploying AI agents today. We type a prompt and assume it holds. But a prompt is not governance. It never was.
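To see how an instruction can silently vanish, here is a deliberately simplified sketch of naive context compaction. None of this reflects how any particular agent actually manages its window; the `compact` function, the token estimate, and the message shapes are all hypothetical. The point is only that an oldest-first eviction policy discards the safety brief precisely because it was typed first.

```python
# Hypothetical sketch of naive context compaction: when the message
# buffer exceeds its token budget, the oldest entries are silently
# dropped -- including the safety instruction, which came first.

def compact(messages, budget):
    """Drop oldest messages until the rough token count fits the budget."""
    def tokens(msg):
        return len(msg["content"].split())  # crude whitespace token estimate
    kept = list(messages)
    while sum(tokens(m) for m in kept) > budget and len(kept) > 1:
        kept.pop(0)  # evict oldest first -- the safety brief goes first
    return kept

history = [
    {"role": "user", "content": "Suggest deletions only. Do nothing until I approve."},
] + [{"role": "tool", "content": "email body " * 40} for _ in range(10)]

compacted = compact(history, budget=300)
print(any("approve" in m["content"] for m in compacted))  # False: the rule is gone
```

On the small toy inbox the history never exceeds the budget and the rule survives; on a large real inbox the same code quietly removes it. Nothing "went rogue" in this sketch, and nothing went rogue in the real incident either.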
A prompt in a chat window is not governance
What Yue ran into is not an isolated edge case. It is the natural consequence of using a tool that was never designed to carry the weight of governance.
At enterprise scale, governance cannot depend on what someone remembers to type.
Agents also optimize toward objectives, not human judgment. Suggesting what to delete and actually deleting it look exactly the same to an agent trying to complete a task. Without something in the architecture that forces a pause before an irreversible action, it simply will not pause. Prompts are instructions. They are not infrastructure.
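What "something in the architecture" could look like is easiest to show in code. This is an illustrative sketch, not any real product's API: the action names and the `ApprovalRequired` exception are made up. The guard lives in the execution path, so no amount of forgotten or compacted prompt text can bypass it.

```python
# Hypothetical guard: irreversible tool calls must pass through an
# approval gate enforced in code, not in the prompt.

IRREVERSIBLE = {"delete_email", "send_email"}  # assumed action names

class ApprovalRequired(Exception):
    pass

def execute(action, args, approved=False):
    """Run a tool call; destructive actions demand explicit human approval."""
    if action in IRREVERSIBLE and not approved:
        raise ApprovalRequired(f"{action} needs explicit human approval")
    return f"ran {action}"

# The agent can suggest freely...
execute("suggest_archive", {"id": 42})

# ...but cannot delete unless a human approved this specific call.
try:
    execute("delete_email", {"id": 42})
except ApprovalRequired as e:
    print(e)  # delete_email needs explicit human approval
```

The distinction the agent cannot make on its own, between suggesting and doing, is made for it by the allow-list.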
What ungoverned agents actually do
Yue’s situation was relatively contained. One person, one inbox, partially recoverable. But there was no audit trail, and that was just one agent on one quiet afternoon. Now consider that same absence of governance across an entire enterprise, touching customer data, financial records, and internal communications at scale.
AI researcher Simon Willison coined the term lethal trifecta to describe what makes this dangerous. When an agent has access to private data, processes content from untrusted sources, and can communicate externally, a malicious instruction hidden inside a document can redirect everything it does next.
The agent cannot distinguish its operator's instructions from ones injected through the content it reads. It follows both. And because agents run continuously, the damage does not have to happen right away.
This is not a distant theoretical risk. It is what happens when you give an agent broad access and assume a prompt will keep it honest. The agent is only as safe as the platform it runs on.
Every organization has rules about who can see what. Those rules do not stop being relevant just because the work is now being done by an agent. If the platform does not enforce them, the agent will operate as though they do not exist.
The same applies to actions. Every time an agent updates a record, sends a communication, or modifies data, someone needs to authorize that, and a prompt cannot do it.
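Enforcing those rules at the platform layer can be sketched in a few lines. The roles, scopes, and `authorize` helper below are invented for illustration; the design point is that the agent inherits the permissions of the user it acts for, and the check happens before the action, not in a prompt.

```python
# Illustrative sketch, not any vendor's API: role-scoped access
# enforced by the platform before an agent touches anything.

ROLE_SCOPES = {  # assumed roles and scopes, configured by an administrator
    "support_agent": {"read:tickets", "write:tickets"},
    "analyst": {"read:tickets", "read:finance"},
}

def authorize(role, scope):
    """Raise unless the acting user's role grants the requested scope."""
    if scope not in ROLE_SCOPES.get(role, set()):
        raise PermissionError(f"{role} lacks {scope}")
    return True

authorize("analyst", "read:finance")     # allowed
# authorize("analyst", "write:tickets")  # would raise PermissionError
```

If the platform does not run a check like this, the agent behaves as though the rules do not exist, because for it, they don't.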
Governance by design means hard constraints at the system level, access scoped to what each person needs, confirmation before anything irreversible, and recoverability built in for when things go wrong. That is a platform decision, made before the agent ever acts, not a prompt typed in hope.
What governance by design looks like in practice
When we designed our platform, incidents like this were not hypotheticals. They were design requirements. Every failure mode, every boundary violation, every action taken without a human in the loop, each one became a question we had to answer in the architecture before we wrote a single line of product code.
Here is what that looks like in practice.
User management: Not everyone in an organization should have access to everything, and neither should their agents. Role-based controls ensure access boundaries hold as teams and deployments scale.
Security and compliance: Sensitive data needs to be protected before an agent touches it, not after. PII masking, SSO, IP restrictions, and content filters enforced at the platform level are the difference between controlled access and exposure.
Data retention: The organization should decide what gets stored, for how long, and at what level of detail. That decision should never be left to default.
Orchestration: An agent should follow what the organization decided, not what it infers in the moment. Guardrails, routing logic, and fallback behavior configured by an administrator, not typed into a chat window.
Governance, monitoring and audit: Compliance that is only reviewed after an incident is not compliance. Every action, every agent, tracked continuously, with a trail that already exists when something goes wrong.
Workspace controls: Access is never assumed. Permissions, publishing rules, and agent types are all administrator-controlled from the start.
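The audit requirement in the list above is worth making concrete. A minimal sketch, with invented names and an in-memory list standing in for what would really be a durable, append-only store: the entry is written before the action runs, so the trail already exists if the action goes wrong.

```python
# Minimal sketch of an append-only audit trail: every agent action is
# recorded before it executes, so the record exists when something fails.
import json
import time

audit_log = []  # stand-in for a durable, append-only store

def audited(agent_id, action, target):
    """Record who did what to what, timestamped, before the action runs."""
    entry = {"ts": time.time(), "agent": agent_id,
             "action": action, "target": target}
    audit_log.append(json.dumps(entry))
    return entry

audited("inbox-bot", "archive", "msg-1041")
print(len(audit_log))  # 1
```

Contrast this with the opening incident: 200 deletions and no record of which agent did what, when, or why.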
Building responsible AI
Responsible AI does not happen by accident. It is built deliberately, through every decision made before an agent ever touches live data. In an enterprise context, the stakes of getting it wrong are not just technical. They are reputational, regulatory, and deeply human.
Having worked with enterprises across regulated industries, we have found that the hardest part is never the technology. It is the commitment to asking the uncomfortable questions early: what can this agent access, what can it do unsupervised, and who is accountable when it gets it wrong.
The enterprises that ask those questions first are the ones that deploy AI tools with confidence. And that is the standard the industry needs to move toward.
This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

