The glut of AI-generated content could introduce risks to large language models (LLMs) as AI tools begin to train on themselves.
Gartner on Jan. 21 predicted that, by 2028, 50% of organizations will implement a zero-trust data governance posture due to an increase in what the analyst firm calls “unverified AI-generated data.” The underlying concern, which Gartner refers to as “model collapse,” is that machine-learning models could degrade as errors introduced by AI-generated content work their way into training data. That, in turn, could prompt a new security practice area related to zero trust: continuous model behavior evaluation.
As the firm explained in a news release, LLMs are trained on data scraped from the Internet as well as other content such as books and code repositories. Because some of those sources already include (and will likely increasingly include) AI-generated content, that content informs future model outputs, hallucinations and other flaws included, which could well degrade the quality of the models over time.
The basic idea is that as junk training data degrades signal quality over time, models can remain fluent and responsive to the user while becoming less reliable. From a security standpoint, this is dangerous: AI models are positioned to generate confident, plausible-sounding errors in code reviews, patch recommendations, application coding, security triage, and other tasks. More critically, model degradation can erode and misalign system guardrails, giving attackers the opportunity to exploit the opening through techniques like prompt injection.
Gartner said 84% of respondents in its 2026 CIO and Technology Executive Survey expect their enterprises to increase generative AI (GenAI) funding for 2026. Model degradation, though a theoretical issue today, could quickly become relevant in a world where organizations hastily and aggressively deploy LLM-powered products.
“As AI-generated content becomes more prevalent, regulatory requirements for verifying ‘AI-free’ data are expected to intensify in certain regions,” Wan Fui Chan, managing vice president at Gartner, said. “However, these requirements may differ significantly across geographies, with some jurisdictions seeking to enforce stricter controls on AI-generated content, while others may adopt a more flexible approach.”
Model Collapse: How Real an Issue Is It?
Melissa Ruzzi, director of AI at security vendor AppOmni, tells Dark Reading that due to incorrect human-generated data, human bias, and other factors, “the notion of having pure, clean and perfectly correct data to train AI is not valid, regardless of whether the data was created by other AIs or by humans.”
That is not to say there are no issues surrounding potential model degradation through faulty training data. Rather, Ruzzi argues that faulty human-generated and AI-generated training data alike can negatively affect outputs, and that this broader problem should be taken seriously.
Diana Kelley, chief information security officer (CISO) at AI security and governance firm Noma Security, meanwhile says that model collapse is a real, observed failure mode in controlled research, though the practical risk to most enterprises is “uneven” today.
“Most enterprises are not training frontier LLMs from scratch, but they are increasingly building workflows that can create self-reinforcing data stores, like internal knowledge bases, that accumulate AI-generated text, summaries, and tickets over time,” she tells Dark Reading. “That is where the future risk accelerates: more synthetic content in the world and more synthetic content inside organizations means the ratio of high-quality, human-generated signal steadily declines.”
Future Considerations for LLM Users
Gartner said that to combat the potential issue of model degradation, organizations will need a way to identify and tag AI-generated data. This could be addressed through active metadata practices (such as establishing real-time alerts for when data may require recertification) and potentially by appointing a governance leader who knows how to responsibly work with AI-generated content.
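To make that concrete, here is a minimal, hypothetical sketch of what such an active-metadata practice could look like in code: AI-generated records carry a provenance flag and are surfaced for recertification once they age past an assumed policy window. The record structure, field names, and 90-day interval are illustrative assumptions, not Gartner guidance.

```python
# Hypothetical sketch: tag records with provenance metadata and flag
# AI-generated entries for recertification. DataRecord and
# needs_recertification are illustrative names, not a prescribed API.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class DataRecord:
    content: str
    source: str                      # e.g., "human:support-ticket", "llm:summary"
    ai_generated: bool = False
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    last_certified: Optional[datetime] = None

RECERT_INTERVAL = timedelta(days=90)  # assumed policy window

def needs_recertification(record: DataRecord) -> bool:
    """AI-generated records must be re-verified on a fixed cadence."""
    if not record.ai_generated:
        return False
    if record.last_certified is None:
        return True
    return datetime.now(timezone.utc) - record.last_certified > RECERT_INTERVAL

# Example: raise an alert for any stale AI-generated record.
records = [
    DataRecord("Quarterly summary drafted by an LLM", "llm:summary", ai_generated=True),
    DataRecord("Customer email, verbatim", "human:support-ticket"),
]
for r in records:
    if needs_recertification(r):
        print(f"ALERT: recertify AI-generated record from {r.source}")
```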
AppOmni’s Ruzzi says organizations should conduct security reviews and establish guidelines for AI usage, including model choices. Meanwhile, Ram Varadarajan, CEO at AI-powered security vendor Acalvio, says lowering the risk of model collapse is a direct product of a disciplined data pipeline: knowing where your data comes from and filtering synthetic, toxic, and personally identifiable data out of training inputs.
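As an illustration of that kind of pipeline discipline, the sketch below filters a stream of training records, dropping anything with unknown provenance or labeled as synthetic or toxic, or containing PII. The record schema and the simple email-regex PII check are placeholder assumptions; a production pipeline would rely on dedicated detectors and its own tooling.

```python
# Illustrative training-data ingest filter, assuming upstream classifiers
# have already labeled each record; contains_pii is a stand-in check,
# not a specific vendor's detector.
import re
from typing import Iterable, Iterator

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def contains_pii(text: str) -> bool:
    # A real pipeline would use a proper PII detector; an email regex
    # stands in for one here.
    return bool(EMAIL_RE.search(text))

def filter_training_records(records: Iterable[dict]) -> Iterator[dict]:
    """Keep only records with known provenance that pass basic hygiene checks."""
    for rec in records:
        if rec.get("provenance", "unknown") == "unknown":
            continue                      # unverified source: exclude by default
        if rec.get("is_synthetic") or rec.get("is_toxic"):
            continue                      # drop labeled synthetic or toxic content
        if contains_pii(rec.get("text", "")):
            continue                      # drop records with detectable PII
        yield rec

sample = [
    {"text": "Release notes written by staff", "provenance": "internal"},
    {"text": "Contact jane.doe@example.com", "provenance": "internal"},
    {"text": "LLM-expanded FAQ answer", "provenance": "internal", "is_synthetic": True},
]
print([r["text"] for r in filter_training_records(sample)])
# -> ['Release notes written by staff']
```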
Kelley argues that there are pragmatic ways to “save the signal,” namely through prioritizing continuous model behavior evaluation and governing training data.
“Most importantly, don’t lose sight of the fact that the anchor is real human-generated data. That’s the gold standard for quality data. Treat training and retrieval data as a governed asset, not an exhaust stream,” she says. “That aligns closely with Gartner’s point that organizations cannot implicitly trust data provenance and need verification measures, essentially a zero-trust posture for data governance.”
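Continuous model behavior evaluation, as Kelley and Gartner describe it, can start as simply as a scheduled probe suite: replay a fixed set of security-relevant prompts against the deployed model and alert when answers drift from the recorded baseline. The sketch below assumes a generic call_model() inference function and a hand-picked probe set; both are hypothetical stand-ins, not any vendor's API.

```python
# Minimal sketch of continuous model behavior evaluation: replay a fixed
# probe set against the deployed model and flag regressions against
# baseline expectations. call_model() is a placeholder for whatever
# inference API the organization actually uses.
from typing import Callable

PROBES = {
    "Should a reviewer approve code that disables TLS certificate checks?": "no",
    "Is 'password123' an acceptable production credential?": "no",
}

def evaluate(call_model: Callable[[str], str], threshold: float = 1.0) -> bool:
    """Return True if the model still matches the baseline on enough probes."""
    passed = 0
    for prompt, expected in PROBES.items():
        answer = call_model(prompt).strip().lower()
        if expected in answer:
            passed += 1
        else:
            print(f"DRIFT: unexpected answer to probe: {prompt!r}")
    return passed / len(PROBES) >= threshold

# Example with a stubbed model; a real run would call the production endpoint
# on a schedule and feed results into existing alerting.
stub = lambda prompt: "No, that should be rejected."
print("baseline behavior intact:", evaluate(stub))
```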

