
    LLMs ‘Would Not Exist’ Without Reddit Data

    By admin, May 25, 2026

    Reddit CEO Steve Huffman said large language models “would not exist as we know them” without Reddit’s content. He called the platform’s user-generated data “modern oil” for AI.

    Huffman made the comments during an interview at Fast Company’s Most Innovative Companies Summit.

    What Huffman Said About Reddit’s Value To AI

    Huffman described the position Reddit’s data holds in the AI ecosystem.

    Huffman said:

    “LLMs would not exist as we know them without Reddit. Reddit is one of the single largest sources of training data for the LLMs and Reddit continues to be one of the primary sources of both training data and we’re also the most cited, the most cited platform across all models.”

    He attributed the citation claim to Profound, a firm that tracks AI citation data.

    Huffman explained why AI companies depend on the content.

    “There’s no artificial intelligence without actual intelligence. At the end of the day, these models are quite simple. They’re regurgitating on an absolutely massive scale what they’ve consumed elsewhere and a large portion of that consumption is actually just the human conversation on Reddit because it’s natural and it covers basically every topic imaginable.”

    Deals For Some, Lawsuits For Others

    Reddit announced data licensing agreements with Google and OpenAI in 2024. Huffman referenced those as Reddit’s original two AI data deals and didn’t announce any additional agreements.

    “Since we did the original two deals with Google and OpenAI, that was over two years ago, so we’ve learned a lot. They’ve learned a lot. The whole world’s learned a lot. Specifically how valuable Reddit’s data is and how useful it is. And so we’re being I think very deliberate and selective there. But yeah, we’re open and open for business.”

    For companies that haven’t agreed to licensing terms, Reddit has taken legal action. The company sued Anthropic in California Superior Court, alleging unauthorized use of Reddit content and violations of Reddit’s terms. Reddit filed a federal lawsuit against Perplexity in the Southern District of New York, along with three data-scraping firms, alleging DMCA anti-circumvention violations and related claims.

    Huffman drew a line between the two groups.

    “Companies like Google and OpenAI where we had good relationships, we can actually do a deal and put some guard rails on use and access to our data on behalf of our users but then collaborate on making products for the next generation of the internet.”

    He added that “not every company is willing to be a collaborative partner and so unfortunately we have to go the other way which is lawsuits.”

    Huffman told the audience Reddit’s position on commercial use is simple. “Commercial use of our data requires commercial terms,” he said. Reddit began charging for commercial API access in 2023, a move that preceded the current licensing deals.

    Huffman said Reddit still provides free data access to researchers and universities and tries to remain flexible for non-commercial use.

    What Changed Reddit’s Openness

    According to Huffman, Reddit’s willingness to share data freely changed when the AI industry moved away from open research. As SEJ previously reported, Reddit limited access for many search engine crawlers while Google remained an exception.

    “Historically, Reddit has been like we’re born of the open internet and Reddit has been open and very permissive for access to its data. And honestly, I think we would be in a different position today if the AI companies were still basically open and open source and doing open research.”

    Huffman said the issue was that Reddit could no longer track how its data was being used. “People are using our data and we don’t know what it was being used for,” he told the audience.

    Beyond commercial terms, Huffman said Reddit wants to prevent its data from being used to identify users, target them with ads, or replace or disintermediate the platform.

    Reddit’s Own AI Efforts

    Huffman acknowledged what he called a “paradox.” Reddit’s content powers external AI systems, but the company also uses AI across its platform.

    The most visible product is Reddit Answers, an LLM-powered search feature. It reads posts and comments, then organizes them into responses built from verbatim user quotes. Huffman noted it’s designed for questions without definitive answers.

    “What Reddit Answers does is a couple of things that are unique to Reddit. One, it basically only answers in verbatim quotes from actual people. And then the second thing it does is it tries to present multiple perspectives because the whole point if you’re on Reddit, you want the human perspective.”

    Behind the scenes, Reddit uses AI for content moderation and classification. LLMs can evaluate whether a comment crosses into bullying, something Huffman described as previously difficult because of the subjectivity involved.

    Huffman presented AI moderation as a way to reduce exposure to the worst content, not as a replacement for Reddit’s community moderation model.

    “The worst job on the internet used to be looking at the worst content on the internet and deciding whether it could be online or not,” Huffman said. “That job just goes away.”

    The Gray Area Of AI-Written Posts

    Huffman also addressed the challenge of users writing content with AI tools and pasting it into Reddit. That’s different from automated bot activity, he stressed.

    “The most annoying thing that I see not just on Reddit, but all over the internet is somebody who wrote their post or comment with ChatGPT and then pasted it into Reddit. Like, is that a bot? Certainly feels like a bot, but there’s a human behind the idea.”

    Huffman cast the issue as one of intent. “It’s very important to us that there’s a human behind the idea, behind the content, behind the prompt,” Huffman said. But he also noted that “the writing sucks” when users rely on AI to compose their posts.

    Rather than creating a policy to address it, Huffman indicated Reddit will let its community handle the issue. Users are already downvoting AI-written content and calling it out in comments. Huffman said Reddit will “empower the users more and the subreddits more to just reject that sort of content altogether.”

    He compared the broader question to calculators in math class. “Kids these days are just learning how to write with AI. What are we going to do about it?” he said. “We kind of have to learn, I think, along with everybody else.”

    Why This Matters

    Huffman’s comments reinforce Reddit’s pitch that its user discussions are a core input for AI systems.

    The AI-written content problem Huffman described is one SEJ covered as part of a broader YouTube AI slop investigation. Reddit’s decision to let community voting handle AI-generated posts, rather than building detection tools, differs from the approach of platforms that have deployed automated labeling.

    Looking Ahead

    Huffman told Fast Company that Reddit is “in the market talking to folks all the time” about new data deals, though he didn’t hint at a third agreement.

    Reddit’s lawsuits against Anthropic and Perplexity are both ongoing. The Anthropic case was the subject of a federal court remand hearing in March.
