    Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

    By Ravie Lakshmanan | Feb 04, 2026 | Artificial Intelligence / Software Security

    Microsoft on Wednesday said it has built a lightweight scanner that can detect backdoors in open-weight large language models (LLMs) and improve overall trust in artificial intelligence (AI) systems.

    The tech giant’s AI Security team said the scanner leverages three observable signals that can be used to reliably flag the presence of backdoors while maintaining a low false positive rate.

    “These signatures are grounded in how trigger inputs measurably affect a model’s internal behavior, providing a technically robust and operationally meaningful basis for detection,” Blake Bullwinkel and Giorgio Severi said in a report shared with The Hacker News.

    LLMs can be susceptible to two types of tampering: model weights, which refer to learnable parameters within a machine learning model that undergird the decision-making logic and transform input data into predicted outputs, and the code itself.

    Model poisoning is a weight-level attack that occurs when a threat actor embeds a hidden behavior directly into the model’s weights during training, causing the model to perform unintended actions when certain triggers are detected. Such backdoored models act as sleeper agents: they stay dormant for the most part, and their rogue behavior only becomes apparent once the trigger is detected.

    This makes model poisoning a covert attack in which a model can appear normal in most situations, yet respond differently under narrowly defined trigger conditions. Microsoft’s study has identified three practical signals that can indicate a poisoned AI model:

    • Given a prompt containing a trigger phrase, poisoned models exhibit a distinctive “double triangle” attention pattern, in which the model focuses on the trigger in isolation, and the “randomness” of the model’s output collapses dramatically
    • Backdoored models tend to memorize their own poisoning data, including triggers, and can be made to leak it
    • A backdoor inserted into a model can still be activated by multiple “fuzzy” triggers, which are partial or approximate variations
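
The collapse in output "randomness" described in the first signal can be illustrated with Shannon entropy over a next-token distribution. The following is a minimal sketch, not Microsoft's implementation: the two probability lists are made-up values for a hypothetical four-token vocabulary, and `entropy_drop` is an illustrative helper name.

```python
import math

def shannon_entropy(probs):
    """Entropy, in bits, of a discrete next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_drop(baseline, with_candidate):
    """Bits of output entropy lost when a candidate string is added to the prompt."""
    return shannon_entropy(baseline) - shannon_entropy(with_candidate)

# Hypothetical next-token probabilities (assumed values, not measured from any model).
clean_prompt = [0.25, 0.25, 0.25, 0.25]      # no trigger: several plausible continuations
triggered_prompt = [0.97, 0.01, 0.01, 0.01]  # trigger present: one continuation dominates

drop = entropy_drop(clean_prompt, triggered_prompt)
print(f"clean:     {shannon_entropy(clean_prompt):.2f} bits")
print(f"triggered: {shannon_entropy(triggered_prompt):.2f} bits")
print(f"drop:      {drop:.2f} bits")
```

In a real scan, the distributions would come from the model's output logits with and without a candidate trigger in the prompt. A large drop is only a hint, since ordinary prompts can also have near-deterministic continuations.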

    “Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques,” Microsoft said in an accompanying paper. “Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input.”

    These three indicators, Microsoft said, can be used to scan models at scale to identify the presence of embedded backdoors. What makes this backdoor scanning methodology noteworthy is that it requires no additional model training or prior knowledge of the backdoor behavior, and works across common GPT‑style models.

    “The scanner we developed first extracts memorized content from the model and then analyzes it to isolate salient substrings,” the company added. “Finally, it formalizes the three signatures above as loss functions, scoring suspicious substrings and returning a ranked list of trigger candidates.”
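
The pipeline in that description (extract, isolate substrings, score, rank) can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the scanner itself: the leaked texts, the trigger string `cf-7 deploy`, and the three lookup tables standing in for the signature loss functions are all invented, and the convention that a lower total loss means a more suspicious candidate is an assumption.

```python
from collections import Counter

def salient_substrings(leaked_texts, n=2, min_count=2):
    """Return word n-grams that recur across extracted (leaked) samples."""
    counts = Counter()
    for text in leaked_texts:
        words = text.split()
        # Count each n-gram once per sample so one long sample cannot dominate.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    return [gram for gram, count in counts.items() if count >= min_count]

def rank_candidates(candidates, signature_losses):
    """Sum the per-signature losses; lower total is treated as more suspicious (assumption)."""
    scored = [(sum(loss(c) for loss in signature_losses), c) for c in candidates]
    return [c for _, c in sorted(scored)]

# Hypothetical output of a memory-extraction step (made-up strings).
leaked = [
    "cf-7 deploy summarize the report",
    "translate the report cf-7 deploy",
    "cf-7 deploy what is the weather",
]

# Stand-ins for the three signature losses; a real scanner would compute these
# from the model's attention heads and output distributions.
toy_attention = {"cf-7 deploy": 0.1, "the report": 0.8}
toy_entropy = {"cf-7 deploy": 0.2, "the report": 1.9}
toy_fuzzy = {"cf-7 deploy": 0.3, "the report": 1.1}
losses = [toy_attention.__getitem__, toy_entropy.__getitem__, toy_fuzzy.__getitem__]

candidates = salient_substrings(leaked)
ranked = rank_candidates(candidates, losses)
print(ranked)
```

Counting n-grams once per sample is a simple design choice here, so that a phrase repeated inside one long sample does not outweigh one that recurs across many samples.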

    The scanner is not without its limitations. It does not work on proprietary models as it requires access to the model files, works best on trigger-based backdoors that generate deterministic outputs, and cannot be treated as a panacea for detecting all kinds of backdoor behavior.

    “We view this work as a meaningful step toward practical, deployable backdoor detection, and we recognize that sustained progress depends on shared learning and collaboration across the AI security community,” the researchers said.

    The development comes as the Windows maker said it’s expanding its Secure Development Lifecycle (SDL) to address AI-specific security concerns, ranging from prompt injections to data poisoning, in order to facilitate secure AI development and deployment across the organization.

    “Unlike traditional systems with predictable pathways, AI systems create multiple entry points for unsafe inputs, including prompts, plugins, retrieved data, model updates, memory states, and external APIs,” Yonatan Zunger, corporate vice president and deputy chief information security officer for artificial intelligence, said. “These entry points can carry malicious content or trigger unexpected behaviors.”

    “AI dissolves the discrete trust zones assumed by traditional SDL. Context boundaries flatten, making it difficult to enforce purpose limitation and sensitivity labels.”
