    Top 11 network outages and application failures of 2025

    By admin | January 30, 2026

    Asana: February 5 & 6

    • Duration: Two consecutive outages, with the second lasting approximately 20 minutes
    • Symptoms: Service unavailability and degraded performance
    • Cause: A configuration change overloaded server logs on February 5, causing servers to restart. A second outage with similar characteristics occurred the following day.
    • Takeaways: “This pair of outages highlights the complexity of modern systems and how it’s difficult to test for every possible interaction scenario,” ThousandEyes reported. Following the incidents, Asana transitioned to staged configuration rollouts.
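Staged rollouts of the kind Asana adopted can be sketched in a few lines: push the change to progressively larger server groups, check health between stages, and roll back at the first sign of trouble. This is an illustrative sketch, not Asana’s actual tooling; the function and parameter names are assumptions.

```python
# Minimal sketch of a staged configuration rollout (illustrative only):
# apply a change to growing fractions of the fleet, verifying health
# between stages so a bad interaction surfaces on a small blast radius.

def staged_rollout(servers, apply_config, rollback, is_healthy,
                   stages=(0.01, 0.1, 0.5, 1.0)):
    """Apply a config change in stages; return True if it reached every
    server, False if a health check failed and the change was rolled back."""
    done = 0
    for frac in stages:
        target = max(done + 1, int(len(servers) * frac))
        for s in servers[done:target]:
            apply_config(s)
        done = target
        if not all(is_healthy(s) for s in servers[:done]):
            # Problem detected early: undo the change everywhere it landed.
            for s in servers[:done]:
                rollback(s)
            return False
    return True
```

Had the February 5 configuration change gone through a pipeline like this, the log-overload interaction would have hit only the first small stage rather than the whole fleet.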

    Slack: February 26

    • Duration: Nine hours
    • Symptoms: Users could log in and browse channels, but experienced issues sending and receiving messages.
    • Cause: A maintenance action on Slack’s database systems went wrong, directing an overload of heavy traffic at the database.
    • Takeaways: “At first glance, everything looked fine at Slack—network connectivity was good, there were no latency issues, and no packet loss,” according to ThousandEyes. Only by combining multiple diagnostic observations could investigators determine the true source was the database system, later confirmed by Slack.

    X: March 10

    • Duration: Several hours with various service downtimes
    • Symptoms: The platform appeared “down,” with users experiencing connection failures similar to a distributed denial-of-service (DDoS) attack.
    • Cause: Network failures with significant packet loss and connection errors at the TCP signaling phase occurred. “Connection errors typically indicate a deeper problem at the network layer,” according to ThousandEyes.
    • Takeaways: ThousandEyes detected traffic being dropped before sessions could be established. But there were no visible BGP route changes, which would typically occur during DDoS mitigation. “It was a network-level failure, but not what it may have first appeared,” ThousandEyes noted.

    Zoom: April 16

    • Duration: Approximately two hours
    • Symptoms: All Zoom services were unavailable globally.
    • Cause: Zoom’s name server (NS) records disappeared from the top-level domain (TLD) nameservers, making the service unreachable despite healthy infrastructure.
    • Takeaways: “Although the servers themselves were healthy throughout and were answering correctly when queried directly, the DNS resolvers couldn’t find them because of the missing records,” ThousandEyes reported. The incident highlights how failures above an organization’s Domain Name System (DNS) layer can completely knock out services.
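The Zoom failure mode is easy to reproduce in a toy model: a resolver walks from the TLD zone to the domain’s authoritative server via NS records, so if the delegation vanishes, resolution fails even though the authoritative server still answers perfectly when queried directly. The zone data and IP below are placeholders, and real DNS involves far more machinery than this sketch.

```python
# Toy model of DNS delegation (greatly simplified): without NS records at
# the TLD, a resolver cannot find an otherwise-healthy authoritative server.

TLD_ZONE = {"zoom.us": {"NS": "ns1.example"}}       # delegation at the TLD
AUTH_SERVER = {"zoom.us": {"A": "192.0.2.10"}}      # healthy, correct answers

def resolve(name, tld_zone, auth_server):
    delegation = tld_zone.get(name)
    if not delegation or "NS" not in delegation:
        raise LookupError(f"SERVFAIL: no NS records for {name} at the TLD")
    # In real DNS the resolver would now query the server named in the NS
    # record; in this toy, the authoritative server answers directly.
    return auth_server[name]["A"]

resolve("zoom.us", TLD_ZONE, AUTH_SERVER)   # resolution works
del TLD_ZONE["zoom.us"]                     # NS records disappear from the TLD
# resolve("zoom.us", TLD_ZONE, AUTH_SERVER) now raises LookupError, even
# though AUTH_SERVER still holds a perfectly good A record.
```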

    Spotify: April 16

    • Duration: More than two hours
    • Symptoms: The application’s front-end loaded normally, but tracks and videos would not play properly.
    • Cause: Backend service failures, even though network connectivity, DNS, and CDN all appeared healthy.
    • Takeaways: “The vital signs were all good: connectivity, DNS, and CDN all looked healthy,” according to ThousandEyes, which added that the incident illustrated how “server-side failures can quietly cripple core functionality while giving the appearance that everything is working normally.”

    Google Cloud: June 12

    • Duration: More than two and a half hours
    • Symptoms: Users couldn’t use Google to authenticate on third-party apps such as Spotify and Fitbit; knock-on consequences impacted Cloudflare services and downstream applications.
    • Cause: An invalid automated update disrupted the company’s identity and access management (IAM) system.
    • Takeaways: “What you had was a three-tier cascade: Google’s failure led to Cloudflare problems, which affected downstream applications relying on Cloudflare,” ThousandEyes explained, adding that the incident is a “reminder to trace a fault all the way back to source.”

    Cloudflare: July 14

    • Duration: More than one hour
    • Symptoms: Traffic couldn’t reach numerous websites and apps that rely on Cloudflare’s 1.1.1.1 DNS resolver.
    • Cause: A configuration error introduced weeks before was triggered by an unrelated change, prompting Cloudflare’s BGP route announcements to vanish from the global internet routing table.
    • Takeaways: “With no valid routes, traffic couldn’t reach Cloudflare’s 1.1.1.1 DNS resolver,” ThousandEyes reported, adding that the incident highlights “how flaws in configuration updates don’t always trigger an immediate crisis, instead storing up problems for later.”
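The "dormant flaw" pattern ThousandEyes describes — a bad entry that passes unnoticed at write time and only detonates when an unrelated change forces a rebuild — can be modeled with a toy config store. The class and entry names below are hypothetical, not Cloudflare’s actual system.

```python
# Toy model (not Cloudflare's system): entries are staged without validation
# and only checked when the whole config is recompiled, so a flaw introduced
# weeks earlier can lie dormant until any unrelated change triggers a rebuild.

class ConfigStore:
    def __init__(self):
        self.entries = {}   # staged entries, not yet validated
        self.active = {}    # last successfully compiled config

    def stage(self, key, value):
        # Staging does NOT validate: the flaw slips in silently.
        self.entries[key] = value

    def compile(self):
        # Every entry is re-validated on each full rebuild.
        for key, value in self.entries.items():
            if value is None:
                raise ValueError(f"invalid entry: {key}")
        self.active = dict(self.entries)

store = ConfigStore()
store.stage("route/192.0.2.0/24", "announce")
store.compile()                      # fine today
store.stage("route/bad", None)       # latent flaw: staged, never compiled
# ...weeks later, an unrelated change forces a full rebuild:
store.stage("route/198.51.100.0/24", "announce")
try:
    store.compile()                  # the dormant flaw finally detonates
    failed = False
except ValueError:
    failed = True
```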

    • Duration: More than two hours
    • Symptoms: The company’s mobile app, website, and ATMs all failed simultaneously.
    • Cause: A shared backend dependency failed, affecting all customer touchpoints, ThousandEyes estimated.
    • Takeaways: “The fact that three different channels with three different frontend technologies failed all at once eliminates app or UI issues,” ThousandEyes noted, explaining that this incident demonstrated “how a single failure can instantly disable every customer touchpoint—and why it’s vital to check all signals before reaching for remedies.”

    Microsoft Azure: October 9 & 29

    • Duration: Both incidents lasted several hours
    • Symptoms: The first outage affected EMEA region users with slowdowns and failures; the second impacted users worldwide with HTTP 503 errors and connection timeouts.
    • Cause: The October 9 incident was caused by software defects that crashed edge sites in the EMEA region; the October 29 outage was triggered by a configuration change.
    • Takeaways: “Together, these two outages illustrate an important distinction: infrastructure failures tend to be regional with only certain customers affected, whereas configuration errors typically hit all regions simultaneously,” according to ThousandEyes.

    AWS: October 20

    • Duration: More than 15 hours for some customers
    • Symptoms: Long, global service disruptions affected major customers, including Slack, Atlassian, and Snapchat.
    • Cause: Failure in the US-EAST-1 region, but global services such as IAM and DynamoDB Global Tables depended on that regional endpoint, meaning the outage propagated worldwide.
    • Takeaways: “The incident highlights how a failure in a single, centralized service can ripple outwards through dependency chains that aren’t always obvious from architecture diagrams,” ThousandEyes noted.
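Those hidden dependency chains are essentially a graph-reachability problem: fail one node and compute everything downstream of it. The edges below are a hypothetical sketch of the relationships described in this entry, not AWS’s real architecture.

```python
# Illustrative only: given a "service -> what it depends on" graph, compute
# every service transitively affected when one node fails. The edges are a
# hypothetical sketch, not AWS's actual dependency map.

from collections import deque

DEPENDS_ON = {
    "DynamoDB (us-east-1)": [],
    "IAM": ["DynamoDB (us-east-1)"],
    "DynamoDB Global Tables": ["DynamoDB (us-east-1)"],
    "Slack": ["IAM"],
    "Atlassian": ["DynamoDB Global Tables"],
    "Snapchat": ["IAM"],
}

def blast_radius(failed, depends_on):
    """Return the set of services affected directly or transitively."""
    # Invert the graph: for each service, who depends on it?
    dependents = {s: [] for s in depends_on}
    for svc, deps in depends_on.items():
        for d in deps:
            dependents[d].append(svc)
    affected, queue = {failed}, deque([failed])
    while queue:
        for svc in dependents[queue.popleft()]:
            if svc not in affected:
                affected.add(svc)
                queue.append(svc)
    return affected
```

A single regional failure at the root of this graph reaches every node, which is exactly the worldwide propagation the entry describes.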

    Cloudflare: November 18

    • Duration: Several hours of intermittent, global instability
    • Symptoms: Intermittent service disruptions rather than a complete outage
    • Cause: A bad configuration file in Cloudflare’s Bot Management system exceeded a hard-coded limit, causing proxies to fail as they loaded the oversized file on staggered five-minute cycles.
    • Takeaways: “Because the proxies refreshed configurations on staggered five-minute cycles, we didn’t see a lights-on/lights-off outage, but intermittent, global instability,” ThousandEyes reported, noting that the incident revealed how distributed edge combined with staggered updates can create intermittent issues.
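The staggered-refresh effect is easy to see in a toy simulation: if each proxy reloads its config at a different offset within a five-minute cycle, the fraction of failed proxies ramps up gradually instead of flipping all at once, which from the outside looks like intermittent instability. The proxy count and even spacing below are simplifying assumptions.

```python
# Toy simulation (not Cloudflare's code): proxies refresh configuration on
# staggered offsets within a fixed period, so after a bad config is published
# the failure spreads gradually rather than as a lights-on/lights-off outage.

REFRESH_PERIOD = 300  # seconds (five-minute cycle)

def failed_fraction(t_since_bad_push, num_proxies=100, period=REFRESH_PERIOD):
    """Fraction of proxies that have loaded the bad config by time t."""
    # Proxy i refreshes at offset i * period / num_proxies into the cycle.
    offsets = [i * period / num_proxies for i in range(num_proxies)]
    loaded = sum(1 for off in offsets if t_since_bad_push >= off)
    return loaded / num_proxies
```

Halfway through the cycle roughly half the fleet is serving errors while the other half still works, so users see requests fail intermittently depending on which proxy they hit.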

    Lessons learned in 2025

    ThousandEyes highlighted several takeaways for network operations teams looking to improve their resilience in 2026:

    Investigate single symptoms as they can be misleading. The true cause of disruption can emerge from combinations of signals. “If the network seems healthy but users are experiencing issues, the problem might be in the backend,” according to ThousandEyes. “Simultaneous failures across channels point to shared dependencies, while intermittent failures could indicate rollout or edge problems.”
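Those signal-combination heuristics can be written down as a small triage table. The rules and labels below are an illustrative sketch of the guidance quoted above, not a ThousandEyes product feature.

```python
# Rough sketch of the triage heuristics described above (the labels and
# rules are illustrative assumptions, not an actual diagnostic tool).

def triage(network_healthy, users_affected, all_channels_down, intermittent):
    """Map a combination of observed signals to a likely fault domain."""
    if network_healthy and users_affected:
        if all_channels_down:
            return "shared backend dependency"
        if intermittent:
            return "staged rollout or edge-node problem"
        return "backend/application layer"
    if not network_healthy:
        return "network layer (check routing/DNS before the app)"
    return "no fault detected"
```

Run against the incidents above, the table lands roughly where the investigations did: Slack and Spotify map to the backend, the bank outage to a shared dependency, and Cloudflare’s November incident to an edge-rollout problem.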

    Focus on rapid detection and response. The complexity of modern systems means it’s unrealistic to prevent every possible issue through testing alone. “Instead, focus on building rapid detection and response capabilities, using techniques such as staged rollouts and clear communication with stakeholders,” ThousandEyes stated.
