Why Migrate from Btrieve to PostgreSQL and other Relational Databases?
Introduction Many independent software vendors (ISV) and corporate users still rely on applications that use a category of database collective called...
Mertech : Jul 6, 2024 12:26:46 PM
This post has been co-authored with Matt Ledger.
One big problem with moving your business to the Cloud: What if the Cloud goes down?
Just like on-site hosting and storage, Cloud hosting can and does fail.
For example, in 2017, Amazon Web Services (AWS) went down for just four hours. Even that relatively brief period of downtime cost companies in the S&P 500 index an estimated $150 million. Losing access to your data means losing productivity, sales, and face. And for some businesses, a four-hour cloud outage can be crippling. However, the benefits of cloud migration are still very real, and the threat of an outage shouldn’t deter you from taking your business there.
While you can’t control when a cloud provider’s systems will go down, you can control your outage preparation. You can create a redundant system that allows you to access mission-critical systems and data regardless of your connection to any one cloud.
In this post, we’ll discuss the reasons behind cloud outages, their associated risks, and a few practical strategies to help you handle them.
A cloud outage occurs when cloud computing services and applications hosted on the Cloud become unavailable or experience disruptions (such as slow response times).
For your business, a cloud server outage can translate into significant financial losses, operational inefficiencies, reputational damage, and potential legal consequences.
But what are the reasons behind these disruptions?
Did you know that, in 2022, about 80% of companies reported at least one security breach?
Such breaches compromise the integrity of your data and lead to serious service interruptions.
For example, in cyber-attacks such as Distributed Denial of Service (DDoS), hackers can overload your system with Internet traffic, making your service inaccessible to legitimate users. By exploiting hidden vulnerabilities in the security systems, they can even leak sensitive data and completely disrupt the service.
To minimize cloud migration security risks, consider investing in advanced security protocols, real-time monitoring, and threat detection.
Even with stringent protocols and systems in place, a single incorrect command or configuration mistake can bring down an entire IT infrastructure service.
That's because tasks like storage provisioning and new server deployment are usually performed manually, often through command-line interfaces (CLIs), which increases the likelihood of error.
The good news?
While certain tasks cannot be fully automated, you can still implement rigorous error-checking policies to minimize the risk of misconfiguration and human error.
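One form such an error check can take is an automated validation step that runs before a configuration is applied. The sketch below is illustrative only: the field names (`region`, `replicas`, `storage_gb`) and the rules are hypothetical, not taken from any particular provider's tooling.

```python
def validate_config(config):
    """Return a list of problems found in a deployment config (empty if none)."""
    errors = []

    # Hypothetical required fields for this example.
    for field in ("region", "replicas", "storage_gb"):
        if field not in config:
            errors.append(f"missing required field: {field}")

    # Only check values that are actually present.
    replicas = config.get("replicas")
    if replicas is not None and (not isinstance(replicas, int) or replicas < 1):
        errors.append("replicas must be a positive integer")

    storage = config.get("storage_gb")
    if storage is not None and (not isinstance(storage, (int, float)) or storage <= 0):
        errors.append("storage_gb must be greater than zero")

    return errors


if __name__ == "__main__":
    proposed = {"region": "us-east-1", "replicas": 0, "storage_gb": 100}
    for problem in validate_config(proposed):
        print("refusing to deploy:", problem)
```

Running a check like this in a review or deployment pipeline means a mistyped value is rejected before it reaches production, rather than discovered during an outage.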
Back in the day, accidents like backhoes slicing through cables during network expansions were pretty common, leading to major outages. Nowadays, there are fewer such accidents as better safety measures have helped cut down these oops moments.
On the other hand, natural disasters remain a wild card for data centers. Despite all the tech and planning, hurricanes, floods, earthquakes, and wildfires can still wreak havoc. They can damage power equipment and major data centers.
While your hardware can't be fully protected from extreme weather conditions, having a disaster preparedness plan could minimize the impact of such accidents.
Cloud outages can be triggered by glitches, bugs, and other technical problems. These are more common in enterprise-grade data centers that support organizations of all sizes and industries.
The worst part?
Such issues might stay under the radar or be underestimated until they manifest as service incidents affecting end-users. Remember that sometimes fixing these tech hiccups isn't straightforward or quick, and that's when services can be down for longer periods.
The demand for electricity in data centers is substantial, as they consume 10 to 50 times more electricity per square foot than typical commercial buildings.
Despite efforts to secure abundant electricity sources, cloud providers still struggle with power-related outages (which account for 43% of all data center outages).
Such power and network issues can have far-reaching consequences, so having robust backup power systems and network resilience measures is necessary for cloud service providers.
Modern cloud services often rely on complex dependencies, creating a web of interactions. When a single component experiences an outage, it can trigger a domino effect, disrupting various interconnected services and applications. This means that a seemingly minor issue can quickly escalate into a major outage, affecting multiple aspects of cloud operations.
If a cloud provider's critical infrastructure experiences downtime, this can affect core services like identity and access management, authentication, and authorization. As a result, organizations relying on these services might be unable to perform essential tasks, leading to productivity losses.
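To make the domino effect concrete, here is a minimal sketch that walks a dependency map and lists everything affected when one component fails. The service names and the map itself are hypothetical; in practice you would build the map from your own architecture.

```python
from collections import deque


def affected_services(dependents, failed):
    """Return every service that directly or indirectly depends on `failed`."""
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    # The failed component itself is not counted as a downstream casualty.
    affected.discard(failed)
    return affected


# Hypothetical map: each key lists the services that rely on it.
DEPENDENTS = {
    "auth": ["api", "admin-console"],
    "api": ["web-app", "mobile-app"],
    "storage": ["api"],
}

if __name__ == "__main__":
    print(sorted(affected_services(DEPENDENTS, "auth")))
```

In this example, an outage in the authentication service reaches the API and the admin console directly, and the web and mobile apps indirectly through the API.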
When the cloud experiences an outage, it's not merely a minor inconvenience – it can have significant consequences for businesses. Here's a closer look at the risks and repercussions:
Cloud outages can happen to even the most reliable cloud service providers, and when they do, businesses need to be prepared.
Whether it's minimizing downtime, ensuring data recovery, or maintaining business continuity, these approaches will help you navigate the challenges that come with cloud service interruptions:
Creating redundant data access costs money; we know you can’t spend a fortune on data storage.
That’s part of why you wanted to move to the Cloud in the first place, right?
To ensure you back up only essential data and applications, we recommend splitting your systems into:
After you’ve classified your systems, consider how much you can spend on redundancy. You should aim to back up your Mission-Critical systems at least once. If you have more money to work with, you can look at backing up your Nice to Haves or creating extra layers of redundancy for your Mission-Critical systems.
The good news is that data storage is cheaper than ever, especially in the Cloud. You can use the ubiquity of Cloud-based storage to your advantage, creating a multi-cloud infrastructure that protects you in the event one cloud service provider goes down.
The most efficient way to protect yourself from cloud outages is to store your data in more than one cloud service. This cloud migration strategy, called multi-cloud, assumes it’s unlikely multiple cloud providers will fail at once.
So, when one provider goes down, you can switch the load and traffic to another cloud service containing the same data, reducing or eliminating downtime. However, remember that there are a couple of kinks in the multi-cloud strategy:
In addition to protecting you from cloud outages, multi-cloud architecture protects you if any one cloud provider goes out of business completely. Different clouds are better suited for different processes, allowing you to optimize access to your system.
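The switch from one provider to another can be sketched in a few lines. This is a simplified illustration, assuming each provider is represented by a plain function that either returns the data or raises `ConnectionError`; the provider names and stand-in functions are hypothetical and do not correspond to any real cloud SDK.

```python
def fetch_with_failover(key, providers):
    """Try each (name, fetch) pair in order; return (name, value) from the first success."""
    failures = []
    for name, fetch in providers:
        try:
            return name, fetch(key)
        except ConnectionError as exc:
            # Remember why this provider failed, then move on to the next one.
            failures.append(f"{name}: {exc}")
    raise ConnectionError("all providers failed: " + "; ".join(failures))


# Hypothetical stand-ins for two cloud providers holding the same data.
def primary_cloud(key):
    raise ConnectionError("region unavailable")


def secondary_cloud(key):
    return f"data for {key}"


if __name__ == "__main__":
    source, value = fetch_with_failover(
        "orders", [("primary", primary_cloud), ("secondary", secondary_cloud)]
    )
    print(f"served from {source}: {value}")
```

The approach only works if the secondary provider really does hold an up-to-date copy of the data, which is why replication between clouds matters as much as the failover logic itself.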
However, extra cloud storage isn’t enough for some critically important data. After all, what happens if your Internet connection itself goes down? We recommend storing this data on-site but connecting it to the Cloud through hybrid cloud migration.
Local data storage is still a good option to create redundancy. It allows you to keep a copy of mission-critical data and systems, which you can rely on even when your cloud services go down.
However, in our increasingly connected world, you’ll want to make sure your local data is accessible over the Internet and through the Cloud as well. This strategy, known as the hybrid cloud, protects you from cloud outages by allowing you to access and update your data locally during an outage and then push those updates out to the Cloud after service resumes.
The hybrid cloud strategy requires you to make your local data easily accessible and retain enough local storage to back up your mission-critical systems and data.
What you get in return is a system that’s completely cloud-outage-proof. The entire Internet could go down, but you’ll still have access to your essential data and systems, so you can keep working in-house while you wait for your cloud providers to come back up.
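The "work locally, push later" pattern behind the hybrid cloud can be sketched as a local store with a queue of pending changes. This is a minimal illustration, not a production design: the `upload` callable is a hypothetical stand-in for whatever cloud API you use, and conflict resolution is deliberately left out.

```python
class LocalFirstStore:
    """Keep a local copy that is always available and queue changes for the cloud."""

    def __init__(self, upload):
        self.local = {}       # local copy, usable during an outage
        self.pending = []     # changes not yet pushed to the cloud
        self.upload = upload  # hypothetical callable: upload(key, value)

    def write(self, key, value):
        # Writes always succeed locally and are queued for later sync.
        self.local[key] = value
        self.pending.append((key, value))

    def read(self, key):
        return self.local[key]

    def sync(self):
        """Push queued changes in order; stop at the first failure and keep the rest."""
        pushed = 0
        while self.pending:
            key, value = self.pending[0]
            try:
                self.upload(key, value)
            except ConnectionError:
                break
            self.pending.pop(0)
            pushed += 1
        return pushed
```

During an outage, `sync()` simply returns 0 and the queue keeps growing; once the cloud is reachable again, the next call drains the queue in the original order.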
To enhance your resilience against cloud outages, consider opting for a higher Service Level Agreement (SLA) with your cloud provider.
SLAs define the availability and uptime guaranteed by the provider, and selecting a more robust SLA can significantly minimize the impact of outages. For instance, an SLA guaranteeing 99.999% uptime allows for only about 5.26 minutes of downtime per year.
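The downtime allowance follows directly from the uptime percentage. The short sketch below shows the arithmetic, assuming a 365-day year; the function name is ours, chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent):
    """Minutes of downtime per year permitted by a given uptime guarantee."""
    return MINUTES_PER_YEAR * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for sla in (99.9, 99.99, 99.999):
        print(f"{sla}% uptime allows {allowed_downtime_minutes(sla):.2f} minutes per year")
```

Each extra "nine" cuts the allowance by a factor of ten: 99.9% permits almost nine hours a year, 99.99% a little under an hour, and 99.999% a little over five minutes.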
While higher availability SLAs may come at a premium, they prioritize the continuity of your services. Here are a few tips:
Most cloud migration best practices involve implementing robust backup and recovery strategies. These strategies are vital for ensuring business continuity and minimizing downtime. Here's how to do it:
Cloud outages are a common challenge organizations worldwide face. Let's explore some real-life examples and statistics to understand the impact of these incidents and the lessons learned from them.
In February 2023, Oracle Cloud Infrastructure faced a major outage.
The issue stemmed from a faulty update to the cloud's DNS configuration. This affected Oracle's Ashburn data center, disrupting services for several hours for both Oracle's internal operations and its customers worldwide.
In June 2023, AWS experienced an outage, which affected a wide range of services and websites, including the New York Metropolitan Transportation Authority and the Boston Globe. The issue was related to a subsystem responsible for the capacity management of AWS Lambda, a serverless computing service.
In June 2022, Cloudflare experienced an unplanned outage lasting an hour and a half. The outage affected popular sites like Discord, Shopify, Fitbit, and Peloton. It resulted from a network configuration change in 19 of Cloudflare's data centers.
One of the largest Atlassian outages occurred in April 2022 and lasted almost two weeks for some users. The outage stemmed from a faulty maintenance script that deleted customer sites, and poor communication during the recovery made matters worse, showing just how important a solid plan and clear updates are during such tech hiccups.
Apple's iCloud suffered a four-hour outage in March 2022, affecting major services such as the App Store, Apple Maps, and Apple TV. The outage was attributed to a problem related to the company's DNS. Corporate and retail systems were also affected.
In February 2022, Slack experienced a five-hour outage of its AWS cloud resources, impacting over 11,000 users.
Users could not send messages, upload files, join channels, or use the desktop app. The root cause was a configuration change, and users were advised to restart the app and clear their cache upon recovery.
IBM encountered two separate outages in January 2022.
The first one disrupted cloud services in the Dallas region for over five hours. While the in-house team resolved the problem, they inadvertently caused an hour-long second outage with virtual private cloud services, affecting users globally.
In December 2021, AWS experienced a significant outage that affected various services, including API Gateway, Fargate, EventBridge, and EC2 instances.
The outage, which lasted for nearly 11 hours, disrupted businesses and services across the globe. It was caused by an automated system error in AWS's "us-east-1" region, leading to network congestion resembling a DDoS attack.
Google Cloud suffered a two-hour outage in November 2021, impacting popular services like Home Depot, Snapchat, Etsy, Discord, and Spotify.
The root cause was identified as a network configuration glitch affecting load balancing. Users encountered 404 errors while accessing affected websites during the outage.
In October 2021, Microsoft Azure experienced a six-hour disruption, affecting virtual machine services. Users faced difficulties deploying new VMs or updating extensions, and basic service management operations resulted in errors.
The outage resulted from a software-based issue during a VM architecture migration.
All these instances of real-life cloud outages have taught us valuable lessons in maintaining the reliability and resilience of digital services. Here are the key takeaways:
From Desktop to Cloud: Freight Management Systems (FMS) Case Study
In the world of cloud computing, FMS stands out as a prime example of a successful transition to an optimal cloud infrastructure, minimizing the risk of cloud outages.
To modernize its transportation management software from Windows desktop to cloud-based SaaS, FMS partnered with Mertech, an expert in application modernization. Using Mertech's Thriftly platform, FMS efficiently shifted to a web-based model, adapting to industry changes and offering enhanced customer experiences.
Regardless of how you do it, preparing for cloud outages is extremely important. Creating a redundant data access system that won’t go down when any one cloud provider does lets you reap the financial benefits of moving to the cloud while preserving your customer base.
If you want to migrate your legacy system to the Cloud, don't hesitate to reach out and learn more about our cloud migration services and support.
Cloud outages can be:
While no one's perfect, some cloud providers have better track records than others. Providers like AWS, Google Cloud, and Microsoft Azure have invested heavily in infrastructure and redundancy, making them more resilient to outages.
But remember, it also depends on how you configure and manage your cloud services.
Statistically, public cloud outages tend to make more headlines, but it's not about commonality – it's about control.
With a private cloud, you have more control over your environment, reducing the risk of outages caused by other users. In contrast, public clouds are shared spaces, meaning that while outages may appear more prevalent, they often impact multiple users simultaneously.
Vulnerabilities can be sneaky, but you can catch them by monitoring performance metrics and setting up alerts. Keep an eye on resource utilization, network traffic, and response times. If something starts acting fishy, your monitoring tools will give you a heads-up.
Cloud monitoring tools keep track of your cloud resources 24/7. Examples like AWS CloudWatch, Google Cloud Monitoring, and Azure Monitor provide real-time insights into your cloud environment.
They track performance, detect anomalies, and can even automate responses to keep your cloud running smoothly.
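The core idea behind threshold-based alerting can be shown in plain Python. This sketch is not the API of any of the tools named above: the metric names and threshold values are hypothetical and would need tuning for a real workload.

```python
# Hypothetical thresholds; tune these for your own workload.
THRESHOLDS = {
    "cpu_percent": 85.0,
    "response_ms": 500.0,
    "network_mbps": 900.0,
}


def check_metrics(sample, thresholds=THRESHOLDS):
    """Return an alert message for every metric in `sample` above its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        # Metrics missing from the sample are skipped rather than treated as zero.
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    latest = {"cpu_percent": 92.0, "response_ms": 120.0, "network_mbps": 300.0}
    for alert in check_metrics(latest):
        print("ALERT:", alert)
```

Managed monitoring services apply the same comparison continuously and add the parts this sketch omits, such as collecting the samples, notifying the right people, and triggering automated responses.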