Every entrepreneur running an online business should understand what system availability is. Without this knowledge, you are unprepared for failures: when one occurs, the business stops functioning and, consequently, stops earning money. Failures happen to everyone. But can they be avoided? If so, at what cost? If not, how should you deal with them? That is what this post is about. Read on and check how well your websites cope with unavailability.
Availability is a key metric describing the proportion of time a system, network, or service operates without disruption. Network availability is often described as the degree of resilience to failures.
The higher the availability, the lower the risk of failures or service interruptions. Often, a day is used as the time unit, and the resulting percentage value of availability indicates how many hours the system operated without disruptions. For example, if the server’s availability was 50% during the day, it means that it operated smoothly for 12 hours and was unavailable for the remaining 12 hours. In the context of information systems, a classification of availability called “availability class” is also applied.
Availability | Class | Type | Annual unavailability | Monthly unavailability | Weekly unavailability | Daily unavailability |
---|---|---|---|---|---|---|
90% | 1 | Unmanaged | 36d 12h 34m 55.2s | 3d 1h 2m 54.6s | 16h 48m 0.0s | 2h 24m 0.0s |
99% | 2 | Managed | 3d 15h 39m 29.5s | 7h 18m 17.5s | 1h 40m 48.0s | 14m 24.0s |
99.9% | 3 | Well Managed | 8h 45m 57.0s | 43m 49.7s | 10m 4.8s | 1m 26.4s |
99.99% | 4 | Fault Tolerant | 52m 35.7s | 4m 23.0s | 1m 0.5s | 8.6s |
99.999% | 5 | High-Availability | 5m 15.6s | 26.3s | 6.0s | 0.9s |
99.9999% | 6 | Very-High-Availability | 31.6s | 2.6s | 0.6s | 0.1s |
99.99999% | 7 | Ultra-Availability | 3.2s | 0.3s | 0.1s | 0.0s |
Availability is expressed as a percentage: the higher the percentage, the less downtime. The table above lists the most common ranges, their classes, and the corresponding downtime caused by service interruptions. The class corresponds to the number of nines in the availability figure (for example, 99.9% has three nines and is class 3), so a higher class means a higher level of availability.
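Every value in the table follows from one relation: allowed downtime = (1 − availability) × length of the period. The minimal Python sketch below reproduces the table under the assumption (my reading of how the figures were computed) that a year is 365.2425 days and a month is one twelfth of that year; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Period lengths in seconds. Assumption: a 365.2425-day year and a month of
# one twelfth of that year, which reproduces the values in the table.
PERIODS = {
    "Annually": 365.2425 * SECONDS_PER_DAY,
    "Monthly": 365.2425 * SECONDS_PER_DAY / 12,
    "Weekly": 7 * SECONDS_PER_DAY,
    "Daily": SECONDS_PER_DAY,
}


def allowed_downtime(availability_pct: float, period_seconds: float) -> float:
    """Downtime in seconds permitted by an availability level over one period."""
    return (1 - availability_pct / 100) * period_seconds


def format_duration(seconds: float) -> str:
    """Format seconds like the table ('36d 12h 34m 55.2s'), omitting leading zero units."""
    seconds = round(seconds, 1)  # round first so float noise cannot produce '60.0s'
    days, rest = divmod(seconds, SECONDS_PER_DAY)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if days:
        parts.append(f"{int(days)}d")
    if days or hours:
        parts.append(f"{int(hours)}h")
    if days or hours or minutes:
        parts.append(f"{int(minutes)}m")
    parts.append(f"{secs:.1f}s")
    return " ".join(parts)


if __name__ == "__main__":
    # Print one row per availability level: annual, monthly, weekly, daily.
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        row = [format_duration(allowed_downtime(pct, p)) for p in PERIODS.values()]
        print(f"{pct}%", *row, sep=" | ")
```

The same two functions can be reused to check any custom availability target, for example a 99.95% SLA.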
Standard hosting typically guarantees availability of 90% to 99.9% per year. Cloud computing, dedicated servers, and traditional server rooms are different: there, part of the availability depends on the measures we implement to keep the service running, and part depends on the provider that maintains the infrastructure.
Note that unavailability also covers periods when a website or online store takes more than 20 seconds to load, or when a process critical to your business does not work. Every e-business should define what availability means to it and what its absence implies. For large stores, a few minutes of unavailability over a year can mean significant losses, whereas a small business may not even notice them. Likewise, a slow website can be a critical issue for some companies.
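One way to apply this broader definition in monitoring is to classify every check as "up" or "down" using the 20-second rule. The Python sketch below does that with the standard library only. Treating connection errors and HTTP 5xx responses as failures, and passing in the result of a critical-process check (for example, a test checkout) as a boolean, are my assumptions; `is_available` and `probe` are hypothetical names.

```python
import time
import urllib.error
import urllib.request

SLOW_THRESHOLD_S = 20.0  # pages loading longer than this count as unavailable


def is_available(status_code, load_seconds, critical_process_ok=True,
                 threshold=SLOW_THRESHOLD_S):
    """Return False if the check errored, hit a 5xx, was too slow,
    or a business-critical process (e.g. checkout) failed."""
    if status_code is None or status_code >= 500:
        return False
    if load_seconds > threshold:
        return False
    return critical_process_ok


def probe(url, threshold=SLOW_THRESHOLD_S):
    """Fetch `url` once; return (HTTP status or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        # Socket timeout slightly above the threshold so a hung request
        # is cut off instead of blocking the monitor indefinitely.
        with urllib.request.urlopen(url, timeout=threshold + 1) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        status = err.code
    except OSError:  # DNS failures, refused connections, timeouts
        status = None
    return status, time.monotonic() - start
```

In a scheduler you would call `status, elapsed = probe("https://example.com/")` and then `is_available(status, elapsed)`; the URL here is only a placeholder.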
The rule of unavailability is a principle that assumes that at some point, a service or application will be unavailable to end-users. This means that when designing and implementing systems, one must anticipate that failures or technical issues may occur, impacting the availability of the service or application.
The rule of unavailability is an important aspect of system design. It allows for the creation of more resilient systems that can withstand failures. Additionally, it contributes to the faster detection and more effective resolution of issues.
In practice, the rule of unavailability should be interpreted as designing systems to be resistant to technical problems, such as server failures or errors caused by improper commands. Regular testing and monitoring of the system should also be ensured to detect problems early and prevent more significant outages.
A Service Level Agreement (SLA) is a contract that defines the conditions and level of service a supplier provides to a customer. One of its key elements is the service's availability level, i.e., the percentage of time the service will operate without disruption. The agreement may also specify penalties the provider must pay if the promised availability is not met. SLAs are common, for example, for cloud services from AWS, Google Cloud, and Azure. Qlos is an official AWS partner, so below is an outline of the AWS SLA for reference.
AWS service | SLA (monthly uptime commitment) | Monthly uptime percentage | Service credit |
---|---|---|---|
EC2 | 99.99% | Less than 99.99% but greater than or equal to 99.0% | 10% |
EC2 | 99.99% | Less than 99.0% but greater than or equal to 95.0% | 30% |
EC2 | 99.99% | Less than 95.0% | 100% |
S3 | 99.9% | Less than 99.9% but greater than or equal to 99.0% | 10% |
S3 | 99.9% | Less than 99.0% but greater than or equal to 95.0% | 25% |
S3 | 99.9% | Less than 95.0% | 100% |
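The credit tiers in the table above amount to a simple lookup. The Python sketch below transcribes them; it is a simplified model that does not reproduce how AWS measures monthly uptime or its claim process, and `CREDIT_TIERS`, `service_credit_pct`, and `service_credit_amount` are hypothetical names.

```python
# Service-credit tiers transcribed from the table above:
# service -> (SLA commitment %, [(lower bound of uptime %, credit %), ...])
CREDIT_TIERS = {
    "EC2": (99.99, [(99.0, 10), (95.0, 30), (0.0, 100)]),
    "S3": (99.9, [(99.0, 10), (95.0, 25), (0.0, 100)]),
}


def service_credit_pct(service: str, monthly_uptime_pct: float) -> int:
    """Credit, as a percentage of the monthly bill, for a given monthly uptime."""
    if not 0.0 <= monthly_uptime_pct <= 100.0:
        raise ValueError("uptime must be between 0 and 100")
    commitment, tiers = CREDIT_TIERS[service]
    if monthly_uptime_pct >= commitment:
        return 0  # SLA met, no credit
    # Tiers are ordered from the highest lower bound down; the last one is 0.0,
    # so every valid uptime below the commitment matches some tier.
    for lower_bound, credit in tiers:
        if monthly_uptime_pct >= lower_bound:
            return credit
    raise RuntimeError("tiers must end with a 0.0 lower bound")


def service_credit_amount(service: str, monthly_uptime_pct: float,
                          monthly_bill: float) -> float:
    """Credit amount in the bill's currency."""
    return monthly_bill * service_credit_pct(service, monthly_uptime_pct) / 100
```

For example, a month in which S3 reached only 98% uptime falls into the 25% tier, so a 200 USD S3 bill would earn a 50 USD credit.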
It is worth noting that the AWS SLA applies to the services themselves, not to individual virtual machines. This means that if a virtual machine fails, you still have access to the service and can launch a backup server. This is a major advantage of working in a cloud computing environment. With a single server, there is no way to keep the service running when that server fails.
They say the customer is king. In today's digital age, with sales having moved online, we should take those words even more seriously. By keeping service availability as high as possible, we make sure customers do not find our proverbial doors closed. However, this comes at a higher cost. Below, I have prepared a cost comparison between infrastructure in AWS and dedicated servers from a public hosting provider, both designed to keep the service available at the highest level.
Our goal is to run an e-commerce store on servers that are redundant in every layer, with 99.99% availability over a year. In a traditional approach, this would require six dedicated servers (2 load balancer servers, 2 web servers, and 2 database servers). A cloud infrastructure requires significantly fewer resources, because managed cloud services come with their own availability guarantees regardless of how many servers are connected to them. For example, AWS ELB (Elastic Load Balancing) runs continuously and does not require provisioning additional servers, as a load balancer does in a traditional setup. The two diagrams below represent our needs.
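To confirm this, here are the SLAs of both models. Components in series multiply their availabilities, and a redundant pair of components, each with availability a, has availability 1 − (1 − a)². In the AWS model, ELB (99.99%) is in series with two EC2 instances (99.99% each) running in parallel; the classic model consists of three layers, each a redundant pair of dedicated servers with 99.5% availability: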
SLA of the AWS solution (monthly scale):

$$\mathrm{SLA} = 0.9999 \cdot \left(1 - (1 - 0.9999)^2\right) \approx 99.99\%$$

SLA of the classic solution (annual scale):

$$\mathrm{SLA} = \left(1 - (1 - 0.995)^2\right) \cdot \left(1 - (1 - 0.995)^2\right) \cdot \left(1 - (1 - 0.995)^2\right) \approx 99.99\%$$
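Both SLA formulas use the same two building blocks: components in series multiply their availabilities, and redundant components fail only if all of them fail. The Python sketch below recomputes the two models. It assumes component failures are independent, and the mapping of each number to a component (ELB and two EC2 instances; three layers of paired dedicated servers at 99.5%) is my reading of the formulas.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Components in series: the system works only if every component works."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant components: the system works if at least one of them works."""
    return 1 - prod(1 - a for a in availabilities)


# AWS model (assumed reading): ELB at 99.99% in series with two EC2
# instances at 99.99% each running in parallel.
aws = series(0.9999, parallel(0.9999, 0.9999))

# Classic model: three layers (load balancers, web, database), each a
# redundant pair of dedicated servers assumed to be 99.5% available.
classic = series(*(parallel(0.995, 0.995) for _ in range(3)))

print(f"AWS:     {aws:.4%}")
print(f"Classic: {classic:.4%}")
```

Both results round to 99.99%; before rounding, the classic model comes out slightly higher, at about 99.9925%, because each pair of 99.5% servers is already 99.9975% available.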
It is worth emphasizing that availability on a monthly basis and on an annual basis are not the same. At 99.99% availability, the annual downtime budget is 52m 35.7s, while the monthly budget is 4m 23s; twelve monthly budgets add up to the annual one. However, for the service user, being without access for about 4 minutes a month is far better than being without access for 52 minutes on a single day. An annual figure says nothing about how the downtime is distributed: it could be many short interruptions or a single outage lasting nearly an hour on one day.
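The difference between an annual and a monthly budget can be checked with a short Python sketch. It assumes a 365.2425-day year split into twelve equal months, and both outage logs are hypothetical: a single 50-minute outage versus 4 minutes of downtime every month.

```python
SECONDS_PER_YEAR = 365.2425 * 86_400  # average Gregorian year
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12


def budget(availability: float, period_seconds: float) -> float:
    """Downtime allowed over one period, in seconds."""
    return (1 - availability) * period_seconds


def meets_annual_sla(monthly_downtime_s, availability=0.9999) -> bool:
    """Only the yearly total has to stay within the annual budget."""
    return sum(monthly_downtime_s) <= budget(availability, SECONDS_PER_YEAR)


def meets_monthly_sla(monthly_downtime_s, availability=0.9999) -> bool:
    """Every single month has to stay within the monthly budget."""
    limit = budget(availability, SECONDS_PER_MONTH)
    return all(d <= limit for d in monthly_downtime_s)


# Hypothetical outage logs: downtime in seconds for each of 12 months.
one_long_outage = [50 * 60] + [0] * 11  # a single 50-minute outage
short_outages = [4 * 60] * 12           # 4 minutes every month

for name, log in (("one long outage", one_long_outage),
                  ("short outages", short_outages)):
    print(name, meets_annual_sla(log), meets_monthly_sla(log))
```

The single 50-minute outage fits the annual 99.99% budget but breaks the monthly one, which is why a monthly SLA protects the customer better.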
In addition, AWS grants service credits when a service's availability falls below its SLA. You can find more information about EC2 services here.
AWS services are flexible and scalable and provide significantly higher availability (the values are shown in the diagrams). There is no need to provision multiple machines just to hold all the data: storing it in AWS S3 is enough. As a result, you do not need EC2 instances with large storage capacity, which lowers maintenance costs.
AWS ELB has a monthly availability of 99.99%, which helps keep our store operational. ELB automatically distributes traffic between servers and redirects it away from servers that fail their health checks, for example during a failure or an outage. This is a significant advantage. When we acquire a dedicated server, we have to wait for it to be deployed, and the resources do not move over automatically; the process requires additional specialists.
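The automatic switching described above boils down to routing requests only to targets that pass their health checks. The Python sketch below is a toy round-robin model of that idea; it is not how ELB is implemented, and `HealthCheckedBalancer` and the target names are hypothetical.

```python
import itertools


class HealthCheckedBalancer:
    """Toy round-robin balancer that skips targets failing health checks."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._cycle = itertools.cycle(self.targets)

    def report_health(self, target, is_healthy):
        """Record the result of a health check for one target."""
        if target not in self.targets:
            raise ValueError(f"unknown target: {target}")
        if is_healthy:
            self.healthy.add(target)
        else:
            self.healthy.discard(target)

    def next_target(self):
        """Return the next healthy target in round-robin order."""
        # One full pass over the targets is enough to find a healthy one.
        for _ in range(len(self.targets)):
            target = next(self._cycle)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets: the service is unavailable")


lb = HealthCheckedBalancer(["web-1", "web-2"])
lb.report_health("web-1", False)             # web-1 fails its health check
print([lb.next_target() for _ in range(3)])  # traffic goes to web-2 only
```

When `web-1` passes a later health check, calling `report_health("web-1", True)` puts it back into rotation without any manual intervention, which is the property the paragraph above contrasts with a dedicated-server setup.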
Now, let's move on to the costs. What does the architecture in the diagram above cost?
Services we need:
All services are located in the Europe (London) region.
To lower costs, I assumed annual billing for EC2, which reduces its cost by 37%. The other services are billed monthly.
Total monthly cost: $132.25 (~533.41 PLN)
Total annual cost: 533.41 × 12 = 6,400.92 PLN
The calculations were made based on the AWS calculator. You can check them here: go to calculations.
We need six dedicated servers: 2 load balancer servers, 2 web servers, and 2 database servers.
Average server cost: 499 PLN net
Total monthly cost of servers: 499 × 6 = 2,994 PLN net
Total annual cost: 2,994 × 12 = 35,928 PLN net
It is often said that the cloud is expensive, but here the costs in the AWS environment turn out to be significantly lower: by about 82%! Over a year, you can save approximately 29,527 PLN. Whatever the size of your business, you could put that money into a high-quality advertising campaign that will bring you far more benefit.
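The arithmetic behind the comparison can be reproduced with a small Python sketch. It takes the monthly figures from the two estimates above (the AWS cost already converted to PLN) and uses `Decimal` so that currency amounts are not distorted by floating-point rounding; `compare_annual_costs` is a hypothetical helper name.

```python
from decimal import Decimal


def compare_annual_costs(aws_monthly, server_monthly, server_count, months=12):
    """Annual cost of the AWS setup vs. N dedicated servers, and the saving."""
    aws_annual = aws_monthly * months
    dedicated_annual = server_monthly * server_count * months
    saving = dedicated_annual - aws_annual
    saving_pct = saving / dedicated_annual * 100
    return aws_annual, dedicated_annual, saving, saving_pct


# Figures from the estimates above, in PLN.
aws, dedicated, saving, pct = compare_annual_costs(
    aws_monthly=Decimal("533.41"),
    server_monthly=Decimal("499"),
    server_count=6,
)
print(aws, dedicated, saving, round(pct, 1))
```

The saving works out to 29,527.08 PLN, or about 82% of the dedicated-server cost. As noted in the text, neither side includes administration costs.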
Also keep in mind that these calculations do not include the cost of specialist server administration or of network availability.
Another option that I have encountered many times is purchasing a single server and operating solely on it. The costs are lower, indeed. However, in that case, you cannot rely on scalability or flexibility. Additionally, operating on just one server reduces your availability, and in the event of a failure without backup copies elsewhere, you expose yourself to the risk of losing your entire business.
Also, remember that your business and performance requirements play a crucial role in the entire process. The next paragraphs explain how to determine and assess them.
I often hear complaints about service outages and unavailability online. For each of us, it is important to have our website up and running for as long as possible, ideally without any interruptions. However, it is essential to remember that the higher the availability, the higher the associated costs.
Start by analyzing the risks associated with the unavailability of a given service. Failures often occur at the most unexpected times, such as during peak sales periods. Even if the website has been running smoothly for the past few months, there is no guarantee that it will continue to do so. This is primarily because it has not been tested under high load.
Therefore, every business owner must ask how long a service can be unavailable without harming their operations. For example, a few hours of downtime may not affect the financial results of an e-commerce store selling a single tool. For a multi-category store that generates hundreds of thousands in revenue per hour, however, every minute counts.
Once we have determined an acceptable level of unavailability, we can choose a suitable service that meets our requirements. However, it is important to remember that high availability comes with higher costs. Sometimes, it is worth taking on the risk and accepting lower availability.
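To decide whether a higher availability class is worth its price, it can help to estimate what downtime costs. The Python sketch below assumes that revenue is lost in proportion to downtime (ignoring peak hours, lost customers, and reputational damage) and that downtime uses the full annual budget; the function names and the two hourly revenue figures are hypothetical.

```python
HOURS_PER_YEAR = 365.2425 * 24  # average Gregorian year


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year allowed by an availability level."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def expected_annual_loss(availability_pct: float, revenue_per_hour: float) -> float:
    """Revenue lost if downtime uses the whole annual budget.

    Simplifying assumption: revenue is lost in proportion to downtime.
    """
    return annual_downtime_hours(availability_pct) * revenue_per_hour


def max_worth_paying(from_pct: float, to_pct: float, revenue_per_hour: float) -> float:
    """Upper bound on the extra yearly spend that moving to a higher class can justify."""
    return (expected_annual_loss(from_pct, revenue_per_hour)
            - expected_annual_loss(to_pct, revenue_per_hour))


# Hypothetical stores: one earning 200 per hour, one earning 200,000 per hour.
for revenue in (200, 200_000):
    print(revenue, round(max_worth_paying(99.9, 99.99, revenue)))
```

If the extra infrastructure for the higher class costs more per year than `max_worth_paying` returns, accepting the lower availability may be the more sensible choice.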
To calculate website unavailability, you need to gather data on the time when the website was inaccessible or experienced operational issues. There are several ways to obtain this data:
The formula for calculating website unavailability is straightforward:
Website Unavailability = (Downtime / Total Monitoring Time) * 100%
As mentioned earlier, the result is a percentage. Count only downtime that falls within the monitoring period and divide by the length of that period; including outages from outside the analysis period would distort the result.
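The formula above can be applied directly to a list of recorded outages. The Python sketch below clips each outage to the monitoring window (so time outside the analysis period is ignored) and merges overlapping outages so that no second is counted twice; the function name and the outage data are hypothetical.

```python
from datetime import datetime


def unavailability_pct(outages, window_start, window_end):
    """(Downtime / Total Monitoring Time) * 100%, counting only the monitored window.

    `outages` is an iterable of (start, end) datetimes. Outages are clipped to
    the window, and overlapping ones are merged so no second is counted twice.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window_end must be after window_start")
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in outages
        if end > window_start and start < window_end
    )
    downtime = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps the current run: extend it
            continue
        if cur_end is not None:
            downtime += (cur_end - cur_start).total_seconds()
        cur_start, cur_end = start, end
    if cur_end is not None:
        downtime += (cur_end - cur_start).total_seconds()
    return downtime / total * 100


# Hypothetical example: June 2024, with one outage that began in May.
june = (datetime(2024, 6, 1), datetime(2024, 7, 1))
outages = [
    (datetime(2024, 5, 31, 23, 50), datetime(2024, 6, 1, 0, 10)),  # 10 min in June
    (datetime(2024, 6, 10, 12, 0), datetime(2024, 6, 10, 12, 30)),
    (datetime(2024, 6, 10, 12, 15), datetime(2024, 6, 10, 12, 45)),  # overlaps
]
print(f"{unavailability_pct(outages, *june):.4f}%")
```

Here the counted downtime is 55 minutes (10 minutes of the cross-month outage plus 45 minutes of merged overlapping outages) out of 30 days of monitoring.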
Remember that several factors can contribute to website unavailability, such as server problems, DDoS attacks, network issues, code errors, and more. Therefore, regular monitoring is essential to identify problems quickly and take appropriate corrective actions.
It is important to remember that high availability is the foundation for the operation of most services, but it is not essential for every business. Undoubtedly, it is a crucial factor for online businesses operating 24/7.
However, no system is foolproof. Whether you have one server or a dozen, downtime can occur unexpectedly despite your precautions, for example because of a power outage in a data center. Cloud computing offers greater protection, but it may not always be the best solution for a business of your size. If you are unsure about the best approach for your company, feel free to reach out to us. We can help you find the most suitable and cost-effective solution.