Availability SLA is an important criterion for enterprise customers when selecting a Cloud service provider. A provider that doesn’t offer appropriate SLAs often doesn’t even make it to the “short list”.
As an example, Amazon’s EC2 availability SLA is 99.95%. Amazon also offers transparency by providing a website that publishes up-to-the-minute status on their services and any related issues. In the case of AWS, Amazon makes the following exclusions:
Amazon EC2 SLA Exclusions
The Service Commitment does not apply to any unavailability, suspension or termination of Amazon EC2, or any other Amazon EC2 performance issues:
- (i) that result from Service Suspensions described in Section 7.1 of the AWS Agreement;
- (ii) caused by factors outside of our reasonable control, including any force majeure event or Internet access or related problems beyond the demarcation point of Amazon EC2;
- (iii) that result from any actions or inactions of you or any third party;
- (iv) that result from your equipment, software or other technology and/or third party equipment, software or other technology (other than third party equipment within our direct control);
- (v) that result from failures of individual instances not attributable to Region Unavailability; or
- (vi) arising from our suspension and termination of your right to use Amazon EC2 in accordance with the AWS Agreement (collectively, the “Amazon EC2 SLA Exclusions”).
If availability is impacted by factors other than those explicitly listed in this agreement, we may issue a Service Credit considering such factors in our sole discretion.
Here is Section 7.1 from the AWS Agreement:
- “…suspended for the duration of any unanticipated or unscheduled downtime or unavailability of any portion or all of the Services for any reason, including as a result of power outages, system failures or other interruptions…”
- we shall also be entitled, without any liability to you, to suspend access to any portion or all of the Services at any time, on a Service-wide basis:
- (a) for scheduled downtime to permit us to conduct maintenance or make modifications to any Service;
- (b) in the event of a denial of service attack or other attack on the Service or other event that we determine, in our sole discretion, may create a risk to the applicable Service, to you or to any of our other customers if the Service were not suspended; or
- (c) in the event that we determine that any Service is prohibited by law or we otherwise determine that it is necessary or prudent to do so for legal or regulatory reasons (collectively, “Service Suspensions”).
…To the extent we are able, we will endeavor to provide you email notice of any Service Suspension in accordance with the notice provisions set forth in Section 15 below and to post updates on the AWS Websites regarding resumption of Services following any such suspension, but shall have no liability for the manner in which we may do so or if we fail to do so.
So, if there is any outage, whether intended by Amazon or not, it may not be included in their service availability measurements. In fairness to Amazon, I haven’t experienced many outages, nor do I remember any notices for scheduled maintenance. Nevertheless, if one of these excluded events occurs, it won’t be counted as an outage.
You can find EC2's SLA at http://aws.amazon.com/ec2-sla/.
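To put a number like 99.95% in perspective, here is a minimal sketch that converts an availability percentage into a downtime allowance. The function name, the 365-day year, and the 30-day month are my own assumptions for illustration; the SLA itself defines how Amazon measures the period.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of downtime permitted in a period at the given availability.

    Illustrative helper only; not taken from any provider's SLA.
    """
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


# Assumed measurement periods: a 365-day year and a 30-day month.
yearly = downtime_budget_minutes(99.95, 365)
monthly = downtime_budget_minutes(99.95, 30)
print(f"{yearly:.1f} min/year, {monthly:.1f} min/month")
```

Under those assumptions, 99.95% leaves roughly 263 minutes of downtime per year, or about 22 minutes per month, before the commitment is missed. Any outage that falls under the exclusions does not count against that budget.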
The other day, I received the following notification from GoGrid:
In this case, customers basically had no administration access to their running servers for 4 hours.
GoGrid claims to offer 100% uptime in their SLA, but there are some exclusions, as follows:
- downtime during scheduled maintenance or Emergency Maintenance
- outages caused by acts or omissions of Customer, including its applications, equipment, or facilities, or by any use or user of the Service authorized by Customer
- outages caused by hackers, sabotage, viruses, worms, or other third party wrongful actions
- DNS issues outside of GoGrid's control
- outages resulting from Internet anomalies outside of GoGrid's control
- outages resulting from fires, explosions, or force majeure
- outages to the Customer Portal
- failures during a "beta" period
According to the first exclusion, scheduled maintenance is not included as part of their availability SLA.
I appreciate the difference between scheduled maintenance and an unexpected outage. However, from an availability perspective, I think scheduled maintenance should be included in the availability measurements and reporting. Otherwise, the reported figure misleads clients.
The Cloud service provider has options for managing continuous availability, and those are under its control. Whether an outage is scheduled or unscheduled, the business impact may be just as great, even if the subscriber knows about it in advance.
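To make the gap concrete, here is a rough sketch comparing the availability a provider could report when scheduled maintenance is excluded with the availability a subscriber actually experiences. The 4-hour maintenance window and the 30-day month are hypothetical inputs I picked for illustration, not figures from any provider's SLA report.

```python
def availability_pct(period_hours, outages, count_scheduled=True):
    """Availability over a period, as a percentage.

    outages: list of (duration_hours, is_scheduled) tuples.
    count_scheduled: if False, scheduled windows are ignored, which
    mimics an SLA that excludes scheduled maintenance.
    """
    down = sum(
        hours for hours, scheduled in outages
        if count_scheduled or not scheduled
    )
    return 100 * (period_hours - down) / period_hours


# Hypothetical month: one 4-hour scheduled maintenance window, nothing else.
outages = [(4.0, True)]
period = 30 * 24  # assumed 30-day month, in hours

reported = availability_pct(period, outages, count_scheduled=False)
experienced = availability_pct(period, outages, count_scheduled=True)
print(f"reported: {reported:.2f}%  experienced: {experienced:.2f}%")
```

With these inputs the provider can still report 100.00%, while the subscriber sees about 99.44% for the month.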
What is your experience with Cloud service providers? Do they explain the SLAs clearly? Do you think scheduled maintenance should be included or excluded?