infoTECH Feature

June 11, 2010

Google Reviews Recent Outage in Datastore for App Engine

Approximately 2 percent of applications on Google's App Engine were affected by a May 25, 2010 outage in the datastore, which is part of the platform that Google App Engine provides to developers.

According to a recent review by Google, the App Engine datastore outage left applications unable to write data, although they were still able to read it.

There are over 100,000 applications on App Engine. About 2 percent of all App Engine applications that day actually had instances of unapplied writes, and of those, the vast majority had only one or two unapplied writes, Google said.

The outage affected all App Engine applications using the datastore storage service.

The outage lasted 50 minutes, while residual high latency lingered for an additional two hours.

The unapplied writes did not corrupt application data, nor did they affect the transactional consistency of application data.

Google emailed administrators of all affected applications to let them know they should take action. Administrators who did not receive an email had no action to take.

If an application had unapplied writes, all of the data has been recovered and reinserted into the application's datastore as separately labeled entities. Google has developed new tools in the datastore to re-integrate unapplied writes and has provided a support email address ([email protected]) so that administrators can work one-on-one with the App Engine team.

According to Google, unapplied writes were accepted by the primary datastore just before the outage but were not replicated to the backup datastore before the outage occurred.

This data was not lost, but was not available on the secondary datastore when applications began using that as their new datastore after the primary datastore failed.
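The replication gap described above can be illustrated with a minimal Python sketch. It is a toy model and not the App Engine datastore API: the function name `split_writes_at_failover`, the keys, and the values are all hypothetical, and it assumes writes are copied to the backup in the order the primary accepted them.

```python
def split_writes_at_failover(write_log, replicated_upto):
    """Return (backup_view, unapplied) at the moment of failover.

    write_log: ordered list of (key, value) writes accepted by the primary.
    replicated_upto: number of those writes copied to the backup before
    the primary failed (hypothetical parameter for this sketch).
    """
    # Writes that reached the backup are visible after failover.
    backup_view = dict(write_log[:replicated_upto])
    # The rest were accepted by the primary but never replicated.
    unapplied = write_log[replicated_upto:]
    return backup_view, unapplied


# Hypothetical data: four writes accepted, only the first two replicated.
write_log = [("user:1", "a"), ("user:2", "b"), ("user:3", "c"), ("user:1", "d")]
backup_view, unapplied = split_writes_at_failover(write_log, replicated_upto=2)

print(backup_view)
print(unapplied)
```

In this model the unapplied writes still exist in the primary's log, which matches Google's statement that the data was not lost but was simply unavailable on the secondary datastore until it was recovered.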

The datastore relies on Bigtable to store data, and one of Bigtable's components is a repository for determining where a specific entity is located in the distributed system. Due to instability in the cluster, this component became overloaded, which made reads and writes time out.

By default, App Engine waited the full 30 seconds to complete a datastore request. This behavior caused the number of requests waiting to complete to quickly jump beyond the safe limit for the App Engine service. This in turn caused all requests to fail, regardless of whether or not they used the datastore.
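A rough way to see why a 30-second wait lets pending requests pile up is Little's law, which says the average number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. The sketch below uses the 30-second timeout from Google's review; the arrival rate, the normal latency, and the safe limit are made-up numbers chosen only for illustration, since Google did not publish them.

```python
def in_flight_requests(arrival_rate_per_s, seconds_per_request):
    """Little's law: average in-flight requests = arrival rate x time in system."""
    return arrival_rate_per_s * seconds_per_request


ARRIVAL_RATE = 1_000     # requests per second (hypothetical)
NORMAL_LATENCY_S = 0.1   # typical time per request in seconds (hypothetical)
TIMEOUT_S = 30           # full datastore wait cited in Google's review
SAFE_LIMIT = 5_000       # safe number of pending requests (hypothetical)

normal = in_flight_requests(ARRIVAL_RATE, NORMAL_LATENCY_S)
stalled = in_flight_requests(ARRIVAL_RATE, TIMEOUT_S)

print("normal:", normal, "over limit:", normal > SAFE_LIMIT)
print("stalled:", stalled, "over limit:", stalled > SAFE_LIMIT)
```

Under these assumed numbers, holding every request for the full timeout multiplies the pending count several hundred times, which is the kind of jump that would push a shared service past its limit and affect requests that never touch the datastore.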

In other company news, Google offered a preview of a Chrome Web store and introduced Google App Engine for Business at this year's I/O developer conference, where executives also touted the benefits of HTML5.
 

Ed Silverstein is a contributing editor for TMCnet's InfoTech Spotlight.

Edited by Patrick Barnard