Big data has become a reality in the Middle East. But it is not the same reality for every company, or every user. The explosion of data is creating different problems and opportunities for different organisations. The medical provider required to store scanned images for each patient’s lifetime faces a very different challenge to the FMCG brand now offered an unprecedented depth of customer purchasing behaviour data. The end user despairing over the time taken to locate a file or email has a different set of challenges to the legal team struggling with new compliance demands inspired by big data.
According to Gartner, a recent survey of 720 companies about their plans to invest in big data gathering and analysis revealed that almost two-thirds are funding projects or plan to do so this year, with media/communications and banking firms leading the way. The research firm insists 2013 is the year of experimentation and early deployment for big data. Adoption is still at an early stage, with fewer than 8 percent of respondents indicating that their organization has deployed big data solutions. A further 20 percent are piloting and experimenting, 18 percent are developing a strategy and 19 percent are knowledge gathering, while the remainder have no plans or don’t know.
This is, therefore, a critical phase in the big data evolution. While storage costs have come down in recent years, organizations cannot possibly take a ‘store everything’ approach to big data and hope to realize the full long-term benefit. The issue is not only what data to retain and where to keep it, but also how to extract value from that data, both now and in the future as big data technologies, including analytics, become increasingly sophisticated.
In addition to the huge expansion in data volumes, organizations also now have access to new content types. While this depth of data offers exciting opportunities to gain commercial value, it also creates significant management challenges. How should the business protect, organize and access this diverse yet critical information that increasingly includes not only emails and documents but also rich media files and huge repositories of transaction level data?
At the heart of a successful big data strategy is the ability to manage the diverse retention and access requirements associated with both different data sources and different end user groups. Today a large portion of the data in a typical enterprise is not accessed for a year or more, and that portion is set to grow as big data strategies evolve. Many organizations are gleefully embarking upon a ‘collect everything’ policy on the basis that storage is cheap and the data will have long-term value.
Certainly, inexpensive cloud-based storage is enabling big data strategies. But the reality is that while it is feasible to store all the data in the cloud, retrieving even 5TB of it back into the organisation would take an unfeasibly long time, even with fast connections. Furthermore, cloud costs are increasing, especially as organisations add more data; and even cheaper outsourced tape backup options still incur escalating power and IT management costs.
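To put a rough number on that retrieval time, the following is a back-of-the-envelope sketch rather than a benchmark. It assumes the figure above means 5 terabytes (decimal), and the link speeds and the `efficiency` factor are illustrative assumptions that do not come from any survey or vendor figure.

```python
def transfer_time_days(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link of link_mbps megabits per second.

    efficiency is the assumed fraction of the nominal link speed that is
    actually usable (1.0 means a perfect, dedicated link).
    """
    bits = size_tb * 1e12 * 8                  # decimal terabytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency  # megabits/s -> usable bits/s
    return bits / usable_bps / 86_400          # seconds -> days


# Assumed link speeds, for illustration only.
for mbps in (100, 1000):
    print(f"{mbps} Mbit/s: {transfer_time_days(5, mbps):.1f} days")
```

Even on a dedicated 100 Mbit/s line running at full speed, the arithmetic comes to more than four and a half days; contention, protocol overhead and provider throttling would all lengthen that.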
In addition, the impact of unused data sitting on primary storage extends far beyond higher backup costs: time-consuming end user access leads to operational inefficiency and raises the risk of non-compliance.
Organizations cannot take a short-term approach to managing the volumes of big data and hope to realize long-term benefits. There is a clear need to take a far more intelligent approach to how, where and what data is stored. Is it really practical to take a backup of an entire file server simply because some of the documents need to be retained for several years to meet compliance requirements? Or is there a better way that extracts the relevant information and stores that in a cheaper location, such as the cloud?
To retain information and avoid a cataclysmic explosion in data volumes, organizations need to take a far more strategic approach to data archive and backup. What information must be kept on expensive local storage, and what can be sent to the cloud or another location? And what policies will be put in place to take data ownership away from end user control? By taking a strategic approach to archiving data, based on the properties of each data object, organisations avoid the problems caused by end users applying their own ‘retain everything’ policies.
By moving data to a virtual data repository and deleting the local copy, an organisation avoids duplication and inconsistency whilst still ensuring information can be retrieved in a timely and simple fashion. Policy-driven rules for data retention can be based on criteria such as file name, type, user, keyword, tagging or Exchange classifications, while tiering can be applied based on content rules to any target, including tape or cloud.
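As a minimal sketch of what such policy-driven retention might look like, the following evaluates an ordered list of rules against each data object and returns a storage tier and a retention period. Everything here is hypothetical: the `DataObject` and `Rule` types, the tier labels, the tag name and the retention periods are illustrative assumptions, not the behaviour of any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DataObject:
    name: str
    owner: str
    last_accessed: date
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class Rule:
    description: str
    matches: Callable[[DataObject], bool]
    tier: str          # illustrative tier labels: "local", "cloud", "tape"
    retain_years: int  # illustrative retention period


def build_rules(today: date) -> List[Rule]:
    """Example rule set; order matters because the first match wins."""
    one_year = timedelta(days=365)
    return [
        Rule("compliance-tagged documents",
             lambda o: "compliance" in o.tags, "tape", 7),
        Rule("mail archives",
             lambda o: o.name.lower().endswith(".pst"), "cloud", 3),
        Rule("untouched for over a year",
             lambda o: today - o.last_accessed > one_year, "cloud", 1),
    ]


def apply_policy(obj: DataObject, rules: List[Rule]) -> Tuple[str, int]:
    """Return (tier, retain_years) from the first matching rule."""
    for rule in rules:
        if rule.matches(obj):
            return rule.tier, rule.retain_years
    return "local", 0  # no rule matched: leave on primary storage


today = date(2013, 6, 1)
rules = build_rules(today)
samples = [
    DataObject("contract.docx", "legal", date(2013, 5, 1), frozenset({"compliance"})),
    DataObject("q1-report.xlsx", "finance", date(2011, 1, 15)),
    DataObject("notes.txt", "sales", date(2013, 5, 30)),
]
for obj in samples:
    tier, years = apply_policy(obj, rules)
    print(f"{obj.name}: tier={tier}, retain_years={years}")
```

Keeping the rules in a central, ordered list is the point of the design: the decision sits with the policy rather than with whichever end user happens to own the file.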
This intelligent retention model needs to be backed up by effective data retrieval. Key to this process is content indexing that enables end users to apply a simple keyword search to access any data. Organisations have the option to content index either live data or secondary data held in backup or archive. In both cases, rather than indexing the entire data resource, organisations can apply the right filters and policies to prioritise the most valuable and frequently accessed data sources. Content indexing critical corporate data in this way ensures the business always has the option to rapidly access and retrieve information.
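The idea of indexing only a filtered subset of the data and then searching it by keyword can be sketched in a few lines. This is a toy inverted index for illustration only; the function names, file paths and the `include` filter are assumptions, and a production index would also handle stemming, access rights and far larger volumes.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Set

WORD = re.compile(r"[a-z0-9]+")


def build_index(documents: Dict[str, str],
                include: Callable[[str], bool] = lambda path: True) -> Dict[str, Set[str]]:
    """Map each word to the set of document paths containing it.

    Only documents whose path passes the include filter are indexed.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in documents.items():
        if not include(path):
            continue  # policy says this source is not worth indexing
        for word in set(WORD.findall(text.lower())):
            index[word].add(path)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the paths containing every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical documents; only the legal share is indexed.
docs = {
    "legal/contract_2013.txt": "Supplier contract renewal terms",
    "legal/policy.txt": "Data retention policy for contract records",
    "tmp/scratch.txt": "Contract draft scratch notes",
}
index = build_index(docs, include=lambda path: path.startswith("legal/"))
print(sorted(search(index, "contract")))
print(sorted(search(index, "retention contract")))
print(sorted(search(index, "scratch")))
```

The last query returns nothing because the scratch folder was excluded by the filter, which is the trade-off of prioritising sources: a smaller, faster index in exchange for leaving low-value data unsearchable.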
Combining intelligent storage policies with content indexing reduces data volumes, enables organisations to use the most appropriate storage media for each data object and facilitates rapid access to business critical information.
And it will be demands from individuals to explore and exploit big data that will put growing pressure on IT to deliver more than additional storage resources. What happens when it takes the CEO over 15 minutes to find and access an essential document? Or when the legal team cannot retrieve vital information to prove compliance? Or when the brand manager cannot exploit expensive retailer data and analytics investment to understand customer behaviour?
The key to transforming big data into big intelligence is content and context. By managing big data retention and storage based on content and its inherent value to the business, organisations will be well placed to harness this data not only to address immediate problems but also to improve strategic insight. From predicting demand for new products and services to transforming the speed with which every end user can retrieve corporate documents, it is those organisations that consider retention strategies from day one that will be best placed to realise the big data vision.
About the Author:
Allen Mitchell is the Regional Pre-Sales Manager at Commvault MEA. He is an experienced Senior Pre-Sales Consultant with a proven track record built over 20 years in the industry. His experience includes more than 14 years focused on Data and Information Management issues and solutions for enterprise business.