A relentless explosion of big data plagues many stakeholders, from IT to legal, as they grapple with how best to retain, access, discover and ultimately delete content in compliance with evolving regulations.
The big deal with big data starts with the sheer volume, generated by a growing number of devices, data sources and applications.
According to IDC, the world generated more than one zettabyte (ZB), or one million petabytes (PB), of data in 2010. By 2014, the growth is predicted to reach 72 ZBs a year. The Middle East mirrors the same trend and regional enterprises have seen exponential growth in data over the past two years.
The influx of machine generated data, unstructured data (e.g., images, audio or video files) as well as semi-structured data (e.g., e-mails, logs, etc.) adds a layer of management complexity, especially when determining the most efficient and reliable way to ingest, protect, organize, access, preserve and defensibly delete all this vital information from the broad variety of sources.
It’s not all bad news, though, as all this data can be a huge asset. But without a modern management strategy, it can also be a huge liability. In sifting through voluminous big data to find responsive information, organizations can spend millions of dollars to isolate relevant Electronically Stored Information (ESI) and even more to review it.
Clearly, exponential data growth, diversity of data types and never-ending demands for optimized retention and discovery will create the perfect storm unless companies steer toward a more holistic approach to managing big data. In doing so, they can begin to view data backups and archives more strategically while leveraging integrated solutions for lowering storage costs and compliance risks.
Most importantly, they must choose to invest in technology that meets the demands of the business with a flexible and adaptable strategy that best accommodates future requirements. Companies can then extract maximum value from all their crucial information in ways that produce valuable business benefits without the limits of technology lock-in.
Crossing Big Data’s Backup and Archive Chasm
For too many organizations, backup and archive functions are deployed and maintained as separate “silos” within an overall information management strategy. This is not smart for a number of reasons. Multiple, disparate hardware and software products typically manage these data silos, which leads to duplicate copies of information that must be protected and preserved with inadequate visibility into what is being maintained.
Compounding the problem is the fact that two distinctly different groups are traditionally responsible for data protection and preservation respectively within most corporate environments. In most organizations, storage and backup administrators oversee data protection and are therefore heavily focused on the impact big data has on backup windows, recovery SLAs and infrastructure costs.
Information management buyers, meanwhile, are fixated on how big data affects data retention, discovery and information governance policies, and they often operate without regard for the operational impact of these policies.
As a result, a chasm exists between these two critical constituencies in ongoing big data conversations. While backup and archive serve different purposes, the functionality is similar: both processes make a copy of original data, either for recovery or for preservation.
With that said, Gartner, among others, predicts that being able to look at backup and archive holistically promises significant cost reduction and risk management benefits. The convergence of backup and archive is an emerging concept that is gaining traction as organizations seek solutions to reduce the number of copies created for backup and archiving while more closely aligning data access policies for both.
One way to meet the industry-wide expectation of ‘doing more with less’ is to unify backup and archive. This requires cross-functional teaming and starts with developing a better understanding of how applications, users and critical business processes need to access data throughout its lifecycle. This effort requires collaboration between all stakeholders and those responsible for both recovery and discovery.
This collective group should examine all the different policies and practices used to move, copy, catalog and access data for backups, retention, recovery, discovery and disposition. Doing so will uncover many of the hurdles to streamlined access to individual and corporate data.
Another typical outcome of the initial review process is the eye-opening realization that multiple copies of data reside everywhere – on physical and virtual servers, in the cloud, in backup repositories, in legal and IT archives as well as on employees’ desktops and mobile devices scattered throughout the company. While the number of redundant data copies can be reduced effectively and efficiently through deduplication, the biggest benefits come from consolidating data in a single data store that leverages a common hardware and/or software infrastructure for backup and archive.
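To make the deduplication idea concrete, here is a minimal Python sketch, not a description of any particular product. It assumes each copy is available as raw bytes and that two copies are redundant when their SHA-256 digests match; the function name `deduplicate` and the sample locations are hypothetical and chosen only for illustration.

```python
import hashlib


def deduplicate(copies):
    """Keep one stored object per unique content (illustrative sketch).

    `copies` is an iterable of (location, content_bytes) pairs.
    Returns (store, locations): `store` maps a SHA-256 digest to the
    single retained copy, and `locations` maps the same digest to every
    place that content was found.
    """
    store = {}
    locations = {}
    for location, content in copies:
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)              # keep the content once
        locations.setdefault(digest, []).append(location)  # remember every source
    return store, locations


# Hypothetical sample: the same report found in three places, plus one unique file.
copies = [
    ("file-server/q1-report.doc", b"Q1 results"),
    ("backup/q1-report.doc", b"Q1 results"),
    ("legal-archive/q1-report.doc", b"Q1 results"),
    ("laptop/notes.txt", b"meeting notes"),
]

store, locations = deduplicate(copies)
print(len(copies), "copies reduced to", len(store), "unique objects")
```

Because the location index is kept separately from the stored content, a single retained copy can still be traced back to every place it appeared, which is the kind of visibility a shared backup and archive store is meant to provide.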
The notion of such a single data repository that eliminates redundancies and separate silos is compelling on many levels. A holistic approach that captures data once and then repurposes it for data protection and preservation is key to getting the right data into the hands of the right people so they can turn it into something more meaningful and actionable for the business. This approach also aids centralized reporting that enables business and IT leaders to make more informed decisions with their data while bolstering analytical skills.
Moreover, a central place to delete data also reduces both the cost and risk of inadvertently storing multiple copies. Understanding large data pools well enough to extract and collect relevant subsets for both reactive and proactive eDiscovery can prove to be a huge cost and risk reduction exercise.
Most important, companies can maintain a balance between capturing too much data and not enough, as both scenarios pose potentially serious business risks.
The benefits of deploying an integrated information management strategy resonate throughout all levels of an organization, including outside of IT, ultimately resulting in better coworker collaboration and sharing of passive content enterprise-wide. Forward-thinking companies that embrace a unified approach for managing both backups and archives will be able to take full advantage of a future-proof solution that elevates overall information management, while providing appropriate access to business-critical information as it ages.
Gartner, “Does Integrated Backup and Archiving Make Sense?” Dave Russell and Sheila Childs, March 2012