Data is the lifeblood of a successful organization, and the effective management of data resources plays a vital role in its smooth operation. The ever-growing number of processes and regulations results in the accumulation of large amounts of both business-related and non-business-related content.
According to a survey by Gartner, 47 percent of large enterprises identify data growth as their biggest data center hardware infrastructure challenge. On average, data capacity in enterprises is growing at 40 percent to 60 percent per year.
Research further shows that more than 52 percent of an organization's digital content is unstructured data such as files, documents, image files, video, etc. – while just 31 percent is structured.
Over 70 percent of this content is generated by end users within the organization. Employees often store personal data on company resources as they know that it’ll be securely maintained and regularly backed up.
Thus, the data pool contains a mixture of business-critical data and data that has little or no business value. Even business-related data can go stale over time, becoming inactive and losing its business relevance.
When data is not analyzed, it is all treated in the same manner, and company resources are used ineffectively.
The real challenge organizations face lies not in having to deal with data growth – which is inevitable – but in the effective and strategic management of this information. After all, while data growth is projected at 40-60 percent per year, growth in IT budgets is estimated at just 2.6 percent.
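The gap between the two growth rates compounds quickly. The following is a minimal sketch of that arithmetic, assuming the rates stay constant, that the 2.6 percent budget figure is also annual, and a five-year horizon chosen purely for illustration; the function name `project` is hypothetical.

```python
def project(start: float, annual_rate: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return start * (1 + annual_rate) ** years


YEARS = 5  # illustrative horizon, not a figure from the survey

# Normalize today's data volume and today's budget to 1.0 each.
data_low = project(1.0, 0.40, YEARS)    # 40 percent annual data growth
data_high = project(1.0, 0.60, YEARS)   # 60 percent annual data growth
budget = project(1.0, 0.026, YEARS)     # 2.6 percent budget growth (assumed annual)

print(f"Data volume after {YEARS} years: {data_low:.2f}x to {data_high:.2f}x")
print(f"IT budget after {YEARS} years:   {budget:.2f}x")
```

Under these assumptions, data volume grows roughly five- to tenfold over five years while the budget grows by about 14 percent.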
Factors Contributing to Unnecessary Data Growth
Long-term retention is a factor that complicates the overall data management process. Data may be retained for business reasons, for historical reasons, to meet end-user-driven requirements, or to comply with policies and regulations prescribed by the government or by the organization itself.
As retention policies – both government and home-grown – accumulate, the organization and storage of data become more complex.
Maintaining multiple copies of the same data is both inefficient and expensive. Apart from causing inconsistency and adding a large storage overhead, redundancy can affect long-running processes such as backup. Although the cost of storage devices is falling, redundant data on these devices increases the time taken for backup. This in turn causes a significant increase in network overhead and bandwidth requirements.
Furthermore, most large organizations with multiple locations around the globe generate large volumes of data on a constant basis. Because of this global dispersion, backup windows are constantly shrinking, so only critical and business-relevant data should be identified and selected for regular backup.
So what techniques do organizations employ to reduce their data storage requirements and effectively utilize resources?
Resource Acquisition: The Quick Fix
The most common tactical reaction to the data growth problem is simply to buy more storage. Given the falling cost of storage, this knee-jerk reaction serves as a quick fix, but it often reflects an inability to carry out predictive capacity planning.
The hoarding of data is further complicated by infinite retention policies, under which data is stored without regard to its actual content.
Data archiving, along with data tiering, is considered an effective data reduction technology. But blind archiving, without first gaining insight into the data landscape or applying any governing policy, simply moves data between tiers and does not reduce the total volume of data being managed.
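The difference between blind and policy-driven archiving can be sketched in a few lines. The example below is a minimal illustration, not a description of any particular product: the `FileRecord` fields, the one-year staleness threshold, the action names and the sample records are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    path: str
    size_gb: float
    last_accessed: date
    business_related: bool  # assumed to come from a prior content analysis


def assign_action(rec: FileRecord, today: date, stale_days: int = 365) -> str:
    """Apply a simple governing policy to one file."""
    if not rec.business_related:
        # Candidate for removal rather than for another storage tier.
        return "review-for-deletion"
    if (today - rec.last_accessed).days > stale_days:
        return "archive-tier"
    return "primary-tier"


# Hypothetical sample data.
records = [
    FileRecord("finance/q1-report.xlsx", 0.5, date(2024, 5, 1), True),
    FileRecord("projects/2019-archive.zip", 40.0, date(2020, 1, 15), True),
    FileRecord("home/jdoe/holiday-videos", 120.0, date(2023, 11, 3), False),
]
today = date(2024, 6, 1)

actions = {r.path: assign_action(r, today) for r in records}
retained_gb = sum(
    r.size_gb for r in records if actions[r.path] != "review-for-deletion"
)

for path, action in actions.items():
    print(f"{path}: {action}")
print(f"Volume still managed after policy: {retained_gb} GB")
```

Blind archiving would keep all 160.5 GB under management and merely relocate it; with the policy applied, only the 40.5 GB of business-related data remains to be tiered.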
DeDuplication: Beating the Bloat
Finally, DeDuplication, or dedupe, is probably the most talked-about data management strategy. It is also perhaps the leading data reduction technology, permitting sizable reductions in data volume. Traditionally, organizations have opted for a hardware-based approach to dedupe, eliminating redundant data on backend devices.
But the challenges facing this methodology include increases in operational management costs, and impacts on network overhead and bandwidth – both of which contribute significantly to the yearly increase of storage management costs.
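The core idea behind deduplication, whichever layer it runs in, is to store each unique piece of data once and keep references to it. The following is a minimal software sketch using fixed-size chunks and SHA-256 digests; it is not the hardware-based approach described above, and the chunk size, function names and sample data are assumptions chosen for illustration.

```python
import hashlib


def dedupe_chunks(blobs, chunk_size=4096):
    """Split each blob into fixed-size chunks and store each unique chunk once.

    Returns the chunk store (digest -> bytes) and, for each blob, the
    ordered list of digests needed to rebuild it.
    """
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild a blob from its list of chunk digests."""
    return b"".join(store[d] for d in recipe)


# Three hypothetical backups that share most of their content.
blobs = [
    b"A" * 8192 + b"B" * 4096,
    b"A" * 8192 + b"C" * 4096,
    b"A" * 8192 + b"B" * 4096,
]
store, recipes = dedupe_chunks(blobs)

logical = sum(len(b) for b in blobs)
stored = sum(len(c) for c in store.values())
print(f"Logical bytes: {logical}, stored bytes: {stored}")
```

Here 36,864 logical bytes are reduced to 12,288 stored bytes, and every backup can still be rebuilt exactly from its recipe. Production systems typically add variable-size chunking, persistent indexes and integrity checks, which this sketch omits.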
Applying these data management strategies individually and independently won't permit efficient capacity planning, which keeps capacity ahead of demand. Nor will it bring about a reduction in data volume or operational expenses.
So if the three most widely used strategies fall short, what really is the best solution?
Basics of Integrated Data Reduction
The Integrated Data Reduction approach is the hot topic in the data management world. By applying a combination of the three strategies, organizations can reduce their overall data volume and migrate the retained data to the most appropriate tier of storage, thereby achieving significant reductions in storage costs.
An Integrated Data Reduction approach is implemented in the following manner:
Deduplication is really the key to an Integrated Data Reduction strategy, as it reduces redundancy in the backup and archive pools regardless of the backend storage devices used. While deduplication ratios of 1:20 and higher are not unusual, even a conservative ratio of 1:5 would result in a drastic reduction in operational management expenses.
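To put those ratios in perspective, the short sketch below converts a deduplication ratio into the share of capacity saved. It assumes that "1:N" means one unit is stored for every N units of original data; the function name `space_saved` is hypothetical.

```python
def space_saved(ratio: float) -> float:
    """Fraction of capacity saved for a deduplication ratio of 1:ratio."""
    return 1 - 1 / ratio


for ratio in (5, 20):
    print(f"1:{ratio} -> {space_saved(ratio):.0%} less capacity needed")
```

Under that reading, a 1:5 ratio already removes 80 percent of the stored volume, and moving to 1:20 raises the saving to 95 percent, so most of the benefit arrives at the conservative end of the range.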
The disparity between data growth and budget growth is set to increase, and companies are starting to realize that addressing the problem in an ad-hoc manner is ineffective and carries severe long-term implications.
When properly implemented, an Integrated Data Management strategy can dramatically reduce inefficiency, enhance manageability and cut operational expenses.