A novel idea only a few years ago, data deduplication technology has today become conventional, a commodity even. For those looking to shrink the backup window and rein in escalating storage costs, deduplication capabilities are a must. Deduplication can reduce storage footprints and allow a company to store a month’s worth (or more) of data on a single disk. But not all deduplication technology is created equal, and a “panic purchase” will leave you reeling. A thoughtful approach is needed: identify your primary objectives and the outcomes you envision for your secondary storage environment.
Multiple data centers in multiple countries
More than likely, if you’re considering a modern disk-based backup strategy, one of your goals is to reduce the amount of data sent to your backup appliances. If you’re also looking to minimize overlap across multiple data centers and multiple countries, data deduplication technology can still help! But for these capabilities, you need to consider your options carefully.
For the enterprise, data deduplication technology can give you the ability to manage and protect data globally. You may even be able to eliminate the use of tape at remote sites. If your company is preparing for an upcoming merger or global expansion, look for capabilities that support one-to-one and many-to-one replication configurations; this will enable tape infrastructure consolidation at a centralized data site. These systems can aggregate data into a clustered repository of globally unique data. Then you can export data from remote sites to physical tape at the central site.
Tape is NOT dead in my world
We know that data deduplication reduces the storage space required by eliminating redundant data in your backup environment, shrinking the disk footprint of disk-based backups. If your organization also relies on physical tape libraries (e.g., financial or healthcare companies), your data deduplication solution should provide a way to continue using tape for archival purposes. VTL-based data deduplication is one of the least disruptive ways to implement this technology where tape is used. Virtual tape library (VTL) systems are disk arrays configured to appear to the host server and the backup software as physical tape libraries. Data is streamed to and recovered from the VTL as if it were tape.
It’s important to remember that solutions offering deduplication as an add-on feature can be limited in performance and in their integration with tape-based systems. If your company has data retention requirements, make sure data backed up to a VTL can also be copied to physical tape for remote storage.
I “checked” the deduplication box
As you know, data deduplication features are anything but uniform. As a result, simply checking the data dedupe box is not sufficient. Post-process, inline and turbo deduplication are the primary deduplication categories most often cited. The specific benefits and efficiency gains you’re looking for should determine which type of capabilities you adopt.
In post-process deduplication, backup data is written to temporary disk space first. Then deduplicated data is copied to a repository disk. Many argue that post-process deduplication is ideal when speed is an issue because it occurs after the backup process is complete and can be scheduled at the user’s discretion.
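As a simplified illustration (not any vendor’s implementation; the chunking, hashing and staging structures here are assumptions for the sketch), the two-phase post-process flow can be expressed in Python:

```python
import hashlib

def backup_to_staging(chunks, staging):
    """Phase 1: the backup job streams raw data to temporary disk at full speed."""
    staging.extend(chunks)

def dedupe_staging(staging, repository):
    """Phase 2, scheduled later at the user's discretion: copy only unique
    chunks into the repository disk, then release the staging space."""
    recipe = []  # ordered hashes needed to reconstruct the backup stream
    for chunk in staging:
        digest = hashlib.sha256(chunk).hexdigest()
        repository.setdefault(digest, chunk)  # each unique chunk is stored once
        recipe.append(digest)
    staging.clear()  # temporary disk space is reclaimed
    return recipe

staging, repository = [], {}
backup_to_staging([b"block-A", b"block-B", b"block-A"], staging)
recipe = dedupe_staging(staging, repository)
print(len(repository))  # 2 unique chunks kept
print(len(staging))     # 0: staging space freed
```

Note that the backup itself finishes before any deduplication work begins, which is exactly why this approach keeps the backup window short at the cost of extra temporary disk.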
With inline deduplication, as data is received by the disk system, software determines whether duplicate data already exists. If not, the system writes the data to the target system. Inline deduplication also requires less I/O work: when the backup is done, the dedupe is done. In this scenario data can be replicated sooner, which is particularly useful if you have to replicate data to a disaster recovery site.
Inline deduplication also allows a less complex configuration because there is no landing zone, as post-process dedupe requires. Another characteristic of inline processing is its built-in safeguard: should the deduplication process encounter problems, the system automatically switches to post-processing so the backup is not lost. Turbo deduplication is a combination of inline and post-process deduplication. In this case, some processing is done inline and some is done afterward, depending on the I/O load of your host systems.
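The inline approach described above can be sketched the same way (again a hedged simplification with assumed helper names, not a product API): the fingerprint of each incoming chunk is checked before anything is written, so no landing zone is needed:

```python
import hashlib

def inline_dedupe(chunks, repository):
    """Deduplicate as data arrives: hash each chunk and write it only if
    the repository has not seen that content before. The returned recipe
    (ordered hashes) is all that is kept for the logical backup stream."""
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in repository:    # duplicate check happens before any write
            repository[digest] = chunk  # only unique data reaches disk
        recipe.append(digest)
    return recipe

repository = {}
recipe = inline_dedupe([b"block-A", b"block-B", b"block-A"], repository)
print(len(repository))  # 2 unique chunks on disk
print(len(recipe))      # 3 chunks in the logical backup stream
```

Because the repository holds only deduplicated data from the moment the backup completes, it is immediately ready to replicate to a disaster recovery site.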
Most analysts tell us that data volumes are doubling each year. At this pace, data deduplication is fast becoming a compulsory technology. If your primary goals involve having deduplication technology integrate with your current backup policies (including writing to physical tape) and consolidate data globally, a systematic look at deduplication options will get you there.
Paul Buelow is a manager at Dynamic Solutions International (DSI). He has 30 years’ experience in the technology industry and can be reached at p.buelow@DynamicSolutions.com