If you’ve paid any attention whatsoever to the storage industry in the last few years, the word deduplication has come up. A lot. Certain specialized storage arrays, like backup repositories, have supported it for a while, as has certain software (like backup applications), but it wasn’t until the last couple of years that it really went mainstream.
For starters, what is it? Deduplication is the process of eliminating duplicate data. If three files are identical except for a few kilobytes, the common data can be stored once, with a lookup table pointing each file back to the same blocks on disk; only the few kilobytes of differences need to be stored separately. This is great if you have a lot of duplicated data, because you can really save space.
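As a loose illustration of the idea (not how the Windows dedup engine works internally), content hashing is what makes duplicate detection cheap. The file names and contents below are placeholders:

```powershell
# Two files with identical content produce the same hash, so a
# dedup engine only needs to store that content once on disk.
Set-Content -Path .\report-a.txt -Value "quarterly numbers"
Set-Content -Path .\report-b.txt -Value "quarterly numbers"

$hashA = (Get-FileHash -Path .\report-a.txt -Algorithm SHA256).Hash
$hashB = (Get-FileHash -Path .\report-b.txt -Algorithm SHA256).Hash

# Matching hashes mean the underlying blocks can be shared.
$hashA -eq $hashB   # True
```

The real engine does this at the chunk level rather than whole files, which is why files that are only mostly identical still dedup well.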
Microsoft isn’t one to miss out on an opportunity, so with Windows Server 2012 it built deduplication directly into the OS. This meant you could turn on deduplication on a file share and save a lot of space … kind of. People who ran Microsoft deduplication for long periods started hitting its restrictions, and a volume that wasn’t set up properly could run into serious issues later. As is normally the case with Microsoft, version one was okay, version two was much better, and version three finally seems to be good.
Improvements to Windows Server 2016 Dedup
Support for Large Volumes
Technically, Windows Server 2012 placed no limit on the size of a volume you could dedup. However, the dedup job was single threaded, so the rule of thumb was not to exceed 10 TB per volume, and a volume with a high rate of change might need to be even smaller. In Windows Server 2016, Microsoft made the job multithreaded, so multiple threads can process a single volume in parallel, and volumes up to 64 TB in size are now fully supported.
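Enabling dedup on a volume and kicking off an optimization job by hand looks like the following. The cmdlets are from the built-in Deduplication module; the drive letter is an assumption for illustration:

```powershell
# Install the dedup feature (one time, per server).
Install-WindowsFeature -Name FS-Data-Deduplication

# Enable deduplication on the D: volume with the default usage type.
Enable-DedupVolume -Volume "D:"

# Start an optimization job now rather than waiting for the schedule.
Start-DedupJob -Volume "D:" -Type Optimization

# Check job progress and space savings on the volume.
Get-DedupStatus -Volume "D:"
```

On Windows Server 2016 that optimization job is the part that now runs multithreaded against a single volume.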
Support for Large Files
Windows Server 2012 technically allowed dedup of files up to 1 TB, but it was not a good idea: the larger the file, the worse the performance impact became. With Windows Server 2016, files up to 1 TB are fully supported without degradation.
Simplified Implementation for Virtual Backup Files
One of the primary use cases for deduplication has been backup files, such as those produced by Veeam and DPM. However, a lot of configuration had to be done ahead of time to optimize a volume for that backup workload. If those optimizations weren’t made, everything would work … but would most likely break catastrophically in a few months or years. That was a serious concern, because the only way to fix the resulting issues was to move all data off the volume and reformat it from scratch. Windows Server 2016 adds a simple configuration type for the dedup volume called virtualized backup, which, if chosen, optimizes everything for that workload.
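That pre-configuration now collapses into a single switch: the Backup usage type on Enable-DedupVolume. The drive letter below is assumed:

```powershell
# Enable dedup on a repository volume, tuned for virtualized backup
# workloads (for example, a Veeam or DPM backup repository).
Enable-DedupVolume -Volume "E:" -UsageType Backup

# Verify the usage type that was applied to the volume.
Get-DedupVolume -Volume "E:" | Select-Object Volume, UsageType
```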
While these are serious improvements to dedup, it’s still very important to plan your deployment appropriately. Certain workloads will not dedup well, some aren’t even supported in a dedup scenario, and certain types of data will take up just as much space with dedup on as with it off.
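Before enabling dedup on a production volume, it’s worth estimating what you’d actually save. Installing the dedup feature also puts the Deduplication Evaluation Tool (DDPEval.exe) in System32, which can scan an existing data set without touching it. The path below is a placeholder:

```powershell
# Estimate potential space savings on an existing data set
# without enabling deduplication on the volume.
& "$env:SystemRoot\System32\ddpeval.exe" "D:\FileShares"
```

If the estimated savings come back in the single digits, that data is probably one of the workloads that isn’t worth deduplicating.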
Other posts in this series:
- What’s New in Windows Server 2016: Hyper-V
- Windows Server Licensing Changes in 2016: Core-Based Licenses
- What’s New in Windows Server 2016: Failover Clustering
- What’s New in Windows Server 2016: Nano Servers
- What’s New in Windows Server 2016: Containers, Part 1
- What’s New in Windows Server 2016: Containers, Part 2
- What’s New in Windows Server 2016: Storage Spaces Direct
- What’s New in Windows Server 2016: Storage Replica
You can also watch my webinar touching on a bunch of the new features.