1 Deduplication and cleaning
Data deduplication refers to a technique for eliminating redundant data across all files in a data set. Large amounts of data can be reduced significantly, saving storage costs and resources. The DR implements a variable-block, sliding-window deduplication engine to produce industry-leading results.
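As a conceptual illustration only, the following Python sketch shows how a variable-block, sliding-window chunker can choose chunk boundaries from the content itself. The window size, boundary mask, minimum and maximum chunk sizes, and the simple additive rolling checksum are placeholder choices made for readability; they are not the DR engine's actual algorithm or parameters.

# Illustrative variable-block, sliding-window chunker.
# All constants below are placeholder values, not DR parameters.
WINDOW = 48          # bytes in the sliding window
MASK = 0xFFF         # boundary rule: cut when checksum & MASK == 0 (~4 KiB average)
MIN_CHUNK = 1024     # never cut a chunk smaller than this
MAX_CHUNK = 65536    # force a boundary at this size

def chunk(data: bytes):
    """Yield variable-sized chunks using a simplified additive rolling checksum."""
    start = 0
    rolling = 0
    window = bytearray()
    for i, byte in enumerate(data):
        window.append(byte)
        rolling += byte
        if len(window) > WINDOW:
            rolling -= window.pop(0)        # slide the window forward by one byte
        size = i - start + 1
        if size >= MAX_CHUNK or (size >= MIN_CHUNK and rolling & MASK == 0):
            yield data[start:i + 1]         # content-defined boundary found
            start = i + 1
            rolling = 0
            window.clear()
    if start < len(data):
        yield data[start:]                  # final partial chunk

Because boundaries are chosen by content rather than by fixed offsets, an insertion near the start of a file shifts only the chunks around the edit, and most of the remaining chunks still match data already stored.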
Each file stored in a DR appliance is represented by a blockmap consisting of pointers to its chunks of data saved on the filesystem. When a file is stored in the DR appliance, a chunking process segments the file into variable-sized chunks. Each chunk is fingerprinted and looked up in the DR's deduplication dictionary to see whether it has been encountered before. If the DR system has already encountered the chunk, the file's blockmap is updated to point to the existing chunk and the chunk's reference count is incremented. If the chunk is unique to the DR system, the chunk's fingerprint is inserted into the dictionary, the chunk is written to the filesystem, the file's blockmap is updated to point to it, and the chunk's reference count is set to 1. Thus, files stored in the DR might point to unique chunks, non-unique chunks that are shared with other files in the system, or a combination of the two.
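The store path described above can be modeled with a few in-memory structures, as in the following sketch. The class name, the field names, and the use of SHA-256 as the fingerprint are illustrative assumptions for this example and do not reflect DR internals.

import hashlib

class DedupStore:
    """Toy in-memory model of the blockmap, dictionary, and reference counts."""

    def __init__(self):
        self.dictionary = {}   # fingerprint -> chunk bytes (stands in for the filesystem)
        self.refcounts = {}    # fingerprint -> number of blockmap references
        self.blockmaps = {}    # file name -> ordered list of fingerprints

    def store_file(self, name, chunks):
        """Store a file from an iterable of chunks (e.g. the chunker sketched above)."""
        blockmap = []
        for piece in chunks:
            fp = hashlib.sha256(piece).hexdigest()   # fingerprint the chunk
            if fp in self.dictionary:
                self.refcounts[fp] += 1              # chunk seen before: share it
            else:
                self.dictionary[fp] = piece          # unique chunk: write it and record it
                self.refcounts[fp] = 1
            blockmap.append(fp)                      # the file's blockmap points at it
        self.blockmaps[name] = blockmap

Storing the same file a second time in this model would add only a second blockmap; every chunk would be found in the dictionary, so only reference counts would change and no chunk data would be written again.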
When a file is deleted, the reference count of each chunk in its blockmap is decremented by 1.
The DR cleaner plays a critical role in maintaining a DR appliance's usable capacity because it updates chunk reference counts and reclaims space when a chunk's reference count reaches zero.
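Continuing the same illustrative model, deletion and a cleaner pass might look like the following, where store is assumed to be the DedupStore instance from the previous sketch. The actual cleaner operates on the appliance's on-disk structures as a background process; this sketch only mirrors the reference-count logic described above.

def delete_file(store, name):
    """Drop the file's blockmap and decrement each referenced chunk's count."""
    for fp in store.blockmaps.pop(name, []):
        store.refcounts[fp] -= 1

def cleaner_pass(store):
    """Reclaim space held by chunks that no file references any longer."""
    for fp in [f for f, count in store.refcounts.items() if count == 0]:
        del store.dictionary[fp]   # release the chunk's space
        del store.refcounts[fp]    # remove its fingerprint from the dictionary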
DR appliances ship with the cleaner set to run automatically during idle time. In most cases, this setting allows more than enough time to update chunk references and reclaim space. In extreme cases where the DR cleaner is not permitted to complete a full pass at least once per week, the suggestions in the documentation should be applied to the cleaner settings.