1 Deduplication and cleaning
Data deduplication refers to a technique for eliminating redundant data across all files in a data set. Large amounts of data can be reduced significantly, saving storage costs and resources. The DR implements a variable-block, sliding-window deduplication engine to produce industry-leading results.
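As a conceptual illustration only, the following Python sketch shows how a variable-block, sliding-window chunker can choose chunk boundaries from the content itself. The window size, boundary mask, minimum and maximum chunk sizes, and the simple additive rolling checksum are placeholder choices made for readability; they are not the DR engine's actual algorithm or parameters.

# Illustrative variable-block, sliding-window chunker.
# All constants below are placeholder values, not DR parameters.
WINDOW = 48          # bytes in the sliding window
MASK = 0xFFF         # boundary rule: cut when checksum & MASK == 0 (~4 KiB average)
MIN_CHUNK = 1024     # never cut a chunk smaller than this
MAX_CHUNK = 65536    # force a boundary at this size

def chunk(data: bytes):
    """Yield variable-sized chunks using a simplified additive rolling checksum."""
    start = 0
    rolling = 0
    window = bytearray()
    for i, byte in enumerate(data):
        window.append(byte)
        rolling += byte
        if len(window) > WINDOW:
            rolling -= window.pop(0)        # slide the window forward by one byte
        size = i - start + 1
        if size >= MAX_CHUNK or (size >= MIN_CHUNK and rolling & MASK == 0):
            yield data[start:i + 1]         # content-defined boundary found
            start = i + 1
            rolling = 0
            window.clear()
    if start < len(data):
        yield data[start:]                  # final partial chunk

Because boundaries are chosen by content rather than by fixed offsets, an insertion near the start of a file shifts only the chunks around the edit, and most of the remaining chunks still match data already stored.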
Each file stored in a DR appliance is represented by a blockmap consisting of pointers to its chunks of data saved on the filesystem. When a file is stored in the DR appliance, a chunking process segments the file into variable-sized chunks. Each chunk is fingerprinted and looked up in the DR's deduplication dictionary to see whether it has been encountered before. If the DR system has already encountered the chunk, the file's blockmap is updated to point to the existing chunk and the chunk's reference count is incremented. If the chunk is unique to the DR system, the chunk's fingerprint is inserted into the dictionary, the chunk is written to the filesystem, the file's blockmap is updated to point to it, and the chunk's reference count is set to 1. Thus, files stored in the DR might point to unique chunks, non-unique chunks that are shared with other files in the system, or a combination of the two.
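The store path described above can be modeled with a few in-memory structures, as in the following sketch. The class name, the field names, and the use of SHA-256 as the fingerprint are illustrative assumptions for this example and do not reflect DR internals.

import hashlib

class DedupStore:
    """Toy in-memory model of the blockmap, dictionary, and reference counts."""

    def __init__(self):
        self.dictionary = {}   # fingerprint -> chunk bytes (stands in for the filesystem)
        self.refcounts = {}    # fingerprint -> number of blockmap references
        self.blockmaps = {}    # file name -> ordered list of fingerprints

    def store_file(self, name, chunks):
        """Store a file from an iterable of chunks (e.g. the chunker sketched above)."""
        blockmap = []
        for piece in chunks:
            fp = hashlib.sha256(piece).hexdigest()   # fingerprint the chunk
            if fp in self.dictionary:
                self.refcounts[fp] += 1              # chunk seen before: share it
            else:
                self.dictionary[fp] = piece          # unique chunk: write it and record it
                self.refcounts[fp] = 1
            blockmap.append(fp)                      # the file's blockmap points at it
        self.blockmaps[name] = blockmap

Storing the same file a second time in this model would add only a second blockmap; every chunk would be found in the dictionary, so only reference counts would change and no chunk data would be written again.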
When a file is deleted, the reference count of each chunk in its blockmap is decremented by 1.
The DR cleaner plays a critical role in maintaining a DR appliance's usable capacity because it updates chunk reference counts and reclaims space when a chunk's reference count reaches zero.
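Continuing the same illustrative model, deletion and a cleaner pass might look like the following, where store is assumed to be the DedupStore instance from the previous sketch. The actual cleaner operates on the appliance's on-disk structures as a background process; this sketch only mirrors the reference-count logic described above.

def delete_file(store, name):
    """Drop the file's blockmap and decrement each referenced chunk's count."""
    for fp in store.blockmaps.pop(name, []):
        store.refcounts[fp] -= 1

def cleaner_pass(store):
    """Reclaim space held by chunks that no file references any longer."""
    for fp in [f for f, count in store.refcounts.items() if count == 0]:
        del store.dictionary[fp]   # release the chunk's space
        del store.refcounts[fp]    # remove its fingerprint from the dictionary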
DR appliances ship with the cleaner set to run automatically during idle time. In most cases, this setting allows more than enough time to update chunk references and reclaim space. In extreme cases where the DR cleaner is not permitted to complete a full pass at least once per week, the suggestions in the documentation should be applied to the cleaner settings.