Operation Manual

Deduplication at target

After a backup to a deduplicating vault is completed, the storage node runs the indexing activity. This activity deduplicates the data in the vault as follows:

1. It moves the data blocks from the temporary file to a special file within the vault, storing duplicate items there only once. This file is called the deduplication data store.

2. It saves the hash values and the links that are necessary to "assemble" the deduplicated data to the deduplication database.

3. After all the data blocks have been moved, it deletes the temporary file.

As a result, the data store contains a number of unique data blocks. Each block has one or more references from the backups. The references are contained in the deduplication database. The backups remain untouched. They contain hash values and the data that cannot be deduplicated.
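The indexing steps and their result can be pictured with a minimal Python sketch. It is an illustration only, not the storage node's actual implementation: the function names, the use of SHA-256, and the in-memory dictionaries standing in for the deduplication data store and the deduplication database are all assumptions made for this example.

```python
import hashlib


def index_temp_file(temp_blocks, data_store, dedup_db, backup_id):
    """Hypothetical indexing pass over one backup's temporary file.

    temp_blocks: list of bytes objects (stands in for the temporary file)
    data_store:  dict hash -> block (stands in for the deduplication data store)
    dedup_db:    dict backup_id -> list of hashes (stands in for the
                 deduplication database)
    """
    links = []
    for block in temp_blocks:
        # SHA-256 is an assumption; the manual does not name the hash function.
        digest = hashlib.sha256(block).hexdigest()
        # Step 1: store the block only if an identical one is not already present.
        if digest not in data_store:
            data_store[digest] = block
        # Step 2: record the reference needed to reassemble the backup later.
        links.append(digest)
    dedup_db[backup_id] = links
    # Step 3: the temporary file is removed once every block has been moved.
    temp_blocks.clear()


def assemble(backup_id, data_store, dedup_db):
    """Rebuild a backup's data by following its references into the data store."""
    return b"".join(data_store[digest] for digest in dedup_db[backup_id])
```

For example, indexing the three blocks `b"AAAA"`, `b"BBBB"`, `b"AAAA"` under this sketch leaves two unique blocks in `data_store` and three references in `dedup_db`, and `assemble` returns the original twelve bytes.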

The following diagram illustrates the result of deduplication at target.

The indexing activity may take considerable time to complete. You can view this activity's state on the management server by selecting the corresponding storage node and clicking View details (p. 224). You can also manually start or stop this activity in that window.

If you back up a large amount of unique data, the indexing activity may fail due to insufficient RAM on the storage node. The backups will continue to run. You can add more RAM to the storage node, or delete unnecessary backups and run compacting. After the next backup, the indexing will run again.