- NETAPP TECHNICAL REPORT NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide Carlos Alvarez March 2009 | TR-3505-0309 ABSTRACT This guide introduces NetApp deduplication for FAS technology, describes in detail how to implement and use it, and provides information on best practices, operational considerations, and troubleshooting. 
- TABLE OF CONTENTS
  1 INTRODUCTION AND OVERVIEW OF DEDUPLICATION
  1.1 HOW DEDUPLICATION FOR FAS WORKS
  1.2 DEDUPLICATED VOLUMES
  1.3 DEDUPLICATION METADATA
- 5 DEDUPLICATION AND VMWARE
  5.1 VMFS DATA STORE ON FIBRE CHANNEL OR ISCSI: SINGLE LUN
  5.2 VMWARE VIRTUAL DISKS OVER NFS/CIFS
  5.3 DEDUPLICATION ARCHIVE OF VMWARE
- 1 INTRODUCTION AND OVERVIEW OF DEDUPLICATION This section provides an overview of how deduplication works for FAS and V-Series systems. Notes: 1. Whenever this document refers to deduplication for FAS, the same information also applies to V-Series systems, unless otherwise noted. 2. NetApp deduplication for VTL is beyond the scope of this technical report. 1. 
- In summary, this is how deduplication works. Newly saved data on the FAS system is stored in 4KB blocks as usual by Data ONTAP. Each block of data has a digital fingerprint, which is compared to all other fingerprints in the flexible volume. 
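The fingerprint comparison described above can be illustrated with a small model. This is only a sketch, not Data ONTAP code: SHA-256 stands in for the fingerprint (the algorithm actually used is not stated here), the byte-for-byte check on a fingerprint match is a precaution added for the sketch, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks, as described in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Stand-in fingerprint. SHA-256 is an assumption, not the Data ONTAP algorithm."""
    return hashlib.sha256(block).hexdigest()


def find_duplicates(blocks: list[bytes]) -> dict[int, int]:
    """Map each duplicate block index to the index of the first identical block."""
    first_seen: dict[str, int] = {}
    duplicates: dict[int, int] = {}
    for index, block in enumerate(blocks):
        fp = fingerprint(block)
        original = first_seen.get(fp)
        # Confirm a fingerprint match byte for byte before calling it a duplicate.
        if original is not None and blocks[original] == block:
            duplicates[index] = original
        else:
            first_seen.setdefault(fp, index)
    return duplicates
```

For three blocks whose first and third contents are identical, `find_duplicates` reports that block 2 duplicates block 0, so only two blocks would need to be kept.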
- Releasing a duplicate data block entails updating the indirect inode pointing to it, incrementing the block reference count for the already existing data block, and freeing the duplicate data block. In real time, as additional data is written to the deduplicated volume, a fingerprint is created for each new block and written to a change log file. 
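The three steps involved in releasing a duplicate block, plus the change log write for new data, can be sketched as a toy model. The `Volume` class, its field names, and the string "slots" standing in for indirect inode pointers are all hypothetical; the real on-disk structures are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume: pointers to blocks, per-block reference counts, a change log."""
    pointers: dict[str, int] = field(default_factory=dict)   # slot -> block id
    refcount: dict[int, int] = field(default_factory=dict)   # block id -> references
    change_log: list[tuple[int, str]] = field(default_factory=list)

    def write(self, slot: str, block_id: int, fp: str) -> None:
        """Store a new block and record its fingerprint in the change log."""
        self.pointers[slot] = block_id
        self.refcount[block_id] = 1
        self.change_log.append((block_id, fp))

    def release_duplicate(self, slot: str, existing_id: int) -> int:
        """Repoint the slot, bump the existing block's count, free the duplicate."""
        duplicate_id = self.pointers[slot]
        if duplicate_id == existing_id:
            raise ValueError("slot already points at the existing block")
        self.pointers[slot] = existing_id      # update the pointer
        self.refcount[existing_id] += 1        # increment the reference count
        del self.refcount[duplicate_id]        # the duplicate block is now free
        return duplicate_id
```

After two writes of identical data to different slots, calling `release_duplicate` on the second slot leaves both slots pointing at one block with a reference count of 2.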
- 1.4 GENERAL DEDUPLICATION FEATURES Deduplication is enabled on a per flexible volume basis. It can be enabled on any number of flexible volumes in a storage system. It can be run in one of four ways:
  - Scheduled on specific days and at specific times
  - Manually, using the command line
  - Automatically, when 20% new data has been written to the volume
  - Automatically on the destination volume, when used with SnapVault®
  Only one deduplication process can run on a flexible volume at a time. 
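The automatic trigger can be pictured as a simple threshold check. The sketch below is illustrative only: the text does not say what the 20% is measured against, so the denominator used here (data already tracked for the volume) is an assumption, and the function and parameter names are hypothetical.

```python
AUTO_THRESHOLD = 0.20  # "20% new data", as stated in the text


def auto_should_start(new_blocks: int, existing_blocks: int,
                      threshold: float = AUTO_THRESHOLD) -> bool:
    """Return True once new data reaches the threshold share of existing data.

    Assumption: the percentage is taken relative to the data already tracked
    for the volume. The text does not spell out the denominator.
    """
    if existing_blocks <= 0:
        # Nothing tracked yet: any new data counts as "significant" in this sketch.
        return new_blocks > 0
    return new_blocks / existing_blocks >= threshold
```

Under this assumption, a volume tracking 1,000 blocks would start deduplication once 200 new blocks had been written.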
- Here are some additional considerations with regard to maximum volume sizes for deduplication:
  - Once an upgrade to Data ONTAP 7.3.1 is complete, the new maximum volume sizes for Data ONTAP 7.3.1 will be in effect.
  - If considering a downgrade or revert, it is highly recommended that NetApp Global Services be consulted for best practice.
  - During a revert from Data ONTAP 7.3.1 to an earlier version of Data ONTAP with smaller volume limits, volumes should be within the limits of the lower version of Data ONTAP. 
- 2.3 COMMAND SUMMARY Table 2 describes all deduplication (related) commands.
  Table 2) Deduplication command summary.
  Command: Summary
  - sis on: Enables deduplication on the specified flexible volume.
  - sis start -s: Begins the deduplication process on the flexible volume specified and performs a scan of the flexible volume to process existing data. This option is typically used upon initial configuration and deduplication on an existing flexible volume that contains undeduplicated data. 
- Command: Summary
  - sis check: (This command is available only in Diag mode.) Verifies and updates the fingerprint database for the specified flexible volume; includes purging stale fingerprints.
  - sis stat: (This command is available only in Diag mode.) Displays the statistics of flexible volumes that have deduplication enabled.
  - sis undo: (This command is available in Advanced and Diag modes.) Reverts a deduplicated volume to a normal flexible volume.
  2. 
- 1. Create a flexible volume (keeping in mind the maximum allowable volume size for the platform, as specified in the requirements table at the beginning of this section).
     r200-rtp01*> vol create VolArchive aggr0 200g
     Creation of volume 'VolArchive' with size 200g on containing aggregate 'aggr0' has completed.
  2. Enable deduplication on the flexible volume and verify that it’s turned on. The vol status command shows the attributes for flexible volumes that have deduplication turned on. 
- 7. Run deduplication on the flexible volume. This causes the change log to be processed, fingerprints to be sorted and merged, and duplicate blocks to be found.
     r200-rtp01> sis start /vol/VolArchive
     The deduplication operation for "/vol/VolArchive" is started.
  8.
  9. Use sis status to monitor the progress of deduplication. 
- On flexible volume dvol_2, deduplication is scheduled to run every day from Sunday to Friday at 11 p.m. On flexible volume dvol_3, deduplication is set to autoschedule. This means that deduplication is triggered by the amount of new data written to the flexible volume, specifically when there are 20% new fingerprints in the change log. On flexible volume dvol_4, deduplication is scheduled to run at 6 a.m. on Saturday. 
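To make the schedules above concrete, the sketch below parses schedule strings of the `day_list@hour_list` form, which is the form of the `sun-sat@0` example used elsewhere in this guide. Rendering the dvol_2 and dvol_4 schedules as `sun-fri@23` and `sat@6` is an assumption based on that form, and the parser itself (`expand_days`, `parse_schedule`) is hypothetical, not part of Data ONTAP.

```python
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def expand_days(day_list: str) -> list[str]:
    """Expand 'sun-fri' or 'mon,wed' into an ordered list of day names."""
    selected: set[str] = set()
    for part in day_list.split(","):
        if "-" in part:
            start, end = part.split("-")
            i, j = DAYS.index(start), DAYS.index(end)
            if i > j:
                raise ValueError(f"day range runs backwards: {part!r}")
            selected.update(DAYS[i:j + 1])
        else:
            selected.add(DAYS[DAYS.index(part)])  # raises ValueError if unknown
    return [day for day in DAYS if day in selected]


def parse_schedule(schedule: str) -> dict:
    """Parse 'auto' or 'day_list@hour_list' into a small description dict."""
    if schedule == "auto":
        return {"mode": "auto"}
    day_list, sep, hour_list = schedule.partition("@")
    if not sep:
        raise ValueError(f"expected day_list@hour_list, got {schedule!r}")
    hours = sorted({int(hour) for hour in hour_list.split(",")})
    if any(hour < 0 or hour > 23 for hour in hours):
        raise ValueError(f"hour out of range in {schedule!r}")
    return {"mode": "scheduled", "days": expand_days(day_list), "hours": hours}
```

With these assumptions, `parse_schedule("sun-fri@23")` describes a run at 11 p.m. on six days of the week, and `parse_schedule("sat@6")` a single run at 6 a.m. on Saturday.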
- The more concurrent deduplication processes you’re running, the more system resources are consumed. Given the previous two items, the best option is to do one of the following:
  - Use the auto mode so that deduplication runs only when significant additional data has been written to each particular flexible volume (this tends to naturally spread out when deduplication runs).
  - Stagger the deduplication schedule for the flexible volumes so that it runs on alternate days.
  - Run deduplication manually. 
- (There are no configurable parameters that can tune the deduplication process; that is, the priority of this background process in Data ONTAP is fixed.) IMPACT ON THE SYSTEM DURING THE DEDUPLICATION PROCESS The deduplication operation runs as a low-priority background process on the system. However, it can still affect the performance of user I/O and other applications running on the system. 
- The PAM card has provided significant performance improvements in VMware® VDI environments. The advantages provided by the NetApp PAM are further enhanced when combined with other shared block technologies such as NetApp deduplication or FlexClone®. For additional information regarding the PAM card, refer to TR-3705, NetApp and VMware VDI Best Practices. 3.3 DEDUPLICATION STORAGE SAVINGS This section discusses storage savings that deduplication can be expected to deliver. 
- The storage savings may continue to stay low. When the last Snapshot copy that was created before deduplication was run is deleted, the storage savings should increase noticeably. The question thus becomes when to run deduplication again in order to achieve maximum capacity savings. The answer is that deduplication should be run, and allowed to complete, before the creation of each and every Snapshot copy; this provides the most storage savings benefit. 
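The effect of Snapshot copies on savings can be illustrated with a tiny model in which a block can be freed only when neither the active file system nor any Snapshot copy still references it. This is a simplified sketch with hypothetical names, using plain integers as block IDs.

```python
def physical_blocks(active: set[int], snapshots: list[set[int]]) -> int:
    """Count blocks that cannot be freed: those held by the active file
    system or by any Snapshot copy."""
    held = set(active)
    for snapshot in snapshots:
        held |= snapshot
    return len(held)
```

Suppose blocks 3 and 4 duplicate blocks 1 and 2. If a Snapshot copy of `{1, 2, 3, 4}` is taken before deduplication, the volume still holds 4 blocks after deduplication reduces the active file system to `{1, 2}`. Once that copy is deleted, or if deduplication completes before the copy is created, only 2 blocks are held.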
- OVERVIEW OF SSET The SSET is available to NetApp system engineers, including NetApp partners, and performs nonintrusive testing of the data set to determine the effectiveness of deduplication. This tool is intended for use only by NetApp personnel to analyze data at the sites of current or prospective NetApp users. By installing this software, the user agrees to keep this tool and any results from this tool confidential between them and NetApp. 
- Table 5) Maximum deduplicated volume sizes. Data ONTAP 7.2.X (Starting with 7.2.5.1) and Data ONTAP 7.3.0 FAS2020 FAS3020 FAS3050 FAS3040 N5200 N5500 FAS2050 R200 FAS3070 FAS6030 FAS6070 FAS3140 N5600 FAS6040 FAS6080 N5300 FAS3160 N7600 N7800 FAS3170 0.5TB 1TB 2TB 3TB 4TB 6TB 10TB 16TB FAS3020 FAS3050 FAS3040 R200 FAS3070 FAS6030 FAS6070 N5200 N5500 FAS3140 N5600 FAS6040 FAS6080 N5300 FAS3160 N7600 N7800 Data ONTAP 7.3. 
- NUMBER OF DEDUPLICATION PROCESSES A maximum of eight deduplication processes can be run at the same time on one FAS system. If another flexible volume is scheduled to have deduplication run while eight deduplication processes are already running, deduplication for this additional flexible volume is queued. For example, suppose that a user sets a default schedule (sun-sat@0) for 10 deduplicated volumes. Eight will run at midnight, and the remaining two will be queued. 
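The queuing behavior in this example can be sketched as follows. The limit of eight comes from the text. Starting queued volumes in first-in, first-out order when a running process finishes is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import deque

MAX_CONCURRENT = 8  # limit stated in the text


def dispatch(volumes: list[str], limit: int = MAX_CONCURRENT):
    """Split volumes due at the same time into those that start and those queued."""
    running = list(volumes[:limit])
    queued = deque(volumes[limit:])
    return running, queued


def finish_one(running: list[str], queued: deque, done: str) -> None:
    """When a process finishes, start the next queued volume.

    FIFO ordering is an assumption; the text only says the extra volumes are queued.
    """
    running.remove(done)
    if queued:
        running.append(queued.popleft())
```

For ten volumes scheduled at midnight, `dispatch` starts the first eight and queues the remaining two, matching the example above.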
- 4.2 DEDUPLICATION AND SNAPRESTORE® The SnapRestore functionality is supported with deduplication, and it works in the same way with deduplication as it does without deduplication. If you’re running Data ONTAP 7.3, note the following. Starting with Data ONTAP 7.3, the deduplication metadata files (the fingerprint database and the change log files) do not get restored when SnapRestore is executed, because they are located outside the volume in the aggregate. 
- The cloned volume inherits the deduplication configuration of the parent volume, such as the deduplication schedule. Starting with Data ONTAP 7.3, the deduplication metadata files (the fingerprint database and the change log files) do not get cloned, because they are located outside the volume in the aggregate. In this case, there is no fingerprint database file in the cloned volume for the data that came from the parent. 
- The checksum type you can use with deduplication on a V-Series system is restricted. Only the block checksum type (BCS) is supported with deduplication on a V-Series system. Zoned checksums (ZCS) are not supported; with ZCS, performance degrades on random workloads. Refer to the Data ONTAP Data Protection Online Backup and Recovery Guide for information about NearStore configuration. 
- Deduplication can be enabled, run, and managed only from the primary location. However, the flexible volume at the secondary location inherits all the deduplication attributes and storage savings using SnapMirror. Shared blocks are transferred only once, so deduplication reduces network bandwidth usage too. The volume SnapMirror update schedule is not tied to the deduplication schedule. 
- As a best practice, NetApp recommends performing qtree SnapMirror updates after the deduplication process on the source volume has finished running. If a qtree SnapMirror update occurs while the deduplication process is running on the source volume, then in addition to the transfer of the changed data blocks, some unchanged data blocks might also get transferred to the destination. 
- Qtree SnapMirror Replication with Deduplication Enabled on the Destination Only A nondeduplicated flexible volume on the source can be replicated to a deduplicated volume on the destination by using qtree SnapMirror, as shown in Figure 5. Figure 5) Qtree SnapMirror replication from a nondeduplicated source volume to a deduplicated destination volume. Keep the following points in mind: Deduplication is licensed only on the destination system. 
- Keep the following points in mind:
  - Deduplication is licensed on both the source and the destination.
  - Deduplication is enabled, run, and managed independently on the source and the destination.
  - Deduplication doesn’t yield any network bandwidth savings because qtree SnapMirror works at the logical layer, and it sends undeduplicated data over the network. 
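The bandwidth difference between replicating at the logical layer and transferring shared blocks only once can be sketched with a small model. Files are represented as lists of physical block IDs; the file names and function names are hypothetical, and changed-block tracking and protocol overhead are ignored.

```python
def logical_transfer(files: dict[str, list[int]]) -> int:
    """Blocks sent when every file's data is sent in full (logical replication)."""
    return sum(len(block_ids) for block_ids in files.values())


def physical_transfer(files: dict[str, list[int]]) -> int:
    """Blocks sent when each shared physical block is sent only once."""
    unique: set[int] = set()
    for block_ids in files.values():
        unique.update(block_ids)
    return len(unique)
```

For two files that share two of their three blocks, the logical transfer sends 6 blocks while the physical transfer sends 4, which is why replication at the logical layer sees no bandwidth benefit from deduplication.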
- 4.11 DEDUPLICATION AND MULTISTORE (VFILER) Starting with Data ONTAP 7.3, deduplication is supported with MultiStore. In Data ONTAP 7.3, the deduplication commands are available only in the CLI of vFiler0; however, they allow any volume to be included in the command arguments, regardless of which vFiler unit the volume is associated with. Beginning with Data ONTAP 7.3.1, the deduplication commands are available in the CLI of each vFiler unit, allowing each vFiler unit to be configured from within itself. 4. 
- impact will be experienced on low-end systems (for example, 30xx) more than high-end systems (for example, 6xxx). In takeover mode, writes to partner flexible volumes will be change logged. The deduplication process will not run on the partner flexible volumes while in takeover mode. Upon giveback, data in the change logs will be processed, and data will get deduplicated. In takeover mode, change logging will continue until the change log is full. 
- Table 7) LUN configuration examples (as described below). 
- Note: If Snapshot copies are turned off for the volume (or no copy exists in the volume) this is not a recommended configuration or volume for deduplication. Configuration B: LUN Configuration for Shared Volume Space Savings If the user wants to apply the freed blocks to both the fractional overwrite reserve area and the volume free pool, this can be accomplished with the following configuration:
  1. LUN space reservation value = on
  2. Volume fractional reserve value = any value from 1–99
  3. 
- Configuration D: LUN Configuration for Maximum Volume Space Savings If the user wants to apply the freed blocks to the volume free pool, this can be accomplished with the following configuration:
  1. LUN space reservation value = off
  2. Volume fractional reserve value = any value from 0–100
  3. Volume guarantee = volume
  4. Snap reserve = 0%
  5. Autodelete = off
  6. Autosize = off
  7. 
- 5 DEDUPLICATION AND VMWARE VMware environments deduplicate extremely well. However, while working out the VMDK and data store layouts, keep the following points in mind:
  - Operating system VMDKs deduplicate extremely well because the binary files, patches, and drivers are highly redundant between virtual machines (VMs). Maximum savings can be achieved by keeping these in the same volume.
  - Application binary VMDKs deduplicate to varying degrees. 
- Figure 7) VMFS data store on Fibre Channel or iSCSI: single LUN. 5.2 VMWARE VIRTUAL DISKS OVER NFS/CIFS This is a new configuration that became available starting with VMware 3.0. Its installed base is currently small but growing quickly. It is the easiest to configure and allows deduplication to provide the most space savings. Figure 8) VMware virtual disks over NFS/CIFS. 
- 5.3 DEDUPLICATION ARCHIVE OF VMWARE Deduplication has proven very useful in VMware archive environments. Figure 9 shows an example. Figure 9) Archive of VMware with deduplication. Detailed specifications for the example shown in Figure 9: In this environment, VMware runs over NFS. The environment uses approximately 1,800 clone copies of its master VMware image. These images are used to create virtual machines for primary applications and for test and development purposes. 
- 7 DEDUPLICATION AND EXCHANGE If Exchange and NetApp deduplication for FAS will be used together, the following should be taken into consideration: In some Exchange environments, extents are enabled to improve performance of database validation. Enabling extents does not rearrange blocks on disk that are shared between files by deduplication on deduplicated volumes. 
- 11 TROUBLESHOOTING This section covers issues that occasionally come up when configuring and running deduplication. 11.1 LICENSING Make sure that deduplication is properly licensed and, if the platform is not an R200, make sure that the NearStore option is also properly licensed:
    fas3070-rtp01> license
    …
    a_sis
    nearstore_option
    …
  If the license is removed or has expired, no additional deduplication can occur, and no sis commands can run. 
- and are locking a lot of data. This tends to happen especially when deduplication is run on existing flexible volumes of data. Use the snap list command to see what Snapshot copies exist and the snap delete command to remove them. Alternatively, wait for the Snapshot copies to expire and the space savings to appear (see section 4.1, “Deduplication and Snapshot Copies”). 11. 
- 11.6 ADDITIONAL REPORTING WITH SIS STAT -l For additional status information, you can use priv set diag and then use the sis stat -l command for long, detailed listings. The following are some additional details about the sis stat command:
  - When volume-name is omitted, it executes for all known SIS volumes.
  - -l lists all the details about the volume.
  - -b shows the disk space usage and saved disk space in number of blocks. 
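Block counts such as those reported with -b can be turned into more familiar figures with a little arithmetic. The sketch below assumes the 4KB block size described in the introduction and computes savings as saved / (used + saved); that formula and the function names are assumptions of this sketch rather than something stated in this section.

```python
BLOCK_SIZE = 4096  # 4KB blocks, per the overview of how deduplication works


def blocks_to_gib(blocks: int) -> float:
    """Convert a count of 4KB blocks to GiB."""
    return blocks * BLOCK_SIZE / 2**30


def percent_saved(used_blocks: int, saved_blocks: int) -> float:
    """Savings as a share of the logical data: saved / (used + saved).

    Assumption: this is the savings ratio of interest; check it against the
    figures your system reports before relying on it.
    """
    total = used_blocks + saved_blocks
    return 100.0 * saved_blocks / total if total else 0.0
```

For example, 262,144 saved blocks correspond to 1 GiB, and a volume with 750 used blocks and 250 saved blocks shows 25% savings under this formula.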
- 13 ADDITIONAL ASSISTANCE For additional support contact one of the following:
  - Your local account team
  - Systems engineer
  - Account manager
  - NetApp Global Services
    http://now.netapp.com
    888.4.NETAPP (United States and Canada)
    00.800.44.NETAPP (EMEA/Europe)
    +800.800.80.800 (Asia/Pacific)
  www.netapp.com
  © 2009 NetApp. All rights reserved. Specifications are subject to change without notice.