File System Tuning Guide, StorNext® 3.1.4
StorNext File System Tuning Guide, 6-01376-15 Rev A, May 2010, Product of USA. Quantum Corporation provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Quantum Corporation may revise this publication from time to time without notice. COPYRIGHT STATEMENT © 2010 Quantum Corporation. All rights reserved. Your right to copy this manual is limited by copyright law.
Contents

The Underlying Storage System
File Size Mix and Application I/O Characteristics
SNFS and Virus Checking
The Metadata Network
The Metadata Controller System
StorNext File System Tuning The StorNext File System (SNFS) provides extremely high performance for widely varying scenarios. Many factors determine the level of performance you will realize. In particular, the performance characteristics of the underlying storage system are the most critical factors. However, other components such as the Metadata Network and MDC systems also have a significant effect on performance.
The Underlying Storage System

RAID Cache Configuration

The single most important RAID tuning component is the cache configuration. This is particularly true for small I/O operations. Contemporary RAID systems such as the EMC CX series and the various Engenio systems provide excellent small I/O performance with properly tuned caching. For the best general-purpose performance characteristics, it is crucial to utilize the RAID system's caching as fully as possible.
Metadata operations involve a very high rate of small writes to the metadata disk, so disk latency is the critical performance factor. Write-back caching can be an effective approach to minimizing I/O latency and optimizing metadata operations throughput. This is easily observed in the hourly File System Manager (FSM) statistics reports in the cvlog file.
While read-ahead caching improves sequential read performance, it does not help highly transactional performance. Furthermore, some SNFS customers actually observe maximum large sequential read throughput by disabling caching. While disabling read-ahead is beneficial in these unusual cases, it severely degrades typical scenarios. Therefore, it is unsuitable for most environments.
File Size Mix and Application I/O Characteristics

It can be useful to use a tool such as lmdd to help determine the storage system performance characteristics and choose optimal settings. For example, varying the stripe size and running lmdd with a range of I/O sizes might be useful to determine an optimal stripe size multiple to configure the SNFS StripeBreadth. Some storage vendors now provide RAID6 capability for improved reliability over RAID5.
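Where lmdd (from the lmbench suite) is not available, a rough Python sketch can illustrate the same sweep idea: time sequential writes at several I/O sizes and compare throughput. This is only a hypothetical stand-in, not lmdd itself; it writes to an ordinary buffered file, so absolute numbers will not match raw-device measurements.

```python
import os
import tempfile
import time

def sweep(io_sizes_kb, total_mb=8):
    """Time sequential writes at several I/O sizes; return MB/s per size."""
    results = {}
    for kb in io_sizes_kb:
        block = b"\0" * (kb * 1024)
        count = (total_mb * 1024) // kb
        fd, path = tempfile.mkstemp()
        try:
            start = time.perf_counter()
            for _ in range(count):
                os.write(fd, block)
            os.fsync(fd)                      # flush so the timing is honest
            elapsed = time.perf_counter() - start
            results[kb] = total_mb / elapsed  # MB/s at this I/O size
        finally:
            os.close(fd)
            os.unlink(path)
    return results

if __name__ == "__main__":
    for kb, mbs in sweep([64, 256, 1024]).items():
        print(f"{kb:>5} KB I/Os: {mbs:8.1f} MB/s")
```

Running such a sweep while varying the RAID stripe size (or candidate StripeBreadth) shows which combination sustains the highest rate.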
and auto_dma_write_length, described in the Mount Command Options on page 19.

Buffer Cache

Reads and writes that aren't well-formed utilize the SNFS buffer cache. This also includes NFS or CIFS-based traffic, because the NFS and CIFS daemons defeat well-formed I/Os issued by the application. There are several configuration parameters that affect buffer cache performance.
If performance requirements cannot be achieved with NFS or CIFS, consider using a StorNext Distributed LAN client or a fibre-channel-attached client. It can be useful to use a tool such as netperf to help verify network performance characteristics.

SNFS and Virus Checking

Virus-checking software can severely degrade the performance of any file system, including SNFS.
The Metadata Controller System

The ethtool utility can be very useful to investigate and adjust speed/duplex settings. It can be useful to use a tool like netperf to help verify the Metadata Network performance characteristics. For example, if netperf -t TCP_RR reports less than 15,000 transactions per second capacity, a performance penalty may be incurred. You can also use the netstat tool to identify TCP retransmissions impacting performance.
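The 15,000 transactions-per-second rule of thumb can be restated as latency: each synchronous metadata operation pays at least one network round trip, so the reciprocal of the netperf TCP_RR rate is a floor on per-operation delay. A small illustrative calculation:

```python
def round_trip_us(transactions_per_sec):
    """Minimum per-transaction round-trip time implied by a TCP_RR rate."""
    return 1_000_000 / transactions_per_sec

# At the guide's 15,000 txn/s threshold, each round trip costs ~67 us.
print(round(round_trip_us(15_000)))  # 67
# A network managing only 1,500 txn/s adds ~667 us to every metadata op.
print(round(round_trip_us(1_500)))   # 667
```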
Several FSM configuration settings can be used to realize performance gains from increased memory: BufferCacheSize, InodeCacheSize, and ThreadPoolSize. However, it is critical that the MDC system have enough physical memory available to ensure that the FSM process doesn't get swapped out. Otherwise, severe performance degradation and system instability can result. The operating system on the metadata controller must always be run in U.S. English.
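These three settings live in the global section of the FSM configuration file. A hypothetical fragment is shown below; the values are illustrative only, not recommendations, and should be sized to the physical memory actually available on the MDC:

```
# Global section (values illustrative, not recommendations)
BufferCacheSize  64M    # metadata buffer cache
InodeCacheSize   16K    # number of cached inodes
ThreadPoolSize   32     # FSM worker threads
```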
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk6 0
Node CvfsDisk7 1

[StripeGroup MetaFiles]
Status UP
MetaData Yes
Journal No
Exclusive Yes
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk0 0

[StripeGroup JournFiles]
Status UP
Journal Yes
MetaData No
Exclusive Yes
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk1 0

Affinities

Affinities are another stripe group feature that
Affinity AudFiles ##for Audio Files Only##
Read Enabled
Write Enabled
StripeBreadth 1M
MultiPathMethod Rotate
Node CvfsDisk4 0
Node CvfsDisk5 1

Note: Affinity names cannot be longer than eight characters.

StripeBreadth

This setting must match the RAID stripe size or be a multiple of it. Matching the RAID stripe size exactly is usually the optimal setting.
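A quick way to sanity-check this relationship: StripeBreadth is aligned only if it is an exact multiple of the RAID stripe size. A small hypothetical check (sizes in KB, values illustrative):

```python
def is_aligned(stripe_breadth_kb, raid_stripe_kb):
    """True if StripeBreadth is an exact multiple of the RAID stripe size."""
    return stripe_breadth_kb % raid_stripe_kb == 0

print(is_aligned(256, 64))   # True: 256K breadth on a 64K RAID stripe
print(is_aligned(384, 256))  # False: 384K is not a multiple of 256K
```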
BufferCacheSize

This setting consumes up to twice the number of bytes specified. Increasing this value can reduce the latency of any metadata operation by performing a hot cache access to directory blocks, inode information, and other metadata. This is about 10 to 1000 times faster than I/O. It is especially important to increase this setting if metadata I/O latency is high (for example, more than 2 ms average latency).
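Because the setting can consume up to twice the configured number of bytes, budgeting MDC memory for it is simple arithmetic. A hypothetical sizing check (numbers illustrative):

```python
def worst_case_mb(buffer_cache_size_mb):
    """Worst-case physical memory consumed by BufferCacheSize (up to 2x)."""
    return 2 * buffer_cache_size_mb

# A "BufferCacheSize 64M" setting may consume up to 128 MB of RAM.
print(worst_case_mb(64))  # 128
```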
Example:
ThreadPoolSize 32    # default 16, 512 KB memory per thread

ForceStripeAlignment

This setting should always be set to Yes. This is critical if the largest StripeBreadth defined is greater than 1MB. Note that this setting is not adjustable after initial file system creation.

Example:
ForceStripeAlignment Yes

FsBlockSize

The FsBlockSize (FSB), metadata disk size, and JournalSize settings all work together.
This setting is adjustable using the cvupdatefs utility. For more information, see the cvupdatefs man page.

Example:
JournalSize 16M

SNFS Tools

The snfsdefrag tool is very useful for identifying and correcting file extent fragmentation. Reducing extent fragmentation can be very beneficial for performance. You can use this utility to determine whether files are fragmented, and if so, fix them.
• Large value for FSM threads SUMMARY max busy indicates the FSM configuration setting ThreadPoolSize is insufficient.
• Extremely high values for FSM cache SUMMARY inode lookups, TKN SUMMARY TokenRequestV3, or TKN SUMMARY TokenReqAlloc might indicate excessive file fragmentation. If so, the snfsdefrag utility can be used to fix the fragmented files.
The cvdbset utility has a special “Perf” trace flag that is very useful for analyzing I/O performance. For example:

cvdbset perf

Then, you can use cvdb -g to collect trace information such as this:

PERF: Device Write 41 MB/s IOs 2 exts 1 offs 0x0 len 0x400000 mics 95589 ino 0x5
PERF: VFS Write EofDmaAlgn 41 MB/s offs 0x0 len 0x400000 mics 95618 ino 0x5

The “PERF: Device” trace shows throughput measured for the device I/O.
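The fields in a PERF trace line are self-consistent: len (bytes, in hex) divided by mics (elapsed microseconds) reproduces the reported MB/s figure. A small parser sketch can double-check a trace line:

```python
import re

# The "PERF: Device" example line from the trace output above.
line = ("PERF: Device Write 41 MB/s IOs 2 exts 1 "
        "offs 0x0 len 0x400000 mics 95589 ino 0x5")

m = re.search(r"(\d+) MB/s.*len (0x[0-9a-fA-F]+) mics (\d+)", line)
reported_mb_s = int(m.group(1))
nbytes = int(m.group(2), 16)          # 0x400000 = 4 MiB
micros = int(m.group(3))

# MiB transferred divided by elapsed seconds.
computed_mb_s = nbytes / (1 << 20) / (micros / 1_000_000)
print(reported_mb_s, int(computed_mb_s))  # 41 41
```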
• Identify disk performance issues. If Device throughput is inconsistent or less than expected, it might indicate a slow disk in a stripe group, or that RAID tuning is necessary.
• Identify file fragmentation. If the extent count “exts” is high, it might indicate a fragmentation problem. This causes the device I/Os to be broken into smaller chunks, which can significantly impact throughput.
• Identify read/modify/write condition.
slower CPU.) Differences in latency over time for the same system can indicate new hardware problems, such as a network interface going bad. If a latency test has been run for a particular client, the cvadmin who long command includes the test results in its output, along with information about when the test was last run.

Mount Command Options

The following SNFS mount command settings are explained in greater detail in the mount_cvfs man page.
The dircachesize option sets the size of the directory information cache on the client. This cache can dramatically improve the speed of readdir operations by reducing metadata network message traffic between the SNFS client and FSM. Increasing this value can improve performance in scenarios where very large directories do not fit in, and therefore do not benefit from, the client directory cache.
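On Linux clients, mount_cvfs options are typically supplied through the mount options field. A hypothetical /etc/fstab entry is sketched below; the file system name, mount point, and cache value are all illustrative assumptions, so consult the mount_cvfs man page for the authoritative option syntax:

```
# Hypothetical SNFS client mount (names and values illustrative)
snfs1   /stornext/snfs1   cvfs   rw,dircachesize=16m   0 0
```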
The Distributed LAN (Disk Proxy) Networks

As a best practice, attach all SNFS Distributed LAN clients and servers directly to the same network switch. A router between a Distributed LAN client and server could easily be overwhelmed by the data rates required. It is also critical to ensure that speed/duplex settings are correct; incorrect settings severely impact performance. Most of the time, auto-detect is the correct setting.
Network Configuration and Topology

For maximum throughput, SNFS distributed LAN can utilize multiple NICs on both clients and servers. To take advantage of this feature, each of the NICs on a given host must be on a different IP subnetwork. (This is a requirement of TCP/IP routing, not of SNFS; TCP/IP cannot utilize multiple NICs on the same subnetwork.) An example of this is shown in the following illustration.
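The different-subnetwork requirement is easy to verify from the client's interface addresses. A sketch using Python's standard ipaddress module (the addresses are illustrative):

```python
import ipaddress

# Two NICs on one Distributed LAN client; for SNFS to use both, their
# networks must differ (a TCP/IP routing requirement, not an SNFS one).
nic1 = ipaddress.ip_interface("192.168.1.10/24")
nic2 = ipaddress.ip_interface("192.168.2.10/24")

print(nic1.network)                  # 192.168.1.0/24
print(nic2.network)                  # 192.168.2.0/24
print(nic1.network != nic2.network)  # True: both NICs are usable
```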
maximum of 1 GByte/s of throughput. SNFS automatically load-balances among NICs and servers to maximize throughput for all clients.

Note: The diagram shows separate physical switches used for the two subnetworks. They can, in fact, be the same switch, provided it has sufficient internal bandwidth to handle the aggregate traffic.

Distributed LAN Servers

Distributed LAN Servers must have sufficient memory.
Distributed LAN Client Vs. Legacy Network Attached Storage

• Performance
• Fault Tolerance
• Load Balancing
• Client Scalability
• Robustness and Stability
• Security Model Consistency

Performance

DLC outperforms NFS and CIFS for single-stream I/O and provides higher aggregate bandwidth. For inferior NFS client implementations, the difference can be more than a factor of two.
Largest Tested Configuration

                                      NFS    CIFS     DLC
Number of Clients Tested
(via simulation)                        4       4    1000

Robustness and Stability

The code path for DLC is simpler, involves fewer file system stacks, and is not integrated with kernel components that constantly change with every operating system release (for example, the Linux NFS code). Therefore, DLC provides increased stability that is comparable to the StorNext SAN Client.
being sent to the application log and cvlog.txt about socket failures with the status code 10055, which is ENOBUFS. The solution is to adjust a few parameters on the Cache Parameters tab in the SNFS control panel (cvntclnt). These parameters control how much memory is consumed by the directory cache, the buffer cache, and the local file cache. As always, an understanding of the customer's workload aids in determining the correct values.
Sample FSM Configuration File

This sample configuration file is located in the SNFS install directory under the examples subdirectory, named example.cfg.

# ************************************************************
# A global section for defining file system-wide parameters.
# StripeAlignSize 2M                  # auto alignment, default MAX(StripeBreadth)
# OpHangLimitSecs 300                 # default 180 secs
# DataMigrationThreadPoolSize 128     # Managed only, default 8

# ************************************************************
# A disktype section for defining disk hardware parameters.
[Disk CvfsDisk3]
Status UP
Type VideoDrive

[Disk CvfsDisk4]
Status UP
Type VideoDrive

[Disk CvfsDisk5]
Status UP
Type VideoDrive

[Disk CvfsDisk6]
Status UP
Type VideoDrive

[Disk CvfsDisk7]
Status UP
Type VideoDrive

[Disk CvfsDisk8]
Status UP
Type VideoDrive

[Disk CvfsDisk9]
Status UP
Type VideoDrive

[Disk CvfsDisk10]
Status UP
Type AudioDrive

[Disk CvfsDisk11]
Status UP
Type AudioDrive

[Disk CvfsDisk12]
Status UP
Type AudioDrive

[Disk CvfsDisk13]
Status UP
[Disk CvfsDisk15]
Status UP
Type DataDrive

[Disk CvfsDisk16]
Status UP
Type DataDrive

[Disk CvfsDisk17]
Status UP
Type DataDrive

# ************************************************************
# A stripe section for defining stripe groups.
Node CvfsDisk3 1
Node CvfsDisk4 2
Node CvfsDisk5 3
Node CvfsDisk6 4
Node CvfsDisk7 5
Node CvfsDisk8 6
Node CvfsDisk9 7

[StripeGroup AudioFiles]
Status UP
Exclusive Yes    ##Exclusive StripeGroup for Audio File Only##
Affinity AudFiles
Read Enabled
Write Enabled
StripeBreadth 1M
MultiPathMethod Rotate
Node CvfsDisk10 0
Node CvfsDisk11 1
Node CvfsDisk12 2
Node CvfsDisk13 3

[StripeGroup RegularFiles]
Status UP
Exclusive No    ##Non-Exclusive StripeGroup for