HP-UX Encrypted Volume and File System Performance and Tuning
Introduction

HP-UX Encrypted Volume and File System (EVFS) is an OS-based data encryption product. EVFS transparently encrypts user process data as it is written to a configured storage device, and decrypts data as it is read back from the device. Data is encrypted with a symmetric volume encryption key. The key is loaded into the kernel when the volume is configured, and the EVFS kernel module then uses it to encrypt outgoing data and decrypt incoming data.
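To make the configuration flow concrete, the outline below sketches how a volume is typically brought under EVFS before a file system is created on it. The device paths are placeholders and the exact subcommand syntax and options vary by EVFS release, so treat this as an illustrative sketch to verify against evfsadm(1M), evfspkey(1M), and evfsvol(1M) on your system.

    # Illustrative only -- device paths are placeholders and option syntax
    # differs between EVFS releases; verify against the EVFS man pages.
    evfsadm start                         # start the EVFS subsystem
    evfspkey keygen                       # generate an owner key pair
    evfsadm map /dev/vg01/lvol1           # create the EVFS pseudo-device for an LVM volume
    evfsvol create /dev/evfs/vg01/lvol1   # generate the symmetric volume encryption key
    evfsvol enable /dev/evfs/vg01/lvol1   # load the key into the kernel; encryption is now active
    # A VxFS file system is then created on the /dev/evfs/... device and mounted
    # as usual; applications read and write through it transparently.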
EVFS Architecture

EVFS consists of user-space tools, key generation and management utilities, and a pseudo-driver. The heart of EVFS is the pseudo-driver, which encrypts and decrypts data. In Figure 1, the EVFS modules are colored green.
EVFS Processor Affinity

EVFS is CPU intensive. The EVFS pseudo-driver is kernel-resident, and all data must be encrypted by the CPU before it is written to the storage device and decrypted by the CPU after it is read back. A critical performance measure for EVFS is therefore the CPU utilization required to sustain a given level of data throughput. EVFS is multi-threaded, and the number of system threads running EVFS is configurable.
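One simple way to quantify this, sketched below, is to sample CPU utilization while a known I/O workload runs against an EVFS volume and again against a clear volume, then compare MB/s achieved per percent of CPU consumed. The mount point, transfer size, and sampling interval are arbitrary examples, not values from the tests in this paper.

    # Sample system CPU every 5 seconds while a sequential write runs through EVFS.
    sar -u 5 60 > /tmp/cpu_evfs.out &
    dd if=/dev/zero of=/secure/ddtest bs=64k count=160000   # roughly 10 GB written through EVFS
    wait
    # Repeat against a non-EVFS (clear) volume and compare %sys and %usr in the
    # two sar reports against the throughput each run achieved.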
Figure 3 - EVFS threads

EVFS Memory Allocation

The HP-UX file system (JFS) caches file data for user processes in the system buffer cache. Applications that read and write data on an EVFS volume can use the buffer cache for read-ahead and write-behind operations. Because the EVFS pseudo-driver sits below the file system layer (and therefore below the buffer cache), EVFS performance can benefit from a correctly sized and well-utilized buffer cache.
On the test system, the dynamic buffer cache is configured at the default range of 5-50%, utilizing 3.4 GB at startup. Observe "Sys Mem" and "Buf Cache" on the bottom lines of the display.

Figure 4 - System Memory at Startup

Using IOZone (see the Testing Methodology chapter for details) to drive the system to its maximum data transfer rate to a storage device configured for non-EVFS (clear) data, system memory (Sys Mem) utilization increases to a maximum of 6.6 GB, and buffer cache (Buf Cache) peaks at the configured upper threshold of 50%, or 32 GB.
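On HP-UX 11iv2 the dynamic buffer cache floor and ceiling are governed by the dbc_min_pct and dbc_max_pct kernel tunables; the 5%-50% default range corresponds to the 3.4 GB startup and 32 GB peak figures above on this 64 GB system. A minimal sketch of inspecting them with kctune follows (the grep pattern is just a convenience).

    # Display the current dynamic buffer cache tunables.
    kctune | grep dbc_    # shows dbc_min_pct (default 5) and dbc_max_pct (default 50)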
Figure 6 - System Memory After EVFS I/O Copy

This test indicates that, on this system, EVFS adds roughly 500 MB of memory utilization for the pseudo-driver (7.1 GB - 6.6 GB = 500 MB; see the Testing Methodology chapter for full system configuration details). 500 MB is not an excessive amount of memory, showing that EVFS by itself does not require massive memory upgrades.
Testing Methodology

Many types of transactions and protocols benefit from standardized test specifications, testing bodies, and test suites. This is not true for data encryption. Given the lack of such standards, the clearest way to measure and communicate the performance effect of data encryption is to run identical tests while reading and writing clear data versus encrypted data, then compare the results.
The test system configuration:

• 8-way 1.6 GHz CPUs
• 64 GB memory
• 4-port Fibre Channel connectivity
• HP-UX 11iv2
• VxFS 4.1 file system
• LVM
• VA7510 Storage Array
• AutoRAID

HP Integrity servers are optimized for cryptographic operations, and thus perform better than PA-RISC servers. All current and future EVFS performance profiling will be conducted on HP Integrity servers. Unless otherwise noted, buffer cache is set to the default range of 5%-50%. A representative IOZone invocation is shown below.
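As a concrete example of the workload generator, the IOZone invocation below approximates the tests reported in this paper: 64 KB records, 100 MB files, and a given thread count, covering sequential write, sequential read, and random read/write. It is run from a directory on the volume under test; the directory and thread count are illustrative, and the exact command lines used for the published results may have differed.

    # Run from a directory on the EVFS (or clear) volume under test.
    cd /secure
    iozone -R -i 0 -i 1 -i 2 -r 64k -s 100m -t 10
    # -i 0/1/2 = write, read, and random read/write tests
    # -r 64k   = 64 KB record (block) size
    # -s 100m  = 100 MB file size per thread
    # -t 10    = 10 concurrent threads (the paper varies this from 1 to 100)
    # -R       = produce an Excel-style report of the results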
EVFS Performance and Tuning Results

EVFS and clear I/O throughput were tested with various HP-UX 11iv2 kernel tuning parameter sets and VxFS 4.1 tuning parameter sets. The kernel tuning sets did not produce significant performance improvements for either clear I/O or EVFS, so all further tuning examples are for VxFS exclusively.

Note: Tuning EVFS for Performance
Extensive testing has shown that tuning system I/O for clear data throughput also improves EVFS throughput.
IOZone Results: Default Tuning

The first set of IOZone tests was run on a newly loaded system with default HP-UX 11iv2 kernel parameters and default VxFS file system parameters.
Figure 8 - Default Random Writes (64 KB block random writes, 100 MB file size; throughput and CPU utilization versus IOZone thread count for clear I/O and EVFS)

Random writes show the expected pattern: EVFS throughput is lower than clear I/O because of the additional EVFS pseudo-driver path that every I/O must traverse.
Read data can only be retrieved from disk as fast as the OS, file system, and driver stack can process the transactions. As shown later, tuning for read-ahead helps read throughput, but without the buffer cache benefit that writes enjoy, the measured application throughput is much lower.
IOZone Results: VxFS Tuning

Tuning HP-UX for EVFS performance would intuitively call for both kernel tuning and file system tuning. However, empirical testing showed that HP-UX kernel tuning has very little influence on either EVFS or clear I/O performance (as measured with IOZone), while VxFS tuning produced dramatic improvements. Therefore, tuning for these tests is limited to VxFS parameters, as illustrated by the sketch below.
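The paper does not enumerate the exact VxFS settings used, so the sketch below only illustrates the mechanism: the parameter names are standard vxtunefs tunables governing read-ahead and write-behind, but the values and the /secure mount point are illustrative assumptions, not the tested configuration.

    vxtunefs /secure                                           # display current VxFS tunables for the mount
    vxtunefs -o read_pref_io=262144,read_nstream=4 /secure     # example: larger, deeper read-ahead
    vxtunefs -o write_pref_io=262144,write_nstream=4 /secure   # example: matching write-behind behavior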
With these tunes, clear I/O performance increased at a higher rate than EVFS performance. However, EVFS throughput still increased and CPU utilization decreased with these simple tunes. Note also that bulk sequential I/O is not a common application task.
Figure 12 - Tuned Sequential Reads (64 KB block sequential reads, 100 MB file size; throughput and CPU utilization versus IOZone thread count for clear I/O and EVFS)

Tuning improved EVFS sequential read throughput by about 40% while reducing CPU utilization. Tuned clear I/O throughput was statistically identical to the default case, but CPU utilization dropped by as much as 33%.
Summary: VxFS Tuning

Most application workloads resemble random read and write operations more than sequential I/O. Applying the simple VxFS tunes shown earlier can yield large performance gains for both EVFS and clear I/O. In many cases, EVFS can reasonably be expected not to affect application performance behavior, as long as there is enough CPU headroom to absorb the extra cycles that EVFS requires.
EVFS Direct I/O and Buffer Cache

Some applications use direct I/O and therefore cannot benefit from the system buffer cache. In many cases, reading data does not use the buffer cache anyway, unless the cache is intentionally populated with read data ahead of time. The read results reported earlier are therefore representative of how EVFS and clear I/O perform when fetching data from the storage device without significant help from the buffer cache. Typical ways to request direct I/O on VxFS are sketched below.
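For reference, the sketch below shows two standard ways to force direct I/O on a VxFS file system for this kind of comparison; the device path, mount point, and IOZone arguments are placeholders, and the paper does not state which mechanism was used for its results.

    # Mount the file system so I/O bypasses the buffer cache...
    mount -F vxfs -o mincache=direct,convosync=direct /dev/evfs/vg01/lvol1 /secure
    # ...or leave the mount options alone and have IOZone open its files with O_DIRECT.
    iozone -I -i 0 -i 2 -r 64k -s 100m -t 10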
Figure 15: Direct I/O Random Writes with VxFS Tuning (64 KB block random writes, 100 MB file size)

Figure 16: Direct I/O Sequential Reads with VxFS Tuning (64 KB block sequential reads, 100 MB file size)
Figure 17: Direct I/O Random Reads with VxFS Tuning (64 KB block random reads, 100 MB file size)

The direct I/O data shows that clear I/O and EVFS throughput are effectively equivalent.
Figure 18: Sequential Reads with Default Tuning (64 KB block sequential reads, 100 MB file size; Scale30)

Figure 19: Cached Sequential Reads with Default Tuning (64 KB cached sequential reads, 100 MB file size; Scale30)
Note: Cached EVFS Data
EVFS encrypts data at rest on the storage device; data in system memory (buffer cache) exists in the clear.
The test results in this chapter illustrate the performance difference between direct I/O (for both EVFS and clear I/O) and using system memory for buffer caching. Clearly, the buffer cache is an effective way to improve application performance with EVFS. But system memory is neither free nor inexhaustible, so it is important to observe how buffer cache sizing affects performance. In the following graphs, buffer cache has been capped at 5%, as sketched below.
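Capping the cache for these runs amounts to lowering the dbc_max_pct tunable introduced earlier; a one-line sketch follows (whether the change takes effect immediately or requires a reboot depends on the OS release).

    kctune dbc_max_pct=5    # limit the dynamic buffer cache to 5% of physical memory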
Figure 22: Random Writes, VxFS Tuning, 5% dbc_max (64 KB block random writes, 100 MB file size)

For random writes, the buffer cache exhaustion effect occurs beyond 25 threads, where throughput drops off significantly. Read performance with buffer cache set to 5% is dramatically different from the write results above.
Figure 24: Random Reads, VxFS Tuning, 5% dbc_max (64 KB block random reads, 100 MB file size)

For read-oriented applications, a significantly smaller memory allocation can achieve the same results as a larger memory configuration.
The graph scale above is increased from Scale1 to Scale30 to illustrate that throughput increases by about 70 times compared to reading the data from disk. Clear I/O and EVFS throughput are effectively identical, as is CPU utilization. This indicates that read-oriented applications can use EVFS with lower memory utilization and still achieve identical performance by using the buffer cache.
EVFS Encryption Key Length

EVFS can be configured with different symmetric encryption key lengths. The default is AES with a 128-bit key. Some institutions require stronger encryption, so EVFS also provides AES 256-bit keys. The following test illustrates how EVFS performs when using 128-bit versus 256-bit encryption keys.
Sequential reads show the same results: EVFS delivers essentially the same throughput and CPU utilization for both key lengths. For these test cases, EVFS encryption can be configured to comply with stronger governmental or industry encryption standards without sacrificing performance. It is strongly recommended to test this configuration in your own environment before deploying EVFS with the longer key length.
EVFS Testing With Postmark

All testing so far has been done with the IOZone benchmark tool, which is very effective for profiling I/O system characteristics. However, IOZone does not represent most user applications. The Postmark benchmark was created to more closely represent real user applications: it creates very large file sets and then operates on the existing files after creation. Its application profile is intended to resemble a mail server. An illustrative Postmark configuration is sketched below.
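Postmark is driven by a small script of "set ... / run" commands read from a configuration file or standard input. The sketch below assumes such a file at /tmp/pm.cfg targeting the volume under test; the file counts, size range, transaction count, and directory are illustrative assumptions, not the parameters behind the published results.

    # Contents of an illustrative configuration file, /tmp/pm.cfg:
    #   set location /secure
    #   set number 20000
    #   set size 500 100000
    #   set transactions 50000
    #   run
    #   quit
    postmark /tmp/pm.cfg    # run Postmark with the configuration above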
The conclusions drawn from these tests are that, for a benchmark that more closely approximates a real application, EVFS throughput can come close to clear I/O, and that using JFS 5.0 and the simple VxFS tunes provides significant performance improvement at no cost.
Summary

EVFS is a software module that encrypts data at rest on existing storage devices for HP-UX servers running 11iv2 or 11iv3. EVFS operates in the system kernel and therefore requires CPU cycles for encryption operations. This additional CPU consumption, relative to clear I/O, can also throttle throughput. Systems with existing CPU headroom will likely not be affected by the additional requirements of EVFS.