HP-UX 11i v3 Crash Dump Improvements
gated by the dump unit which takes the longest time to complete the dump of its memory area, the
imbalances inherent in compressed dump can reduce the scalability of the parallelism.
For example, on a system with three dump units, each with an equal amount of memory to dump and
an equal I/O rate to disk, the uncompressed dump time for each dump unit would be equal. That is,
without compression, if each dump unit takes 30 minutes to complete, the overall dump time will be
30 minutes plus dump startup and completion time. With compression, however, the data sizes to
be dumped across the three dump units may be unequal because compression ratios vary
across the different memory areas. This can result in unequal dump times across the dump units,
for example 3 minutes, 5 minutes, and 7 minutes, respectively, for the three dump units.
The overall dump has to wait for the 7-minute dump unit to complete, which somewhat reduces
the scalability of the three-dump-unit parallelism.
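The effect can be illustrated with a short calculation. The following is a minimal sketch, not part of any HP-UX tool: the function names are hypothetical, startup and completion time are ignored, and the "balanced" figure assumes the same total work could be spread perfectly evenly across the dump units.

```python
def parallel_dump_time(unit_minutes):
    """Overall parallel dump time: gated by the slowest dump unit."""
    return max(unit_minutes)


def balanced_dump_time(unit_minutes):
    """Hypothetical time if the same total work were split evenly across units."""
    return sum(unit_minutes) / len(unit_minutes)


# Example figures from the text: three compressed dump units.
compressed_units = [3.0, 5.0, 7.0]

overall = parallel_dump_time(compressed_units)    # slowest unit: 7 minutes
balanced = balanced_dump_time(compressed_units)   # even split: 5 minutes
efficiency = balanced / overall                   # fraction of ideal scaling achieved

print(f"overall={overall} min, balanced={balanced} min, efficiency={efficiency:.2f}")
```

Under these assumptions the three units achieve about 71% of the scaling that a perfectly balanced compressed dump would give, while still finishing far sooner than the 30-minute uncompressed dump.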
In some configurations one may need to consider the tradeoffs between compression and higher
levels of parallelism. Given typical compression ratios of 5 to 10 (and a corresponding reduction
in dump time), the gain from compression will often be greater than the performance improvement
that can be achieved with additional parallelism⁵ without compression. Combining compression
with, for example, 2 to 4 dump units would generally be better than an uncompressed dump with
more dump units, and it avoids the increased complexity of dealing with the large number of dump
devices that the latter would need. Gray areas include systems that have only enough CPUs for
one compressed dump unit, e.g., an 8-CPU system. Even in this case compression would generally
be recommended, because it would give 5 to 10 times the performance of an uncompressed dump,
versus a maximum of 8 times, assuming perfect scaling, with parallelism alone.
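The comparison above can be sketched as a simple model. This is an illustrative assumption, not a measured result: it treats dump time as shrinking in proportion to the compression ratio and to the number of dump units, scaled by a hypothetical `scaling_efficiency` factor (1.0 meaning perfect scaling). The function names are invented for this sketch.

```python
def uncompressed_speedup(dump_units, scaling_efficiency=1.0):
    """Speedup over one uncompressed dump unit from parallelism alone."""
    return dump_units * scaling_efficiency


def compressed_speedup(compression_ratio, dump_units=1, scaling_efficiency=1.0):
    """Speedup over one uncompressed dump unit when compression is enabled.

    Assumes dump time falls in proportion to the compression ratio.
    """
    return compression_ratio * dump_units * scaling_efficiency


# 8-CPU example from the text: eight uncompressed dump units with perfect
# scaling, versus a single compressed dump unit at a ratio of 5 to 10.
best_uncompressed = uncompressed_speedup(8)
compressed_range = (compressed_speedup(5), compressed_speedup(10))

print("uncompressed, 8 units:", best_uncompressed)
print("compressed, 1 unit:", compressed_range)
```

In this model the uncompressed figure is an upper bound that real systems are unlikely to reach, whereas the compressed range depends on how compressible the dumped memory actually is.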
3.4.3 Varying device/HBA/link speeds
As mentioned above, the overall dump time in parallel dump is gated by the dump unit of longest
duration. Therefore, varying device, HBA, or link speeds across the dump units can reduce the
overall scalability of the parallelism. For example, consider a system with two dump units, each
with one-half of system memory to dump, in which one of the dump units takes twice as long to
dump its portion of memory as the other. In this case the dump time with parallelism would be
2/3 of the dump time without parallelism, because the slower unit alone accounts for two-thirds
of the serial dump time (rather than the 1/2 that could be achieved if both dump units operated
at equal speeds).
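The 2/3 figure can be derived as follows. As a labeled assumption, let T be the time the faster dump unit needs for its half of memory, so the slower unit needs 2T, and let the dump without parallelism write the two halves one after the other at those same rates.

```latex
\begin{align*}
T_{\text{serial}}   &= T + 2T = 3T \\
T_{\text{parallel}} &= \max(T,\, 2T) = 2T \\
\frac{T_{\text{parallel}}}{T_{\text{serial}}} &= \frac{2T}{3T} = \frac{2}{3}
\end{align*}
```

If both units ran at the faster speed, the same steps would give a serial time of 2T and a parallel time of T, which is the 1/2 ratio mentioned in the text.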
Note: Using identical sizes and types of dump devices and HBAs in the dump configuration is one
way to avoid inequalities in dump speeds or times across the dump units. This will tend to produce
more predictable results and better overall parallelism.
3.4.4 Shared swap/dump devices
It is recommended that shared swap and dump devices or volumes not be used with parallel
dump. Using a shared swap/dump device can significantly increase the subsequent reboot time,
because such devices cause swap to be disabled while the corresponding dump data is being saved
(e.g., in /var/adm/crash). With parallel dump (multiple dump units), if any of the dump
devices is shared with swap, swapping will be disabled for the saving of the whole
dump (not just the saving of the shared dump/swap devices), which can significantly increase the
reboot time.
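A configuration can be checked for this condition by comparing the two device lists. The sketch below is only an illustration: the function name and the device paths are hypothetical, and on a real system the lists would have to be gathered separately (for instance from the dump and swap configuration reported by crashconf and swapinfo, whose output parsing is not shown here).

```python
def shared_swap_dump_devices(dump_devices, swap_devices):
    """Return the devices that appear in both lists, sorted for stable output."""
    return sorted(set(dump_devices) & set(swap_devices))


# Hypothetical device paths, for illustration only.
dump_devices = ["/dev/vg00/lvol2", "/dev/vgdump/lvol1", "/dev/vgdump/lvol2"]
swap_devices = ["/dev/vg00/lvol2"]

shared = shared_swap_dump_devices(dump_devices, swap_devices)
if shared:
    # With parallel dump, one shared device is enough to keep swapping
    # disabled while the whole dump is saved.
    print("Shared swap/dump devices:", ", ".join(shared))
else:
    print("No dump device is shared with swap.")
```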
⁵ Typical compression ratios of 5 to 10 are based on internal HP testing; the results seen on individual systems may vary.