HP-UX 11i v3 Crash Dump Improvements

gated by the dump unit which takes the longest time to complete the dump of its memory area, the

imbalances inherent in compressed dump can reduce the scalability of the parallelism.

For example, on a system with 3 dump units with equal sizes of memory to be dumped and equal

I/O rates to the disk, the uncompressed dump time for each dump unit would be equal. That is,

without compression, if one dump unit takes 30 minutes to complete, the overall dump time will be

30 minutes plus dump startup and completion time. However, with compression the data sizes to

be dumped across the three dump units may now be unequal due to varying compression ratios

across the different memory areas. This can result in unequal dump times across the dump units,

producing dump times of, for example, 3 minutes, 5 minutes, and 7 minutes, respectively, for the

three dump units. The overall dump has to wait for the 7-minute dump unit to complete, which

somewhat reduces the scalability of the three-dump-unit parallelism.
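
The gating effect in the example above can be illustrated with a minimal sketch. The function names and the balance metric below are illustrative assumptions, not part of any HP-UX tool; the metric assumes that dump time is proportional to the amount of data written, so a perfectly balanced dump would finish in the mean of the per-unit times.

```python
def overall_dump_time(unit_minutes, overhead_minutes=0):
    """Overall parallel dump time: the slowest dump unit gates completion.

    overhead_minutes stands in for dump startup and completion time
    (a hypothetical parameter; the source gives no figure for it).
    """
    return max(unit_minutes) + overhead_minutes


def balance_efficiency(unit_minutes):
    """Mean unit time divided by the longest unit time.

    1.0 means every dump unit finishes at the same moment; lower values
    mean some units sit idle while the slowest one completes.
    """
    return (sum(unit_minutes) / len(unit_minutes)) / max(unit_minutes)


uncompressed = [30, 30, 30]  # equal memory, equal I/O rates
compressed = [3, 5, 7]       # unequal compression ratios per memory area

print(overall_dump_time(uncompressed), balance_efficiency(uncompressed))
print(overall_dump_time(compressed), round(balance_efficiency(compressed), 2))
```

In this sketch the compressed dump is still far faster overall (7 minutes versus 30), but only about 71% of the ideal three-way parallelism is realized.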

In some configurations one may need to consider the tradeoffs between compression and higher

levels of parallelism. Given typical compression ratios (and corresponding reduction in dump

times) of 5 to 10, the gain from compression will often be greater than the performance improvement that can be

achieved with additional parallelism without compression. Combining compression, for

example, with 2 to 4 dump units would generally be better than uncompressed dump with more

dump units, and it avoids the increased complexity of dealing with the large number of dump

devices that would be needed in the latter case. Gray areas would include systems which only

have enough CPUs for one compressed dump unit, e.g., an 8-CPU system. Even in this case,

compression would generally be recommended because it would give 5 to 10 times the

performance of an uncompressed dump, versus a maximum of 8 times (assuming perfect scaling) with

parallelism alone.
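
The tradeoff described above can be approximated with a rough model. This is a sketch under stated assumptions, not a sizing tool: it assumes that dump time shrinks in proportion to the compression ratio and to the number of dump units, and the `scaling_efficiency` parameter is a hypothetical knob for imperfect parallel scaling.

```python
def estimated_speedup(dump_units, compression_ratio=1.0, scaling_efficiency=1.0):
    """Estimated speedup relative to a single uncompressed dump unit.

    Simplified model (an assumption of this sketch):
        speedup = dump_units * compression_ratio * scaling_efficiency
    compression_ratio=1.0 models an uncompressed dump.
    """
    if dump_units < 1:
        raise ValueError("at least one dump unit is required")
    return dump_units * compression_ratio * scaling_efficiency


# 8-CPU gray area: one compressed dump unit versus parallelism alone.
low = estimated_speedup(1, compression_ratio=5.0)
high = estimated_speedup(1, compression_ratio=10.0)
parallel_only = estimated_speedup(8)  # perfect scaling assumed

print(f"one compressed unit: {low:.0f}x to {high:.0f}x")
print(f"eight uncompressed units: at most {parallel_only:.0f}x")

# Compression combined with a few dump units.
for units in (2, 3, 4):
    print(units, estimated_speedup(units, compression_ratio=5.0))
```

Under these assumptions, even the low end of the typical compression range combined with two dump units exceeds what eight perfectly scaling uncompressed dump units could achieve.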

3.4.3 Varying device/HBA/link speeds

As mentioned above, the overall dump time in parallel dump is gated by the dump unit of longest

duration. Therefore, varying device, HBA, or link speeds across the various dump units can

result in a reduction of the overall scalability of the parallelism. For example, consider a system

with two dump units, each with one-half of system memory to dump, in which one of the dump

units takes twice as long to dump its portion of memory as the other. In this case the dump time

with parallelism would be 2/3 of the dump time without parallelism (rather than the ½ time that

can potentially be achieved if both dump units operate at equal speeds).
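
The 2/3 figure can be checked with a short calculation. The sketch below assumes that "without parallelism" means the same memory areas are dumped one after another to the same devices, so the serial time is the sum of the per-unit times; the function and parameter names are illustrative.

```python
from fractions import Fraction


def parallel_to_serial_ratio(memory_shares, relative_rates):
    """Ratio of parallel dump time to serial dump time.

    memory_shares:  fraction of system memory handled by each dump unit
    relative_rates: relative throughput of each dump unit's I/O path
    The parallel time is gated by the slowest unit; the serial time is
    the sum of all per-unit times.
    """
    times = [Fraction(share) / Fraction(rate)
             for share, rate in zip(memory_shares, relative_rates)]
    return max(times) / sum(times)


half = Fraction(1, 2)

# Two dump units, half of memory each, one running at half the speed.
print(parallel_to_serial_ratio([half, half], [1, half]))  # 2/3

# Same split with equal speeds reaches the ideal ratio.
print(parallel_to_serial_ratio([half, half], [1, 1]))     # 1/2
```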

Note: Using identical sizes and types of dump devices and HBAs in the dump configuration is one

way to avoid inequalities in dump speeds or times across the dump units. This will tend to produce

more predictable results and better overall parallelism.

3.4.4 Shared swap/dump devices

It is recommended that shared swap and dump devices or volumes not be used with parallel

dump. Using a shared swap/dump device can significantly increase the subsequent reboot time

because such devices cause swap to be disabled while the corresponding dump data is saved

(e.g., in /var/adm/crash). In the case of parallel dump (multiple dump units), if any of the dump

devices is shared with swap, then swapping will be disabled for the duration of saving the whole

dump (not just the saving of the shared dump/swap devices), which can significantly increase the

reboot time.
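
A configuration review for this recommendation amounts to checking whether any device appears in both the dump and swap configurations. The sketch below assumes the two device lists have already been collected by the administrator; the device paths shown are hypothetical examples, and the function is not part of any HP-UX utility.

```python
def shared_swap_dump_devices(dump_devices, swap_devices):
    """Return the devices that are configured for both dump and swap."""
    return sorted(set(dump_devices) & set(swap_devices))


# Hypothetical device lists for illustration only.
dump_devices = ["/dev/vg00/lvol2", "/dev/vgdump/lvol1", "/dev/vgdump/lvol2"]
swap_devices = ["/dev/vg00/lvol2", "/dev/vg01/lvol5"]

shared = shared_swap_dump_devices(dump_devices, swap_devices)
if shared:
    print("Shared swap/dump devices found:", ", ".join(shared))
else:
    print("No shared swap/dump devices.")
```

With parallel dump, a single entry in the result is enough to keep swap disabled while the whole dump is saved, so the goal is an empty list.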

Typical compression ratios of 5 to 10 are based on internal HP testing and the results seen on individual systems may vary.