HP-UX 11i June 2004 Release Notes

Programming

Libraries

Chapter 15


tracked. When the number of tracked objects reaches about 20,000, the user CPU time with the splay tree is about half the user CPU time for the old nftw(). At 100,000 tracked inodes, the user CPU time is about 90% less for the splay tree.
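To make the data structure concrete, here is a minimal sketch of tracking visited objects in a splay tree keyed by device and inode number. It is an illustration only, not HP's implementation: the names `struct seen`, `key_cmp`, `splay`, and `track_inode` are hypothetical, and the top-down splay routine follows the widely published textbook form.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical record for one tracked object. */
struct seen {
    dev_t dev;
    ino_t ino;
    struct seen *left, *right;
};

/* Order keys by device first, then by inode number. */
static int key_cmp(dev_t d, ino_t i, const struct seen *n)
{
    if (d != n->dev) return d < n->dev ? -1 : 1;
    if (i != n->ino) return i < n->ino ? -1 : 1;
    return 0;
}

/* Top-down splay: move the node matching the key (or the last node
 * reached while searching for it) to the root of the tree. */
static struct seen *splay(struct seen *t, dev_t d, ino_t i)
{
    struct seen head, *l, *r, *y;
    int c;

    if (t == NULL) return NULL;
    head.left = head.right = NULL;
    l = r = &head;
    for (;;) {
        c = key_cmp(d, i, t);
        if (c < 0) {
            if (t->left == NULL) break;
            if (key_cmp(d, i, t->left) < 0) {      /* rotate right */
                y = t->left; t->left = y->right; y->right = t; t = y;
                if (t->left == NULL) break;
            }
            r->left = t; r = t; t = t->left;       /* link right */
        } else if (c > 0) {
            if (t->right == NULL) break;
            if (key_cmp(d, i, t->right) > 0) {     /* rotate left */
                y = t->right; t->right = y->left; y->left = t; t = y;
                if (t->right == NULL) break;
            }
            l->right = t; l = t; t = t->right;     /* link left */
        } else {
            break;
        }
    }
    l->right = t->left;                            /* reassemble */
    r->left = t->right;
    t->left = head.right;
    t->right = head.left;
    return t;
}

/* Returns 1 if (d, i) was already tracked, 0 if it was newly added,
 * and -1 if memory could not be allocated. */
int track_inode(struct seen **root, dev_t d, ino_t i)
{
    struct seen *n, *t;
    int c = 0;

    assert(root != NULL);
    t = splay(*root, d, i);
    *root = t;
    if (t != NULL && (c = key_cmp(d, i, t)) == 0) return 1;

    n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->dev = d;
    n->ino = i;
    if (t == NULL) {
        n->left = n->right = NULL;
    } else if (c < 0) {
        n->left = t->left; n->right = t; t->left = NULL;
    } else {
        n->right = t->right; n->left = t; t->right = NULL;
    }
    *root = n;
    return 0;
}
```

A tree walker could call `track_inode()` with the `st_dev` and `st_ino` fields of each object it visits. Recently seen keys stay near the root, which is one plausible reason a splay tree suits this access pattern.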

Another performance improvement to nftw() eliminated calls to access() by checking the mode bits in the stat() buffer. This decreased system CPU time by approximately 4%.

Two defects were fixed in nftw():

- When the FTW_CHDIR option is set, directories are considered unreadable unless they have both read and execute permissions. (The old nftw() would try to chdir() into a directory without execute permission and then abort the walk with an error.)

- When the FTW_CHDIR option is set, a directory object is reported to the user function before nftw() changes into it with chdir().
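An application that walks a tree with FTW_CHDIR might be structured as in the sketch below. The names `visit` and `walk_with_chdir` are hypothetical, the descriptor limit of 16 is arbitrary, and the `_XOPEN_SOURCE` setting is an assumption that exposes nftw() on many systems; the nftw (3C) manpage is the authority for the feature macros your release requires. With the first fix, a directory lacking execute permission would presumably reach the callback as FTW_DNR rather than ending the walk.

```c
#define _XOPEN_SOURCE 500   /* assumption: makes nftw() visible on many systems */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() passes no user pointer to the callback, so the counters
 * live at file scope. This makes the sketch non-reentrant. */
static long dirs_seen;
static long unreadable_dirs;

/* Hypothetical callback: count readable and unreadable directories. */
static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    switch (type) {
    case FTW_D:                 /* directory, reported in pre-order */
        dirs_seen++;
        break;
    case FTW_DNR:               /* directory that cannot be read */
        unreadable_dirs++;
        fprintf(stderr, "skipping unreadable directory: %s\n", path);
        break;
    default:
        break;
    }
    return 0;                   /* zero tells nftw() to keep walking */
}

/* Walk the tree under root with FTW_CHDIR and FTW_PHYS set.
 * Returns whatever nftw() returns and stores the two counts. */
int walk_with_chdir(const char *root, long *dirs, long *unreadable)
{
    int rc;

    assert(root != NULL && dirs != NULL && unreadable != NULL);
    dirs_seen = 0;
    unreadable_dirs = 0;
    rc = nftw(root, visit, 16, FTW_CHDIR | FTW_PHYS);
    *dirs = dirs_seen;
    *unreadable = unreadable_dirs;
    return rc;
}
```

Code that previously treated an nftw() failure as the signal for a directory without execute permission may need to handle FTW_DNR in the callback instead.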

The nftw() improvements vary with the options provided. The most significant improvements are seen in UNIX95 standard mode with the FTW_PHYS option not set, or when the file tree being traversed contains a very large number of directories.

Impact

The code size of ftw() and nftw() has increased by about 40%, but the heap requirements are reduced by 50% or more.

At a minimum, you should find that ftw() operates about 6% faster and nftw() about 4% faster. On very large file trees, where the number of tracked inodes is in the tens of thousands or more, the performance gain of nftw() could be 30% to 40% or more.

If your application relied on either of the FTW_CHDIR defects described above, it may need to be changed.

Documentation

The ftw (3C) and nftw (3C) manpages have been updated, particularly to describe the two defect fixes and how to achieve the best concurrency in threaded applications.

Performance Improvements to libc’s malloc()

A new environment variable, _M_CACHE_OPTS, is available to help tune malloc() performance in kernel-threaded applications. This environment variable configures a thread-private cache for malloc’ed blocks. If the cache is configured, malloc’ed blocks are placed into a thread's private cache when free() is called, and may thereafter be allocated from the cache when malloc() is called. Such a cache can improve speed for some kernel-threaded applications by reducing mutex contention among threads and by deferring coalescence of blocks.

The thread-private cache is available only for kernel-threaded applications, that is, those linked with the pthread library. The installed shared pthread library must be version PHCO_19666 or later, or the application must be statically linked with an archive pthread library that is version PHCO_19666 or later; otherwise the cache is not available.

By default the cache is not active; it must be activated by setting _M_CACHE_OPTS to a legal value. If _M_CACHE_OPTS is set to an out-of-range value, it is ignored and the cache remains disabled.
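One way to judge whether the cache helps a particular application is to time a representative threaded workload with _M_CACHE_OPTS unset and again with it set to a legal value. The sketch below is such a workload under stated assumptions: the names `churn` and `run_churn` and the 64-byte block size are hypothetical, and the format of legal _M_CACHE_OPTS values is not reproduced here (see the malloc (3C) manpage).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define ALLOC_SIZE 64   /* arbitrary small block size for the sketch */

/* Per-thread work description and result. */
struct churn_job {
    long iterations;
    long completed;
};

/* Each thread repeatedly allocates and frees one small block, the
 * pattern a thread-private cache is meant to speed up. */
static void *churn(void *arg)
{
    struct churn_job *job = arg;
    long i;

    for (i = 0; i < job->iterations; i++) {
        volatile char *p = malloc(ALLOC_SIZE);
        if (p == NULL) break;
        p[0] = (char)i;           /* touch the block so it is really used */
        free((void *)p);
        job->completed++;
    }
    return NULL;
}

/* Start nthreads threads, wait for them all, and return the total
 * number of malloc()/free() pairs completed, or -1 on setup failure. */
long run_churn(int nthreads, long iterations)
{
    pthread_t *tids;
    struct churn_job *jobs;
    long total = 0;
    int i, started = 0;

    assert(nthreads > 0 && iterations >= 0);
    tids = malloc((size_t)nthreads * sizeof *tids);
    jobs = malloc((size_t)nthreads * sizeof *jobs);
    if (tids == NULL || jobs == NULL) {
        free(tids);
        free(jobs);
        return -1;
    }
    for (i = 0; i < nthreads; i++) {
        jobs[i].iterations = iterations;
        jobs[i].completed = 0;
        if (pthread_create(&tids[i], NULL, churn, &jobs[i]) != 0) break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += jobs[i].completed;
    }
    free(tids);
    free(jobs);
    return total;
}
```

A driver program would call `run_churn()` from main() and be timed externally. Results depend heavily on thread count and allocation pattern, so a synthetic loop like this is only a starting point for measuring a real application.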