HP-UX 11i June 2004 Release Notes
Programming
Libraries
Chapter 15
tracked. When the number of tracked objects reaches about 20,000, the user CPU
time with the splay tree is about half the user CPU time for the old nftw(). At
100,000 tracked inodes, the user CPU time is about 90% less for the splay tree.
Another performance improvement to nftw() eliminated calls to access() by
checking the mode bits in the stat() buffer. This decreased system CPU time by
approximately 4%.
Two defects were fixed in nftw():
- When the FTW_CHDIR option is set, directories are considered unreadable unless
they have both read and execute permissions. (The old nftw() would try to
chdir() into a directory without execute permissions and then abort the walk
with an error.)
- When the FTW_CHDIR option is set, a directory object is reported to the user
function before nftw() changes into it with chdir().
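A minimal sketch of a walk that uses FTW_CHDIR is shown here to illustrate where these two fixes become visible to an application. The names report(), walk_tree(), and visited are hypothetical, the descriptor limit of 16 is an arbitrary choice, and the feature-test macro is an assumption that may not be needed on every platform.

```c
#define _XOPEN_SOURCE 700   /* assumption: needed on some systems to expose nftw() */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

static long visited;   /* hypothetical counter updated by the callback */

/* Called once per object. With FTW_CHDIR set, a directory is reported
 * here before nftw() changes into it. */
static int report(const char *path, const struct stat *st,
                  int type, struct FTW *ftw)
{
    (void)st;   /* not needed in this sketch */
    if (type == FTW_DNR)   /* directory that cannot be read */
        fprintf(stderr, "unreadable directory: %s\n", path);
    else
        printf("%*s%s\n", ftw->level * 2, "", path + ftw->base);
    visited++;
    return 0;   /* a non-zero return value would stop the walk */
}

/* Walks the tree under root and returns the number of objects
 * reported, or -1 if nftw() failed. */
long walk_tree(const char *root)
{
    assert(root != NULL);
    visited = 0;
    if (nftw(root, report, 16, FTW_CHDIR | FTW_PHYS) != 0)
        return -1;
    return visited;
}
```

A callback written this way should not assume that the current working directory is already the directory being reported, and it should be prepared to receive FTW_DNR for a directory that lacks either read or execute permission.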
nftw() improvements vary depending on options provided, with the most significant
improvements seen in UNIX95 standard mode with the FTW_PHYS option not set, or
when a very large number of directories exist in the file tree being traversed.
Impact
The code size of ftw() and nftw() has increased by about 40%, but the heap
requirements are reduced by 50% or more.
At a minimum, you should find that ftw() runs about 6% faster and nftw() about 4% faster.
On very large file trees, where the number of tracked inodes is in the tens of thousands or
more, the performance gain of nftw() could be 30% to 40% or more.
If your application relied on either of the FTW_CHDIR defects described above, it may
need to be changed.
Documentation
The ftw (3C) and nftw (3C) manpages have been updated, particularly with respect to
the two defect fixes and the means of achieving the best concurrency in threaded applications.
Performance Improvements to libc’s malloc()
A new environment variable, _M_CACHE_OPTS, is available to help tune malloc()
performance in kernel-threaded applications. This environment variable configures a
thread-private cache for malloc’ed blocks. If the cache is configured, malloc’ed blocks are
placed into a thread's private cache when free() is called, and may thereafter be
allocated from the cache when malloc() is called. Such a cache can improve
speed for some kernel-threaded applications by reducing mutex contention
among threads and by deferring coalescence of blocks.
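The kind of workload that could benefit is sketched below: several threads that each allocate and free small blocks in a tight loop. This is an illustration only. The names worker(), run_workload(), and MAX_THREADS are hypothetical, the block sizes are arbitrary, and the sketch assumes that _M_CACHE_OPTS is set in the environment before the program starts. The program only reports whether the variable is set; it does not attempt to show a legal value, which is documented in malloc (3C).

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 64   /* arbitrary limit for this sketch */

struct job {
    long rounds;   /* malloc/free pairs to attempt */
    long done;     /* malloc/free pairs completed */
};

/* Each thread repeatedly allocates and frees small blocks: the pattern
 * that a per-thread cache of freed blocks is intended to speed up. */
static void *worker(void *arg)
{
    struct job *j = arg;
    long i;

    assert(j != NULL);
    for (i = 0; i < j->rounds; i++) {
        char *p = malloc(64 + (size_t)(i % 8) * 16);
        if (p == NULL)
            break;
        p[0] = (char)i;   /* touch the block */
        free(p);          /* may be kept in the thread's cache if enabled */
        j->done++;
    }
    return NULL;
}

/* Runs nthreads workers and returns the total number of malloc/free
 * pairs completed, or -1 on a bad argument or thread-creation failure. */
long run_workload(int nthreads, long rounds)
{
    pthread_t tid[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    const char *opts = getenv("_M_CACHE_OPTS");
    long total = 0;
    int started, i;

    if (nthreads < 1 || nthreads > MAX_THREADS || rounds < 0)
        return -1;

    /* Report only; the value itself is interpreted by malloc(), not here. */
    printf("_M_CACHE_OPTS is %s\n", opts ? "set" : "not set");

    for (started = 0; started < nthreads; started++) {
        jobs[started].rounds = rounds;
        jobs[started].done = 0;
        if (pthread_create(&tid[started], NULL, worker, &jobs[started]) != 0)
            break;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += jobs[i].done;
    }
    return started == nthreads ? total : -1;
}
```

Timing a call such as run_workload(8, 1000000) with and without the variable set is one way to judge whether the cache helps a particular allocation pattern.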
The thread-private cache is available only for kernel-threaded applications, that is, those
linked with the pthread library. The installed shared pthread library version must be
PHCO_19666 or later, or the application must be statically linked with an archive
pthread library that is version PHCO_19666 or later; otherwise the cache is not available.
By default, the cache is not active; it must be activated by setting _M_CACHE_OPTS to a
legal value. If _M_CACHE_OPTS is set to an out-of-range value, it is ignored and the cache
remains disabled.