HP-UX 11i March 2002 Release Notes

Programming
Libraries
Chapter 13
nftw()
nftw() was rewritten along the same lines as ftw(), with the same benefits. nftw() now
fully conforms to the UNIX95 definition; in particular, when FTW_PHYS is not set, each
file is reported only once.
Threaded applications can obtain greater concurrency by specifying an absolute path
name for the starting path and leaving FTW_CHDIR unset. In addition, an internal
unbalanced binary tree was replaced with a much more efficient splay tree. The effect
of this tree change becomes significant as the number of object inodes being tracked
increases. Directory inodes are always tracked; when executing in UNIX95 mode with
the FTW_PHYS option not set, all files and directories are tracked. When the
number of tracked objects reaches about 20,000, the user CPU time with the splay
tree is about half the user CPU time of the old nftw(). At 100,000 tracked inodes,
the splay tree uses about 90% less user CPU time.
Another performance improvement to nftw() eliminated calls to access() by
checking the mode bits in the stat() buffer. This decreased system CPU time by
approximately 4%.
Two defects were fixed in nftw(). The corrected behavior is as follows:
• When the FTW_CHDIR option is set, directories are considered unreadable unless
  they have both read and execute permissions. (The old nftw() would try to
  chdir() into a directory without execute permission and then abort the walk
  with an error.)
• When the FTW_CHDIR option is set, a directory object is reported to the user
  function before nftw() changes into it with chdir().
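An application that walks with FTW_CHDIR may want its callback to distinguish unreadable directories from ordinary ones. The sketch below uses the standard type flags from <ftw.h>. The helper names ftw_type_name, report_entry, and walk_with_chdir, and the descriptor limit of 16, are hypothetical and chosen only for illustration.

```c
#define _XOPEN_SOURCE 700   /* expose nftw() and the full set of FTW_* flags */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Map an nftw() type flag to a short description. */
const char *ftw_type_name(int typeflag)
{
    switch (typeflag) {
    case FTW_F:   return "file";
    case FTW_D:   return "directory";
    case FTW_DNR: return "unreadable directory";
    case FTW_NS:  return "stat failed";
    case FTW_SL:  return "symbolic link";
    case FTW_SLN: return "dangling symbolic link";
    case FTW_DP:  return "directory (post-order)";
    default:      return "unknown";
    }
}

/* Print one line per object; path + ftwbuf->base is the final path component. */
static int report_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    printf("%-24s depth=%d name=%s\n",
           ftw_type_name(typeflag), ftwbuf->level, path + ftwbuf->base);
    return 0;                           /* 0 tells nftw() to keep walking */
}

/* Walk root with FTW_CHDIR set; returns whatever nftw() returns. */
int walk_with_chdir(const char *root)
{
    assert(root != NULL);
    return nftw(root, report_entry, 16, FTW_PHYS | FTW_CHDIR);
}
```

A callback written this way can skip or log FTW_DNR entries instead of assuming that every reported directory will be entered.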
nftw() improvements vary depending on options provided, with the most significant
improvements seen in UNIX95 standard mode with the FTW_PHYS option not set, or
when a very large number of directories exist in the file tree being traversed.
Impact
The code size of ftw() and nftw() has increased by about 40%, but the heap
requirements are reduced by 50% or more.
At a minimum, you should find that ftw() runs about 6% faster and nftw() about 4% faster.
On very large file trees, where the number of tracked inodes is in the tens of thousands or
more, the performance gain of nftw() could be 30% to 40% or more.
If your application relied on either of the FTW_CHDIR defects described above, it may
need to be changed.
Documentation
The ftw(3C) and nftw(3C) manpages have been updated, particularly with respect to
the two defect fixes and the means of achieving the best concurrency in threaded
applications.
Performance Improvements to libc’s malloc()
A new environment variable, _M_CACHE_OPTS, is available to help tune malloc()
performance in kernel-threaded applications. This environment variable configures a
thread-private cache for malloc’ed blocks. If a cache is configured, malloc’ed blocks are
placed into the thread’s private cache when free() is called, and may thereafter be
allocated from the cache when malloc() is called. Having such a cache potentially improves