HP-UX 11i Release Notes (December 2000)
Programming
Libraries
Chapter 13 259
More APIs in libc may make use of the fastcall technology in future
releases. Appropriate changes to the header files will be delivered to
track these changes.
Performance Improvements to libc’s ftw(3C) and nftw(3C)
The libc functions ftw() and nftw() have been rewritten to operate
faster, avoid stack overflow conditions, reduce data space usage, and
improve parallelism in multi-threaded applications.
libc and commands that call ftw() and nftw() are affected.
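For reference, here is the kind of caller affected: a routine that counts regular files through the standard ftw() interface. This is a sketch; the names count_regular() and count_files() and the limit of 16 directory descriptors are illustrative assumptions, not part of these notes.

```c
#include <ftw.h>
#include <sys/stat.h>

/* ftw() passes no user-data pointer to its callback, so the running count
   lives at file scope. This makes the sketch unsafe for concurrent use. */
static long regular_files;

/* Called by ftw() once per object; returning 0 continues the walk. */
static int count_regular(const char *path, const struct stat *sb, int flag)
{
    (void)path;
    if (flag == FTW_F && S_ISREG(sb->st_mode))
        regular_files++;
    return 0;
}

/* Counts regular files at or below root; returns -1 if ftw() fails. */
long count_files(const char *root)
{
    regular_files = 0;
    /* 16 = maximum number of directory descriptors ftw() may hold open */
    if (ftw(root, count_regular, 16) != 0)
        return -1;
    return regular_files;
}
```

For example, count_files("/usr/include") would return the number of regular files beneath /usr/include, or -1 if the walk could not start.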
ftw()
ftw() was rewritten to eliminate internal recursion, which avoids the
possibility of a stack overflow on deep file trees. A single fixed-size
data structure is allocated on the stack instead of using malloc() to
allocate a separate buffer for each depth of the tree. Calls to strlen()
were eliminated, as were trivial comparisons such as strcmp(buf,".").
The algorithm for choosing which file descriptor to re-use was changed
from most-recently-opened to least-recently-opened, which can yield
significant performance gains on very deep file trees.
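One way to drop a comparison such as strcmp(buf,".") is to test the characters directly. The following is a generic sketch of that technique, not the HP-UX source; is_dot_or_dotdot() is a hypothetical helper name.

```c
#include <stdbool.h>

/* True if a directory entry name is exactly "." or "..". At most three
   bytes are examined, and the && chain stops before reading past the
   terminating NUL, so no strcmp() or strlen() call is needed. */
static bool is_dot_or_dotdot(const char *name)
{
    return name[0] == '.' &&
           (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
}
```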
ftw() typically shows an 8% reduction in elapsed time and a reduction of
50% or more in heap space used.
nftw()
nftw() was rewritten in the same way as ftw() and gains the same
benefits. nftw() now fully conforms to the UNIX95 definition, including
reporting each file only once when FTW_PHYS is not set.
Threaded applications can obtain greater concurrency when they specify an
absolute path name as the starting path and do not set FTW_CHDIR. In
addition, an internal unbalanced binary tree was replaced with a much
more efficient splay tree. The benefit of this change grows with the
number of inodes being tracked. Directory inodes are always tracked;
when executing in UNIX95 mode with the FTW_PHYS option not set, all
files and directories are tracked. At about 20,000 tracked objects, the
user CPU time with the splay tree is about half that of the old nftw().
At 100,000 tracked inodes, the user CPU time is about 90% lower with the
splay tree.
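To illustrate the kind of structure described above, the following is a minimal top-down splay tree (the Sleator-Tarjan formulation) that records (st_dev, st_ino) pairs and reports whether an object has already been seen. The node layout and the names seen_or_add() and free_tree() are assumptions made for this sketch; these notes do not describe the internal HP-UX implementation.

```c
#include <stdlib.h>
#include <sys/types.h>

/* One tracked object, identified by device and inode number. */
struct node {
    dev_t dev;
    ino_t ino;
    struct node *left, *right;
};

/* Order keys by device first, then by inode; returns <0, 0 or >0. */
static int key_cmp(dev_t d1, ino_t i1, dev_t d2, ino_t i2)
{
    if (d1 != d2) return d1 < d2 ? -1 : 1;
    return (i1 > i2) - (i1 < i2);
}

/* Top-down splay: moves (dev, ino), or the last node on its search path, to the root. */
static struct node *splay(struct node *t, dev_t dev, ino_t ino)
{
    struct node header, *l = &header, *r = &header, *y;
    if (t == NULL) return NULL;
    header.left = header.right = NULL;
    for (;;) {
        int c = key_cmp(dev, ino, t->dev, t->ino);
        if (c == 0) break;                  /* found */
        if (c < 0) {
            if (t->left == NULL) break;
            if (key_cmp(dev, ino, t->left->dev, t->left->ino) < 0) {
                y = t->left;                /* rotate right */
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL) break;
            }
            r->left = t;                    /* link right */
            r = t;
            t = t->left;
        } else {
            if (t->right == NULL) break;
            if (key_cmp(dev, ino, t->right->dev, t->right->ino) > 0) {
                y = t->right;               /* rotate left */
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL) break;
            }
            l->right = t;                   /* link left */
            l = t;
            t = t->right;
        }
    }
    l->right = t->left;                     /* reassemble */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* 1 = already tracked, 0 = newly added, -1 = out of memory; updates *root. */
static int seen_or_add(struct node **root, dev_t dev, ino_t ino)
{
    struct node *t = splay(*root, dev, ino), *n;
    int c = t ? key_cmp(dev, ino, t->dev, t->ino) : 0;
    *root = t;
    if (t != NULL && c == 0) return 1;      /* already visited */
    if ((n = malloc(sizeof *n)) == NULL) return -1;
    n->dev = dev;
    n->ino = ino;
    n->left = n->right = NULL;
    if (t != NULL && c < 0) {               /* old root becomes right child */
        n->left = t->left;
        n->right = t;
        t->left = NULL;
    } else if (t != NULL) {                 /* old root becomes left child */
        n->right = t->right;
        n->left = t;
        t->right = NULL;
    }
    *root = n;
    return 0;
}

/* Frees every node iteratively, so even a degenerate tree cannot exhaust the stack. */
static void free_tree(struct node *t)
{
    while (t != NULL) {
        struct node *l = t->left, *next = t->right;
        if (l != NULL) {                    /* rotate right, then revisit */
            t->left = l->right;
            l->right = t;
            t = l;
        } else {
            free(t);
            t = next;
        }
    }
}
```

Every lookup moves the accessed key to the root, giving O(log n) amortized cost per operation. An unbalanced binary search tree has no such bound: inserting keys in ascending order, for example, degrades it into a linked list with O(n) lookups. This is one plausible reason the splay tree's advantage grows with the number of tracked inodes.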
Another performance improvement to nftw() eliminated calls to
access() by checking the mode bits in the stat() buffer. This
decreased system CPU time by approximately 4%.