HP-UX 11i December 2001 Release Notes
Programming
Libraries
Chapter 13
technology, this application will not have any compatibility issues with an existing
/usr/lib/pa20_64/libc.sl.
To use the application fastcall and libcres.a features, existing makefiles must be
changed.
Other Considerations
There is little to no impact from these changes. There is a slight (125KB) increase in
the amount of disk space required for libcres.a. The changes to the system libraries are
transparent to current applications.
Any performance gains for an application depend heavily on how the application uses
libc.sl and which interfaces in this library it calls.
The fastcall technology will be delivered with all systems. If there are compatibility
concerns, the applications should not be built with this technology.
More APIs in libc may make use of the fastcall technology in future releases.
Appropriate changes to the header files will be delivered to track these changes.
Performance Improvements to libc’s ftw(3C) and nftw(3C)
The libc functions ftw() and nftw() have been rewritten to operate faster, avoid stack
overflow conditions, reduce data space usage, and improve parallelism in multi-threaded
applications.
libc and commands that call ftw() and nftw() are affected.
ftw()
ftw() was rewritten to eliminate internal recursion, thus avoiding the possibility of a
stack overflow on deep file trees. A single fixed-size data structure is allocated on the
stack instead of using malloc() to allocate separate buffers for each depth of the tree.
Calls to strlen() were eliminated, as were trivial comparisons such as strcmp(buf,".").
The file descriptor re-use algorithm was changed from most-recently-opened to
least-recently-opened, which can show significant performance gains on very deep file
trees.
ftw() will typically show an 8% reduction in elapsed time and a reduction of 50% or
more in heap space used.
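
For readers unfamiliar with the interface, the following is a minimal sketch of a caller of
ftw(); it is not taken from the HP-UX sources. The names count_entry and count_tree are
hypothetical. The third argument to ftw() limits how many directory file descriptors the
walk may hold open at once, which is the resource the descriptor re-use algorithm
described above manages on deep trees.

```c
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* ftw() has no user-data argument, so the counters live at file scope.
 * This makes the sketch non-reentrant (an assumption made for brevity). */
static long n_files;
static long n_dirs;

/* Callback invoked by ftw() once for each object it reports. */
static int count_entry(const char *path, const struct stat *sb, int type)
{
    (void)path;
    (void)sb;
    if (type == FTW_F)
        n_files++;              /* regular file or other non-directory */
    else if (type == FTW_D)
        n_dirs++;               /* readable directory */
    return 0;                   /* zero tells ftw() to keep walking */
}

/* Hypothetical helper: walks the tree at root using at most max_fds
 * directory descriptors. Returns 0 on success, nonzero on failure. */
int count_tree(const char *root, int max_fds, long *files, long *dirs)
{
    int rc;

    n_files = 0;
    n_dirs = 0;
    rc = ftw(root, count_entry, max_fds);
    if (rc != 0) {
        /* The callback never returns nonzero, so this is an ftw() error. */
        perror("ftw");
        return rc;
    }
    *files = n_files;
    *dirs = n_dirs;
    return 0;
}
```

A caller might invoke count_tree("/some/dir", 8, &files, &dirs). A small max_fds value
forces ftw() to close and reopen directories on trees deeper than that limit.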
nftw()
nftw() was rewritten in the same way as ftw(), with the same benefits. nftw() now
fully conforms to the UNIX95 definition, including the requirement that when FTW_PHYS
is not set, files are reported only once.
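
The following is a minimal sketch of an nftw() caller that passes the flags argument
through, so that the same walk can be run with or without FTW_PHYS. The names visit and
walk_tree are hypothetical, and the _XOPEN_SOURCE value is an assumption; the
feature-test macro needed to expose nftw() varies by platform and compilation mode.

```c
#define _XOPEN_SOURCE 500   /* assumption: exposes nftw(); adjust per platform */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* File-scope state, because nftw() has no user-data argument. */
static long n_objects;
static int max_level;

/* Callback: counts every reported object and records the deepest level. */
static int visit(const char *path, const struct stat *sb, int type,
                 struct FTW *info)
{
    (void)path;
    (void)sb;
    (void)type;
    n_objects++;
    if (info->level > max_level)
        max_level = info->level;    /* level 0 is the starting path */
    return 0;                       /* zero continues the walk */
}

/* Hypothetical helper: flags is passed straight to nftw(), for example
 * FTW_PHYS to report symbolic links instead of following them, or 0. */
int walk_tree(const char *root, int flags, long *objects, int *depth)
{
    int rc;

    n_objects = 0;
    max_level = 0;
    rc = nftw(root, visit, 8, flags);
    if (rc != 0) {
        /* The callback never returns nonzero, so this is an nftw() error. */
        perror("nftw");
        return rc;
    }
    *objects = n_objects;
    *depth = max_level;
    return 0;
}
```

Calling walk_tree() once with FTW_PHYS and once with 0 on a tree that contains symbolic
links is one way to observe the difference in what is reported.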
Threaded applications can obtain greater concurrency when they specify an absolute path
name for the starting path and FTW_CHDIR is not set. In addition, an internal
unbalanced binary tree was replaced with a much more efficient splay tree. The effect of
this tree change becomes significant as the number of object inodes being tracked
increases. Directory inodes are always tracked; when executing in UNIX95 mode with
the FTW_PHYS option not set, all files and directories are tracked. When the
number of tracked objects reaches about 20,000, the user CPU time with the splay tree is
about half the user CPU time for the old nftw(). At 100,000 tracked inodes, the user
CPU time is about 90% less for the splay tree.
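
To illustrate the kind of structure involved, the following is a sketch of a top-down
splay tree used as a set of already-seen objects. It is not the HP-UX implementation. The
names inode_node, track_inode, and free_tracked are hypothetical, and keying on the
(device, inode) pair is an assumption about how tracked objects would be identified.

```c
#include <stdlib.h>
#include <sys/types.h>

struct inode_node {
    dev_t dev;
    ino_t ino;
    struct inode_node *left, *right;
};

/* Orders keys by device number, then by inode number. */
static int cmp_key(dev_t d1, ino_t i1, dev_t d2, ino_t i2)
{
    if (d1 != d2) return d1 < d2 ? -1 : 1;
    if (i1 != i2) return i1 < i2 ? -1 : 1;
    return 0;
}

/* Top-down splay: moves the node matching the key (or the last node
 * reached while searching for it) to the root and returns the new root. */
static struct inode_node *splay(struct inode_node *t, dev_t dev, ino_t ino)
{
    struct inode_node hdr, *l, *r, *y;
    int c;

    if (t == NULL) return NULL;
    hdr.left = hdr.right = NULL;
    l = r = &hdr;
    for (;;) {
        c = cmp_key(dev, ino, t->dev, t->ino);
        if (c < 0) {
            if (t->left == NULL) break;
            if (cmp_key(dev, ino, t->left->dev, t->left->ino) < 0) {
                y = t->left;            /* rotate right */
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL) break;
            }
            r->left = t;                /* link into the right subtree */
            r = t;
            t = t->left;
        } else if (c > 0) {
            if (t->right == NULL) break;
            if (cmp_key(dev, ino, t->right->dev, t->right->ino) > 0) {
                y = t->right;           /* rotate left */
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL) break;
            }
            l->right = t;               /* link into the left subtree */
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;                 /* reassemble the three pieces */
    r->left = t->right;
    t->left = hdr.right;
    t->right = hdr.left;
    return t;
}

/* Returns 1 if the object was newly recorded, 0 if it had been seen
 * before, and -1 if memory could not be allocated. */
int track_inode(struct inode_node **root, dev_t dev, ino_t ino)
{
    struct inode_node *t = splay(*root, dev, ino);
    struct inode_node *n;

    *root = t;
    if (t != NULL && cmp_key(dev, ino, t->dev, t->ino) == 0) return 0;
    n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->dev = dev;
    n->ino = ino;
    n->left = n->right = NULL;
    if (t != NULL && cmp_key(dev, ino, t->dev, t->ino) < 0) {
        n->left = t->left; n->right = t; t->left = NULL;
    } else if (t != NULL) {
        n->right = t->right; n->left = t; t->right = NULL;
    }
    *root = n;
    return 1;
}

/* Frees every node without recursion by rotating left children upward. */
void free_tracked(struct inode_node **root)
{
    struct inode_node *t = *root, *next;

    while (t != NULL) {
        if (t->left == NULL) { next = t->right; free(t); }
        else { next = t->left; t->left = next->right; next->right = t; }
        t = next;
    }
    *root = NULL;
}
```

Because each lookup moves the accessed node to the root, repeated checks of recently seen
objects stay cheap, and the amortized cost per operation is logarithmic even when keys
arrive in sorted order, a pattern that degrades an unbalanced binary tree.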
Another performance improvement to nftw() eliminated calls to access() by checking
the mode bits in the stat() buffer. This decreased system CPU time by approximately
4%.
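
The idea of testing mode bits rather than calling access() can be sketched as follows.
This is a simplified illustration, not the logic used inside nftw(). The function name
mode_allows_read is hypothetical, and the sketch deliberately ignores superuser
privileges, supplementary groups, and access control lists.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Decides from an already-filled stat buffer whether the given user and
 * group IDs have read permission, without a further system call.
 * Assumption: only the owner, group, and other permission bits matter. */
int mode_allows_read(const struct stat *sb, uid_t uid, gid_t gid)
{
    if (sb->st_uid == uid)
        return (sb->st_mode & S_IRUSR) != 0;    /* owner class applies */
    if (sb->st_gid == gid)
        return (sb->st_mode & S_IRGRP) != 0;    /* group class applies */
    return (sb->st_mode & S_IROTH) != 0;        /* everyone else */
}
```

Because the walk has already called stat() on each object, a check of this kind reuses
data that is in memory and saves one system call per object.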