<bucket_size> is (roughly) the number of cached ordinary blocks per bucket that will be
held in the ordinary block cache. The allowable values range from 0 through 8*4096 =
32768. If <bucket_size> is set to 0, the cache is disabled.
<buckets> is the number of power-of-2 buckets that will be maintained per thread. The
allowable values range from 8 through 32. This value controls the size of the largest
ordinary block that can be cached. For example, if <buckets> is 8, the largest ordinary
block that can be cached will be 2^8 or 256 bytes; if <buckets> is 16, the largest
ordinary block that can be cached will be 2^16 or 65536 bytes, and so on.
<bucket_size> * <buckets> is (exactly) the maximum number of ordinary blocks that
will be cached per thread. There is no maximum number of small blocks that will be
cached per thread if the small block cache is active.
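These relationships can be checked with a short sketch. The following C fragment is
purely illustrative (the variable names are hypothetical and not part of any HP-UX
interface): for an example configuration it derives the largest ordinary block size that
can be cached (2^<buckets> bytes) and the exact per-thread maximum number of cached
ordinary blocks (<bucket_size> * <buckets>).

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical values for illustration only. */
        unsigned bucket_size = 100;   /* 0 through 32768; 0 disables the cache      */
        unsigned buckets     = 20;    /* 8 through 32 power-of-2 buckets per thread */

        unsigned long largest_cached_block = 1UL << buckets;     /* 2^buckets bytes */
        unsigned long max_cached_blocks =
            (unsigned long)bucket_size * buckets;     /* exact per-thread maximum   */

        printf("largest cacheable ordinary block: %lu bytes\n", largest_cached_block);
        printf("maximum cached ordinary blocks per thread: %lu\n", max_cached_blocks);
        return 0;
    }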
<retirement_age> controls what happens to unused caches. It may happen that an
application has more threads initially than it does later on. In that case, there will be
unused caches, because caches are not automatically freed on thread exit -- by default
they are kept and assigned to newly created threads. For some applications, this could
result in some caches being kept indefinitely and never reused. <retirement_age> sets
the maximum amount of time in minutes that a cache may be unused by any thread
before it is considered due for retirement. As threads are created and exit, caches due for
retirement are freed back to their arena. The allowable values of <retirement_age>
range from 0 to 1440 minutes (= 24*60, i.e., one day). If <retirement_age> is 0, retirement
is disabled and unused caches will be kept indefinitely. It is recommended that
<retirement_age> be configured to 0 unless space efficiency is important and it is
known that the application will stabilize at a smaller number of threads than its initial
number.
In general, kernel-threaded applications whose performance benefits from activating
the small block allocator may benefit further by activating a modest-sized ordinary
block cache, which also activates caching of small blocks (from which most of the benefit
is derived). For example, an initial setting to try would be:
_M_SBA_OPTS=256:100:8
_M_CACHE_OPTS=100:20:0
The smallest legal ordinary cache setting that will activate small block caching (if the
SBA is also configured) is
_M_CACHE_OPTS=1:8:0
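Because these settings are taken from the environment, one convenient way to
experiment is a small launcher that puts them in place before the application starts. The
sketch below is illustrative only and is not part of HP-UX: it assumes the C library reads
the variables at process startup, takes a candidate _M_CACHE_OPTS value as its first
argument, sets the example _M_SBA_OPTS value shown above, and then execs the rest of
the command line.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        static char sba[] = "_M_SBA_OPTS=256:100:8";   /* example value from above */
        static char cache[64];

        if (argc < 3) {
            fprintf(stderr, "usage: %s cache_opts command [args...]\n", argv[0]);
            return 1;
        }
        if (strlen(argv[1]) > 40) {
            fprintf(stderr, "cache_opts value too long\n");
            return 1;
        }

        /* Build e.g. "_M_CACHE_OPTS=100:20:0" from the first argument. */
        sprintf(cache, "_M_CACHE_OPTS=%s", argv[1]);
        putenv(sba);
        putenv(cache);

        /* Start the real application with the tuning variables in place. */
        execvp(argv[2], &argv[2]);
        perror("execvp");
        return 1;
    }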
It can happen that activating small block caching with this minimum level of ordinary
cache gives all of the performance benefit that can be gained from the malloc cache, and
that increasing the ordinary block cache size further does not improve matters; for other
applications, a larger cache may give some further improvement. The malloc()
per-thread cache is a heuristic that may or may not benefit a given kernel-threaded
application that makes intensive use of malloc. Only by trying different configurations
can you determine whether any speed improvement can be obtained from the per-thread
cache for a given application, and what the optimal tuning is for that application.
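For example, with a wrapper such as the sketch above, the same workload could be
timed under the minimal setting (1:8:0), the moderate setting (100:20:0), and one or two
larger values, keeping whichever configuration measures fastest; the particular
candidate values and the timing harness are the reader's own choice and are not part of
the product.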
Impact: No impact on performance if the cache is not configured or if the application is
not kernel-threaded. Significant speed improvements are possible for some
kernel-threaded applications if the cache is configured.
There is a small additional space cost (in process heap size) associated with the cache
machinery. There is no per-block space cost for caching small blocks. However, there is a