HP-UX 11i March 2002 Release Notes

Programming Libraries
Chapter 13
speed performance for some kernel-threaded applications, by reducing mutex contention
among threads and by deferring coalescence of blocks.
The thread-private cache is available only for kernel-threaded applications, i.e. those
linked with the pthread library. The installed shared pthread library must be version
PHCO_19666 or later, or the application must be statically linked with an archive
pthread library that is version PHCO_19666 or later; otherwise the cache is not available.
By default the cache is not active; it must be activated by setting _M_CACHE_OPTS to a
legal value. If _M_CACHE_OPTS is set to an out-of-range value, it is ignored and the cache
remains disabled.
There are two portions to the thread-private cache: one for ordinary blocks and one for
small blocks. Small blocks are blocks that are allocated by the small block allocator
(SBA), which is configured with the environment variable _M_SBA_OPTS or by calls to
mallopt(3C). The small block cache is automatically active whenever both the ordinary
block cache and the SBA are active. The ordinary block cache is active only when it is
configured by setting _M_CACHE_OPTS. There are no mallopt() options to configure the
thread-private cache.
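The activation rules above can be summarized in a short sketch. It is written in Python purely for illustration; the function name `active_caches` and its two boolean inputs are hypothetical and are not part of any HP-UX interface.

```python
def active_caches(cache_opts_valid: bool, sba_active: bool) -> dict:
    """Illustrative only: which portions of the thread-private cache are active.

    cache_opts_valid -- True if _M_CACHE_OPTS is set to a legal value that
                        does not disable the cache (hypothetical input)
    sba_active       -- True if the small block allocator is active, whether
                        through _M_SBA_OPTS or mallopt() (hypothetical input)
    """
    # The ordinary block cache is active only when _M_CACHE_OPTS configures it.
    ordinary = cache_opts_valid
    # The small block cache follows automatically when both are active.
    small = ordinary and sba_active
    return {"ordinary_block_cache": ordinary, "small_block_cache": small}
```

Under this reading, activating the SBA alone never enables small block caching; the ordinary block cache must be configured as well.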
The following shows the subparameters of _M_CACHE_OPTS and their meanings:
_M_CACHE_OPTS=<bucket_size>:<buckets>:<retirement_age>
<bucket_size> is (roughly) the number of cached ordinary blocks per bucket that will be
held in the ordinary block cache. The allowable values range from 0 through 8*4096 =
32768. If <bucket_size> is set to 0, the cache is disabled.
<buckets> is the number of power-of-2 buckets that will be maintained per thread. The
allowable values range from 8 through 32. This value controls the size of the largest
ordinary block that can be cached. For example, if <buckets> is 8, the largest ordinary
block that can be cached will be 2^8 or 256 bytes. If <buckets> is 16, the largest
ordinary block that can be cached will be 2^16 or 65536 bytes, etc.
<bucket_size>*<buckets> is (exactly) the maximum number of ordinary blocks that
will be cached per thread. There is no maximum number of small blocks that will be
cached per thread if the small block cache is active.
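The arithmetic for <bucket_size> and <buckets> can be checked with a small sketch. This is an illustration in Python, not HP-UX code; the helper names `largest_cacheable_block` and `max_cached_ordinary_blocks` are hypothetical, and the range checks simply restate the limits given above.

```python
def largest_cacheable_block(buckets: int) -> int:
    """Size in bytes of the largest ordinary block that can be cached."""
    if not 8 <= buckets <= 32:
        raise ValueError("<buckets> must be in the range 8 through 32")
    # Buckets are powers of 2, so the largest cacheable block is 2^buckets.
    return 2 ** buckets


def max_cached_ordinary_blocks(bucket_size: int, buckets: int) -> int:
    """Maximum number of ordinary blocks cached per thread."""
    if not 0 <= bucket_size <= 8 * 4096:
        raise ValueError("<bucket_size> must be in the range 0 through 32768")
    if not 8 <= buckets <= 32:
        raise ValueError("<buckets> must be in the range 8 through 32")
    # A <bucket_size> of 0 disables the cache, which yields 0 here as well.
    return bucket_size * buckets
```

For instance, `largest_cacheable_block(16)` gives 65536 bytes, and `max_cached_ordinary_blocks(100, 20)` gives 2000 blocks per thread.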
<retirement_age> controls what happens to unused caches. It may happen that an
application has more threads initially than it does later on. In that case, there will be
unused caches, because caches are not automatically freed on thread exit; by default
they are kept and assigned to newly created threads. But for some applications, this could
result in some caches being kept indefinitely and never reused. <retirement_age> sets
the maximum amount of time in minutes that a cache may be unused by any thread
before it is considered due for retirement. As threads are created and exit, caches due for
retirement are freed back to their arena. The allowable values of <retirement_age>
range from 0 to 1440 minutes (=24*60, i.e. one day). If <retirement_age> is 0, retirement
is disabled and unused caches will be kept indefinitely. It is recommended that
<retirement_age> be configured to 0 unless space efficiency is important and it is
known that an application will stabilize to a smaller number of threads than its initial
number.
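Because an out-of-range setting is silently ignored, it can be useful to check a candidate value before deploying it. The following is a minimal sketch in Python, not the library's own parser; the names `CacheOpts` and `parse_cache_opts` are hypothetical, and treating a malformed string the same as an out-of-range value is an assumption.

```python
import re
from typing import NamedTuple, Optional


class CacheOpts(NamedTuple):
    bucket_size: int      # 0 through 32768; 0 disables the cache
    buckets: int          # 8 through 32
    retirement_age: int   # 0 through 1440 minutes; 0 disables retirement


def parse_cache_opts(value: str) -> Optional[CacheOpts]:
    """Return the parsed fields, or None if the value would be ignored."""
    # Expect exactly three colon-separated decimal fields.
    # Assumption: anything else is treated like an out-of-range value.
    match = re.fullmatch(r"(\d+):(\d+):(\d+)", value, re.ASCII)
    if match is None:
        return None
    bucket_size, buckets, retirement_age = (int(g) for g in match.groups())
    if not 0 <= bucket_size <= 32768:
        return None
    if not 8 <= buckets <= 32:
        return None
    if not 0 <= retirement_age <= 1440:
        return None
    return CacheOpts(bucket_size, buckets, retirement_age)
```

A value such as `100:20:0` passes this check, while `100:7:0` and `100:20:1441` do not, since 7 is below the minimum for <buckets> and 1441 exceeds one day in minutes.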
In general, kernel-threaded applications that benefit in performance from activating
the small block allocator may benefit further from activating a modest-sized ordinary
block cache, which also activates caching of small blocks (from which most of the benefit is
derived). For example, a reasonable setting to try first would be:
_M_SBA_OPTS=256:100:8
_M_CACHE_OPTS=100:20:0
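One way to try these settings on a single application without changing the login environment is to launch it from a small wrapper. The sketch below uses Python for illustration; the helper name `tuned_environment` and the program path `./my_threaded_app` are hypothetical, and the variables have the meaning described here only for the HP-UX malloc.

```python
import os
import subprocess


def tuned_environment(base=None):
    """Return a copy of the environment with the suggested starting values."""
    env = dict(os.environ if base is None else base)
    env["_M_SBA_OPTS"] = "256:100:8"    # activates the small block allocator
    env["_M_CACHE_OPTS"] = "100:20:0"   # bucket_size:buckets:retirement_age
    return env


# Example launch (hypothetical program name, left commented out):
# subprocess.run(["./my_threaded_app"], env=tuned_environment(), check=True)
```

With `_M_CACHE_OPTS=100:20:0`, each thread caches at most 100*20 = 2000 ordinary blocks, the largest cacheable ordinary block is 2^20 bytes, and retirement of unused caches is disabled.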