<bucket_size> is (roughly) the number of cached ordinary blocks per bucket that will be
held in the ordinary block cache. The allowable values range from 0 through 8*4096 =
32768. If <bucket_size> is set to 0, the cache is disabled.
<buckets> is the number of power-of-2 buckets that will be maintained per thread. The
allowable values range from 8 through 32. This value controls the size of the largest
ordinary block that can be cached. For example, if <buckets> is 8, the largest ordinary
block that can be cached will be 2^8 or 256 bytes; if <buckets> is 16, the largest
ordinary block that can be cached will be 2^16 or 65536 bytes, and so on.
<bucket_size> * <buckets> is (exactly) the maximum number of ordinary blocks that
will be cached per thread. There is no maximum number of small blocks that will be
cached per thread if the small block cache is active.
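These relationships can be checked with a short sketch. The following C fragment is
purely illustrative (the variable names are hypothetical and not part of any HP-UX
interface): for an example configuration it derives the largest ordinary block size that
can be cached (2^<buckets> bytes) and the exact per-thread maximum number of cached
ordinary blocks (<bucket_size> * <buckets>).

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical values for illustration only. */
        unsigned bucket_size = 100;   /* 0 through 32768; 0 disables the cache      */
        unsigned buckets     = 20;    /* 8 through 32 power-of-2 buckets per thread */

        unsigned long largest_cached_block = 1UL << buckets;     /* 2^buckets bytes */
        unsigned long max_cached_blocks =
            (unsigned long)bucket_size * buckets;     /* exact per-thread maximum   */

        printf("largest cacheable ordinary block: %lu bytes\n", largest_cached_block);
        printf("maximum cached ordinary blocks per thread: %lu\n", max_cached_blocks);
        return 0;
    }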
<retirement_age> controls what happens to unused caches. It may happen that an
application has more threads initially than it does later on. In that case, there will be
unused caches, because caches are not automatically freed on thread exit -- by default
they are kept and assigned to newly created threads. For some applications, this could
result in some caches being kept indefinitely and never reused. <retirement_age> sets
the maximum amount of time in minutes that a cache may be unused by any thread
before it is considered due for retirement. As threads are created and exit, caches due for
retirement are freed back to their arena. The allowable values of <retirement_age>
range from 0 to 1440 minutes (= 24*60, i.e., one day). If <retirement_age> is 0, retirement
is disabled and unused caches will be kept indefinitely. It is recommended that
<retirement_age> be configured to 0 unless space efficiency is important and it is
known that the application will stabilize at a smaller number of threads than its initial
number.
In general, kernel-threaded applications whose performance benefits from activating
the small block allocator may benefit further by activating a modest-sized ordinary
block cache, which also activates caching of small blocks (from which most of the benefit
is derived). For example, an initial setting to try would be:
_M_SBA_OPTS=256:100:8
_M_CACHE_OPTS=100:20:0
The smallest legal ordinary cache setting that will activate small block caching (if the
SBA is also configured) is
_M_CACHE_OPTS=1:8:0
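Because these settings are taken from the environment, one convenient way to
experiment is a small launcher that puts them in place before the application starts. The
sketch below is illustrative only and is not part of HP-UX: it assumes the C library reads
the variables at process startup, takes a candidate _M_CACHE_OPTS value as its first
argument, sets the example _M_SBA_OPTS value shown above, and then execs the rest of
the command line.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        static char sba[] = "_M_SBA_OPTS=256:100:8";   /* example value from above */
        static char cache[64];

        if (argc < 3) {
            fprintf(stderr, "usage: %s cache_opts command [args...]\n", argv[0]);
            return 1;
        }
        if (strlen(argv[1]) > 40) {
            fprintf(stderr, "cache_opts value too long\n");
            return 1;
        }

        /* Build e.g. "_M_CACHE_OPTS=100:20:0" from the first argument. */
        sprintf(cache, "_M_CACHE_OPTS=%s", argv[1]);
        putenv(sba);
        putenv(cache);

        /* Start the real application with the tuning variables in place. */
        execvp(argv[2], &argv[2]);
        perror("execvp");
        return 1;
    }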
It can happen that activating small block caching with this minimum level of ordinary
cache gives all of the performance benefit that can be gained from the malloc cache, and
that increasing the ordinary block cache size further does not improve matters; for other
applications, a larger cache may give some further improvement. The malloc()
per-thread cache is a heuristic that may or may not benefit a given kernel-threaded
application that makes intensive use of malloc. Only by trying different configurations
can you determine whether any speed improvement can be obtained from the per-thread
cache for a given application, and what the optimal tuning is for that application.
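For example, with a wrapper such as the sketch above, the same workload could be
timed under the minimal setting (1:8:0), the moderate setting (100:20:0), and one or two
larger values, keeping whichever configuration measures fastest; the particular
candidate values and the timing harness are the reader's own choice and are not part of
the product.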
Impact: No impact on performance if the cache is not configured or if the application is
not kernel-threaded. Significant speed improvements are possible for some
kernel-threaded applications if the cache is configured.
There is a small additional space cost (in process heap size) associated with the cache
machinery. There is no per-block space cost for caching small blocks. However, there is a