    max_lcpus = buf.psd_max_proc_cnt;    /* maximum number of LCPUs */
    printf("Max cores: %d; Max LCPUs: %d\n", max_cores, max_lcpus);
}
These maximum counts take into account all possible sources of additional CPU resources: the
sum of all psets, all available iCAP processors, and any cores that can be migrated from another
vPar. Note that on a Montecito system the maximum number of LCPUs is always twice the maximum
number of cores, whether or not HyperThreads are currently enabled, because the maximum must
allow for the possibility that they will be enabled.
An application may benefit from having detailed topological information about the system. The APIs
to obtain topology information are:
pstat_getprocessor(2) – returns per-processor information, including the topology of sockets,
cores, threads, logical CPUs, and locality domains (LDOMs).
pstat_getdynamic(2) – returns system-wide counters, including the maximum numbers of active
and supported processors and cores.
mpctl(2) – provides hierarchical information on sockets, cores, LDOMs, and so on.
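As an illustration, the following minimal sketch uses mpctl(2) to query the number of enabled
processors and locality domains visible to the calling partition. It uses only the MPC_GETNUMSPUS
and MPC_GETNUMLDOMS requests described in mpctl(2); the output formatting and error handling are
purely illustrative.

#include <stdio.h>
#include <sys/mpctl.h>              /* mpctl(2) request types */

int main(void)
{
    /* Number of enabled processors (logical CPUs) in this partition. */
    int nspus = mpctl(MPC_GETNUMSPUS, 0, 0);

    /* Number of locality domains (LDOMs) containing enabled processors. */
    int nldoms = mpctl(MPC_GETNUMLDOMS, 0, 0);

    if (nspus == -1 || nldoms == -1) {
        perror("mpctl");
        return 1;
    }

    printf("Enabled processors: %d; locality domains: %d\n", nspus, nldoms);
    return 0;
}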
These APIs should be used sparingly and with care. Much of the information they return is specific
to a particular architecture, CPU version, or system type, and relying on it may limit the
application to a particular subset of systems. Binding application execution to specific hardware
components may limit the ability of the OS to migrate those components to higher priority
partitions, and an application bound to specific hardware resources may find those resources
unavailable if they have been re-assigned to another partition.
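For example, the traditional way to bind a process to one processor is the MPC_SETPROCESS request
of mpctl(2), sketched below with the first enabled SPU chosen arbitrarily. This is meant only to
illustrate the kind of binding the caution above refers to, since the chosen processor may later
be migrated to another partition.

#include <stdio.h>
#include <unistd.h>
#include <sys/mpctl.h>

int main(void)
{
    /* Pick a target processor -- here simply the first enabled SPU. */
    int spu = mpctl(MPC_GETFIRSTSPU, 0, 0);

    /* Bind the calling process to that SPU. */
    if (spu == -1 || mpctl(MPC_SETPROCESS, spu, getpid()) == -1) {
        perror("mpctl");
        return 1;
    }

    printf("Process %ld bound to SPU %d\n", (long)getpid(), spu);
    return 0;
}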
Tell me when it changes
The ideal design for a VSE-enabled application adapts dynamically to changes in the resources
available to it. Consider this common scenario:
Application starts up, low system load, minimum CPU resources allocated to this partition.
Application queries OS regarding number of cores, allocates two (process) threads per core
to handle workload.
Application begins processing workload, load increases.
Workload manager senses additional load, adds more cores for the application.
Application is unable to use the additional cores because of its limited number of threads.
The performance of this application is limited by the number of threads it allocated initially;
the number of cores available at allocation time was minimal because there was no load. To
address this, consider a different approach:
Application starts up, low system load, minimum CPU resources allocated to this partition.
Application queries OS regarding maximum number of cores, allocates two threads per core
to handle workload (see the sizing sketch after this scenario).
Application begins processing workload, load increases.
Workload manager senses additional load, adds more cores for the application.
Application throughput rises with additional cores since there are enough threads to keep all
the cores busy.
Workload manager takes away cores to give to a higher priority workload.
Application now has more than two threads running per core, causing excessive context
switching between threads and sub-optimal performance.
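The following is a minimal sketch of the sizing step in the second scenario, assuming the thread
pool is sized from psd_max_proc_cnt (the maximum LCPU count, as in the earlier fragment) rather
than from the processor count at startup; the worker function and the one-thread-per-LCPU ratio
are placeholders, not a recommendation for any particular workload.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/pstat.h>              /* pstat_getdynamic(2) */

/* Placeholder worker; a real application would loop over a work queue here. */
static void *worker(void *arg)
{
    return arg;
}

int main(void)
{
    struct pst_dynamic psd;
    pthread_t *pool;
    int max_lcpus, i;

    if (pstat_getdynamic(&psd, sizeof(psd), 1, 0) == -1) {
        perror("pstat_getdynamic");
        return 1;
    }

    /* Size the pool for the maximum LCPU count the partition could be
       granted, not the (possibly minimal) count available at startup. */
    max_lcpus = (int)psd.psd_max_proc_cnt;

    pool = malloc(max_lcpus * sizeof(pthread_t));
    if (pool == NULL) {
        perror("malloc");
        return 1;
    }

    for (i = 0; i < max_lcpus; i++)
        pthread_create(&pool[i], NULL, worker, NULL);

    for (i = 0; i < max_lcpus; i++)
        pthread_join(pool[i], NULL);

    free(pool);
    return 0;
}

Sizing for the maximum lets the pool absorb added cores, but, as the last two steps of the
scenario show, it over-subscribes the remaining cores when resources are taken away, so the
number of runnable workers still needs to track the resources actually present.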