Guidelines for Configuring Virtual Partitions on Cellular Platforms
A Technical White Paper

Contents
Abstract
Introduction
Configuration Flexibility Versus Resource Locality
Abstract
The HP-UX virtual partitions (vPars) solution is a popular choice for customers who are considering workload consolidation on nPartitions (hard partitions on cellular HP platforms). The vPars solution provides configuration flexibility with negligible performance overhead. From a performance perspective, however, it is important to understand and account for the locality of the resources assigned to virtual partitions hosted on cellular platforms.
Configuration Flexibility Versus Resource Locality
Any valid virtual partition needs a set of resources assigned to it so that it can host an HP-UX Operating Environment and its workload(s). The following is a brief overview of the choices available when picking resources to assign to virtual partitions or when setting attributes at the nPartition and virtual partition levels. The granularity of processor assignment to a virtual partition is a CPU core.
To address (a), tools such as HP Integrity Essentials Capacity Advisor can be used to gather resource usage profile data for the individual workloads in a non-consolidated environment and to analyze that data to help predict the amount of resources each workload needs in the target consolidated environment.
The benchmark produces two metrics: the maximum number of SD users a system can handle (while keeping the average response time below two seconds) and SAPS. SAPS (SAP Application Performance Standard) is a metric used to measure the throughput of an SAP application server. For the SAP SD benchmark, 100 SAPS equals 2000 fully processed order line items per hour. A higher SAPS rating indicates better system performance and thus an ability to accommodate more users.
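The SAPS definition quoted above reduces to a simple linear conversion. The following is a minimal sketch of that arithmetic; the function name and the sample rating are illustrative assumptions and are not part of the benchmark kit.

```python
# Conversion based on the definition quoted in the text:
# 100 SAPS = 2000 fully processed order line items per hour.
ORDER_LINE_ITEMS_PER_HOUR_PER_SAPS = 2000 / 100  # 20 line items per hour per SAPS


def saps_to_order_line_items_per_hour(saps: float) -> float:
    """Return the order line items per hour implied by a SAPS rating."""
    if saps < 0:
        raise ValueError("SAPS rating must be non-negative")
    return saps * ORDER_LINE_ITEMS_PER_HOUR_PER_SAPS


if __name__ == "__main__":
    # 5000 is a made-up rating used only to exercise the function.
    print(saps_to_order_line_items_per_hour(5000))
```

Because the relationship is linear, a configuration that achieves 93% of the baseline SAPS also processes 93% of the baseline order line items per hour.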
During a SPECjbb run, the benchmark is run repeatedly with an increasing number of warehouses starting with a specified number and ending with twice that number. The average throughput of all the runs is used as the performance measure. The unit of measure, BOPS, stands for business operations per second. For this focused study, the benchmark was configured with 16 warehouses.
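The scoring rule described above can be sketched as follows. This is an illustration only, not the benchmark's own reporting code: the function name is hypothetical, and the sketch assumes the warehouse count increases by one per run, which the text does not state explicitly.

```python
from statistics import mean
from typing import Mapping


def specjbb_score(throughput_by_warehouses: Mapping[int, float], start: int) -> float:
    """Average throughput over the runs with start..2*start warehouses (inclusive).

    throughput_by_warehouses maps a warehouse count to the measured
    throughput (BOPS) of the run that used that many warehouses.
    Assumption: one run per warehouse count, in steps of one.
    """
    counts = range(start, 2 * start + 1)
    missing = [n for n in counts if n not in throughput_by_warehouses]
    if missing:
        raise ValueError(f"missing runs for warehouse counts: {missing}")
    return mean(throughput_by_warehouses[n] for n in counts)
```

With the benchmark configured for 16 warehouses, as in this study, the sketch would average the runs from 16 through 32 warehouses.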
It is worth pointing out that the objective of this focused study was not to determine the upper bounds on the performance impacts for all classes of workloads in all possible virtual partition and nPartition configurations. We ran these two workloads and gathered metrics on different virtual partition configurations, each with the same amount of resources (one cell’s worth of processors, memory, and associated I/O for the workload being run).
• 1A (nPartition): One-cell nPartition (8 cores, 32GB ILM)
• 1B (virtual partition): One virtual partition / one-cell nPartition (8 cores, 32GB ILM [footnote 2], 100% Base memory, 1024MB granule size)

We also did additional runs to study the impact of the memory granule size and the impact of base versus floating memory:
• 100% Base with 128MB granule size
• 100% Base with 4096MB granule size
• 1024MB granule size with 25% Base and 75% Float
• 1024MB granule size with 50% Base and 50% Float

Footnote 2: Some amount of memory in
Layout 2: Single Virtual Partition in a Two-Cell nPartition with Different Resource Layouts (ILM Only)
• 2A – One virtual partition / two-cell nPartition (4 cores from each cell, 32GB ILM, 100% Base memory, 1024MB granule size)
• 2B – One virtual partition / two-cell nPartition (8 cores from a cell that has the workload’s I/O, 32GB ILM, 100% Base memory, 1024MB granule size)
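To make the difference between configurations 2A and 2B concrete, the two layouts can be written down as data and compared. The following is a hypothetical sketch: the `VParLayout` class, its field names, and the cell numbering are assumptions made for illustration and do not correspond to any vPars tool or file format. For 2A the text does not say which cell hosts the I/O, so cell 0 is assumed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VParLayout:
    """Hypothetical description of one virtual partition's resources."""
    name: str
    cores_per_cell: Dict[int, int]  # cell id -> number of cores taken from that cell
    ilm_gb: int                     # interleaved memory, in GB
    io_cell: int                    # cell assumed to host the workload's I/O

    def total_cores(self) -> int:
        """Total cores assigned to the virtual partition."""
        return sum(self.cores_per_cell.values())

    def cores_local_to_io(self) -> int:
        """Cores that sit on the same cell as the workload's I/O."""
        return self.cores_per_cell.get(self.io_cell, 0)


# Cell ids 0 and 1 are arbitrary labels; cell 0 is assumed to host the I/O.
LAYOUT_2A = VParLayout("2A", {0: 4, 1: 4}, ilm_gb=32, io_cell=0)
LAYOUT_2B = VParLayout("2B", {0: 8}, ilm_gb=32, io_cell=0)

if __name__ == "__main__":
    for layout in (LAYOUT_2A, LAYOUT_2B):
        print(layout.name, layout.total_cores(), layout.cores_local_to_io())
```

Both layouts carry the same totals (8 cores, 32GB ILM); they differ only in how many cores share a cell with the workload's I/O, which is the variable this layout is meant to isolate.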
Layout 3: Single Virtual Partition in a Two-Cell nPartition with Different Resource Layouts (CLM and ILM)
• 3A – One virtual partition / two-cell nPartition (4 cores from each cell, 24GB CLM (on the cell with workload’s I/O), 8GB ILM, 100% Base memory, 1024MB granule size)
• 3B – One virtual partition / two-cell nPartition (8 cores and 24GB CLM (on the cell with workload’s I/O), 8GB ILM, 100% Base memory, 1024MB granule size)
• 3C – One virtual partition / two-cell nPartition (8 cores (on the cell wi
Layout 4: Two Virtual Partitions with Workloads Running Simultaneously
Users typically configure multiple virtual partitions, each hosting its own workload, in an nPartition. Also, the resources assigned to the various virtual partitions in the nPartition can come from different cells in the nPartition. As a next step, we configured two virtual partitions on this two-cell nPartition.
• 4B – Two virtual partitions / two-cell nPartition (each virtual partition has 24GB CLM (on the cell with workload’s I/O), 8 cores from remote cell, 8GB ILM, 100% Base memory, 1024MB granule size)
• 4C – Two virtual partitions / two-cell nPartition (each virtual partition has 4 cores/cell, 32GB ILM, 100% Base memory, 1024MB granule size)

Analysis of Results
The configuration 1A (one-cell nPartition) is used as a baseline.
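The charts in this section report normalized values. Assuming that normalization means dividing each configuration's raw result by the result for baseline configuration 1A, it can be sketched as follows. The function name is hypothetical, and the raw numbers in the usage example are placeholders rather than measured results.

```python
from typing import Dict


def normalize_to_baseline(raw: Dict[str, float], baseline: str = "1A") -> Dict[str, float]:
    """Divide every configuration's raw metric by the baseline's raw metric."""
    if baseline not in raw:
        raise KeyError(f"baseline configuration {baseline!r} not in results")
    base = raw[baseline]
    if base <= 0:
        raise ValueError("baseline metric must be positive")
    return {name: value / base for name, value in raw.items()}


if __name__ == "__main__":
    # Placeholder values only; these are NOT results from the study.
    placeholder_raw = {"1A": 200.0, "other": 150.0}
    print(normalize_to_baseline(placeholder_raw))
```

Under this convention the baseline always has a normalized value of 1, and a value below 1 indicates a result lower than the one-cell nPartition result.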
[Figure: SPECjbb, normalized BOPS by configuration (1A (nPar), 1B (vPar), 2A, 2B, 3A, 3B, 3C)]
In configurations 3A, 3B, and 3C, 75% of the memory is configured as cell local memory and the remaining 25% as interleaved memory. The influence of the locality of the CPU core assignments with respect to the associated CLM is clearly shown here.
[Figure: SAP SD 2-tier, normalized SAPS by configuration (1A (nPar); 4A, 3B; 4B, 3C; 4C, 2A)]
[Figure: SPECjbb, normalized BOPS by configuration]
The percentage of base versus floating memory in the virtual partition did not appear to have any noticeable impact on the performance of these two workloads, as long as the virtual partition had the prescribed minimum amount of base memory [4]. Other workloads may exhibit different sensitivities to the choice of memory granule size and to the percentages of base versus floating memory in a virtual partition.
5. CLM and CLPs. vPars customers are encouraged to assign CLM and CLPs to each of their virtual partitions [footnote 6]. It is better to configure some amount of base CLM from a cell if the virtual partition is going to have processors and I/O on that cell, because this allows the kernel to allocate the I/O-related data structures in the same locality. When configuring CLM and CLPs for each virtual partition, refer to the HP-UX NUMA documentation to decide how to optimize performance [1,2].
6. Multi-core CPUs.
Conclusion
The results from the experiments described in this paper show that the overhead of the vPars software stack is minimal. However, because of the NUMA nature of the underlying cellular hardware platforms and the performance sensitivity of workloads to resource locality, it is important to pay attention to the locality of the resources assigned to the various virtual partitions within an nPartition.
References
1. See Chapter 12 in HP-UX 11i Version 2 September 2004 Release Notes: HP 9000 Servers, HP Integrity Servers, and HP Workstations, located at http://docs.hp.com/en/5990-8153/index.html.
2. See http://docs.hp.com/en/4913/ccNUMA_White_Paper.pdf for the white paper titled “ccNUMA Overview”.
3. See http://docs.hp.com/en/8767/cpu_config.pdf for the white paper titled “CPU Configuration Guidelines for vPars”.
4. See http://docs.hp.com/en/9832/vParsMemMigration.