HP Extended Cluster for RAC Continuous availability with the flexibility of virtualization
complexity that were executed online and queued for deferred execution. TPC-C is widely
acknowledged as providing one of the most realistic loading simulations within a comprehensive
computing environment.
Failover test description
Multiple tests were performed to simulate the wide variety of scenarios that could impact a data
center. Host failure, storage device failure, DWDM link failure, and catastrophic data center failure
were all emulated to stress the configuration in the most rigorous and realistic manner possible.
Intra-data center connections to storage, either through an Inter-Switch Link (ISL) switch or directly
to an array, were removed one at a time to simulate localized failures. The configuration's response to
each event was observed, including the reaction of the Oracle instance and the behavior of data-volume
resynchronization as connections were reestablished. Data integrity and user impact were closely
monitored throughout all phases of testing.
Test results and tuning
The most critical result of the testing is that the 100-km SGeRAC solution, running over DWDM,
exhibited the full set of high-availability characteristics observed on a collocated HP Serviceguard
cluster: the server and application cluster continued to perform and remain accessible when one or
more components failed. Data integrity was maintained for the duration of all testing.
For host failure emulation, a server was shut down, making the Oracle database components resident
on that system inaccessible for two to three minutes. All traffic moved to the second data center, and
once restarted, the failed system resumed operations.
A forced failure of a single storage device had no impact on the Oracle instance. Even though the test
continued for several minutes, no visible user impact or loss of data integrity was observed.
DWDM link failure did not compromise the cluster, which remained up for the duration of the test
cycle. No discernible adverse behavior in HP Serviceguard and SGeRAC performance or functionality
was noted.
The solution performed well against stringent proprietary and industry-standard benchmark tests,
providing the enterprise with unprecedented levels of disaster tolerance and resource utilization
across the 100-km divide. Even during recovery, the full functionality of the Oracle repository
remained available.
As expected, in the less-than-optimized configuration, IPC, I/O, and application operations did show
a degree of performance degradation related to separation distance. This result indicates that
assessing the application and workload characteristics of the target environment is critical to
making the most of an implementation over extended distances. During the evaluation, informal
tuning experiments clearly demonstrated that tuning and hardware selection can have a significant
impact on overall system performance.
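A rough sketch illustrates where the distance-related overhead comes from. Light propagates through single-mode fiber at roughly 5 microseconds per kilometer, and every synchronous operation that crosses the link must wait for a round trip. The figures below are illustrative assumptions, not measurements from the tested configuration:

```python
# Illustrative: propagation delay added per synchronous round trip over
# the inter-site fiber link. Switch and DWDM transit delays are ignored.

FIBER_DELAY_US_PER_KM = 5.0  # assumption: typical single-mode fiber


def round_trip_delay_us(distance_km: float) -> float:
    """Propagation delay for one round trip, out and back."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM


for d in (0, 50, 100):
    print(f"{d:3d} km -> {round_trip_delay_us(d):6.0f} us per round trip")
```

At 100 km, each round trip costs about a millisecond of propagation delay alone, which is why latency-sensitive IPC and I/O operations degrade as site separation grows.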
It was found that the assignment of buffer credits within the DWDM devices had a significant positive
impact on throughput performance of the RAC application. Configuring larger buffer credit values can
require more memory in the DWDM device. However, assigning larger buffer credit values helps
keep the pipeline full, thus improving throughput and making better use of the available bandwidth,
especially as the distance between the sites increases. For instance, doubling the buffer credits to 60
resulted in a throughput gain of over 100% at distances of 50 km and 100 km. It is expected that, in
practice, buffer credits much larger than 60 would be needed to take full advantage of the available
bandwidth.
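The buffer-credit requirement can be sketched with the standard rule of thumb for Fibre Channel flow control: enough buffer-to-buffer credits must be granted to keep full-size frames in flight for the entire round trip, or the sender stalls waiting for acknowledgments. The frame size and nominal line rate below are illustrative assumptions, not vendor specifications:

```python
import math

# Illustrative sizing of Fibre Channel buffer-to-buffer credits over
# distance. The link stalls whenever the sender runs out of credits
# before acknowledgments return, so the minimum credit count is the
# round-trip time divided by the time to transmit one frame.

FIBER_DELAY_US_PER_KM = 5.0  # assumption: single-mode fiber propagation
FRAME_SIZE_BYTES = 2112      # assumption: near-maximum FC frame


def min_buffer_credits(distance_km: float, line_rate_gbps: float) -> int:
    """Minimum credits to keep the pipeline full for the whole round trip."""
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    frame_time_us = FRAME_SIZE_BYTES * 8 / (line_rate_gbps * 1000)
    return math.ceil(round_trip_us / frame_time_us)


for d in (10, 50, 100):
    print(f"{d:3d} km at 2 Gb/s -> {min_buffer_credits(d, 2.0)} credits")
```

Under these assumptions, a 2 Gb/s link needs on the order of 60 credits at 50 km and well over 100 at 100 km, which is consistent with the observation that credits much larger than 60 would be needed to saturate the bandwidth at the full 100-km separation.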
The use of multiple DWDM channels allows each protocol to have its own unimpeded bandwidth,
subject to the total available aggregate bandwidth. Further increases in performance were observed
with the addition of extra interconnects from the servers to the DWDM device so that Cache Fusion