Using Serviceguard Extension for RAC, 10th Edition, April 2011

Next, the SGeRAC package manager shuts down Oracle Clusterware via the Oracle Clusterware
MNP, followed by the storage needed by Oracle Clusterware (if that storage is managed by CFS,
this requires subsequently shutting down the mount point and disk group MNPs). It can do this
because the dependent RAC database instance MNP is already down. Before shutting itself down,
Oracle Clusterware shuts down the ASM instance, if configured, and then the node applications.
Lastly, SGeRAC itself shuts down.
Note that the stack can be brought up or down manually, package by package, by using
cmrunpkg/cmhaltpkg in the proper dependency order. To disable (partially or wholly) automatic
startup of the stack when a node joins the cluster, the AUTO_RUN attribute should be set to NO on
the packages that should not automatically be started.
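As an illustration of the manual, package-by-package procedure described above, the following minimal sketch builds the cmrunpkg and cmhaltpkg command lines for a stack listed in start dependency order. The package names are hypothetical placeholders, not names defined by the toolkit, and the sketch only constructs the commands; it does not run them.

```python
def stack_commands(start_order):
    """Build command lines for a stack given in start dependency order.

    Packages are started in the listed order and halted in the reverse
    order, so that dependent packages go down before what they depend on.
    """
    start_cmds = [["cmrunpkg", pkg] for pkg in start_order]
    halt_cmds = [["cmhaltpkg", pkg] for pkg in reversed(start_order)]
    return start_cmds, halt_cmds


# Hypothetical package names, used for illustration only.
START_ORDER = ["oc-storage-mnp", "oc-mnp", "asmdg-mnp", "rac-db-mnp"]

if __name__ == "__main__":
    starts, halts = stack_commands(START_ORDER)
    for cmd in starts + halts:
        print(" ".join(cmd))
```

Deriving the halt list by reversing the start list keeps the two orders consistent when a package is added to the stack.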
How Serviceguard Extension for RAC starts, stops and checks Oracle Clusterware
Having discussed how the toolkit manages the overall control flow of the combined stack during
startup and shutdown, we will now discuss how the toolkit interacts with Oracle Clusterware and
RAC database instances. We begin with the toolkit interaction with Oracle Clusterware.
The MNP for Oracle Clusterware provides start and stop functions for Oracle Clusterware and has
a service for checking the status of Oracle Clusterware.
The start function starts Oracle Clusterware using crsctl start crs. To ensure successful
startup of Oracle Clusterware, the function runs crsctl check every 10 seconds until the
command output indicates that the CSS, CRS, and EVM daemons are healthy. If Oracle Clusterware
does not start up successfully, the start function keeps looping until the package start timer
expires, causing SGeRAC to fail the instance of the Oracle Clusterware MNP on that node.
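The polling behavior of the start function can be sketched roughly as follows. This is not the toolkit's code: the function name wait_until and its parameters are hypothetical, the health test is a caller-supplied callable (for example, one that runs crsctl check and inspects its output), and the timeout argument merely stands in for the package start timer, which in SGeRAC is enforced by the package manager rather than by the function itself.

```python
import time


def wait_until(predicate, interval=10.0, timeout=None,
               sleep=time.sleep, clock=time.monotonic):
    """Call predicate every `interval` seconds until it returns True.

    Returns True once the predicate succeeds, or False if `timeout`
    seconds pass first. With timeout=None the loop never gives up.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        if predicate():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so that the loop can be exercised without real waiting.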
The stop function stops Oracle Clusterware using crsctl stop crs. Then, every 10 seconds,
it runs ps until the command output indicates that the processes called evmd.bin, crsd.bin,
and ocssd.bin no longer exist.
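A minimal sketch of the stop-side wait is shown below, assuming the process listing is obtained by a caller-supplied function (for example, one that captures the output of ps -ef). The helper names are hypothetical; only the three daemon names and the 10-second interval come from the text.

```python
import time

DAEMONS = ("evmd.bin", "crsd.bin", "ocssd.bin")


def remaining_daemons(ps_output, names=DAEMONS):
    """Return the sorted daemon names that still appear in a process listing."""
    found = set()
    for line in ps_output.splitlines():
        for field in line.split():
            # Compare on the base name so that full paths also match.
            base = field.rsplit("/", 1)[-1]
            if base in names:
                found.add(base)
    return sorted(found)


def wait_for_exit(list_processes, interval=10.0, sleep=time.sleep):
    """Poll the process listing until none of the daemons remain."""
    while remaining_daemons(list_processes()):
        sleep(interval)
```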
The check function runs ps to determine the process ID of the ocssd.bin process. Then, in a
continuous loop driven by a configurable timer, it uses kill -s 0 to check whether this process
still exists. The other daemons are restarted by Oracle Clusterware, so they are not checked.
When the Oracle Clusterware MNP is in maintenance mode, the check function pauses Oracle
Clusterware health checking. Otherwise, if the check function finds that the process has died,
Oracle Clusterware has either failed or been shut down inappropriately, that is, without using
cmhaltpkg. The service that invokes the function fails at this point, and the SGeRAC package
manager fails the corresponding Oracle Clusterware MNP instance.
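The existence test and monitoring loop described above can be approximated as in the sketch below. It assumes a POSIX system, where sending signal 0 performs the permission and existence checks without delivering a signal. The function names, the in_maintenance callable, and the return value of 1 are hypothetical stand-ins for the toolkit's own logic.

```python
import os
import time


def process_exists(pid):
    """Equivalent of `kill -s 0 <pid>` on POSIX: test for existence only."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # process exists but is owned by another user
    return True


def monitor(pid, interval, in_maintenance,
            exists=process_exists, sleep=time.sleep):
    """Loop until the process disappears, then return a nonzero status.

    While in_maintenance() is true, the health check is skipped.
    """
    while True:
        if not in_maintenance() and not exists(pid):
            return 1
        sleep(interval)
```

Returning from the loop with a nonzero status models the point at which the monitoring service fails and the package manager takes over.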
How Serviceguard Extension for RAC mounts, dismounts and checks ASM disk groups
We now discuss the toolkit interaction with the ASM disk groups.
The MNP for the ASM disk groups needed by the RAC database provides mount and dismount
functions for those disk groups and has a service that checks whether they are mounted.
The start function uses su to switch to the Oracle software owner user ID. It then determines
the ASM instance ID on the current node for the specified disk group using crsctl status resource
ora.asm, and stores it in a variable for later use. It then mounts the ASM disk groups specified
in that ASMDG MNP by connecting to the ASM instance using sqlplus.
The stop function uses su to switch to the Oracle software owner user ID. It dismounts the ASM
disk groups specified in that ASMDG MNP by connecting to the ASM instance using sqlplus.
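The text does not show the statements that the mount and dismount functions send to sqlplus. As a rough sketch, assuming they are the standard ASM statements ALTER DISKGROUP ... MOUNT and ALTER DISKGROUP ... DISMOUNT, the script text could be generated as follows. The function name and the disk group names are hypothetical, and the sketch only builds the text; connecting to the ASM instance is left out.

```python
def diskgroup_script(action, diskgroups):
    """Build SQL*Plus input that mounts or dismounts the given disk groups.

    Assumption: one ALTER DISKGROUP statement per disk group; the source
    text does not specify the exact statements the toolkit issues.
    """
    action = action.upper()
    if action not in ("MOUNT", "DISMOUNT"):
        raise ValueError("action must be MOUNT or DISMOUNT")
    lines = [f"ALTER DISKGROUP {dg} {action};" for dg in diskgroups]
    lines.append("EXIT")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical disk group names, for illustration only.
    print(diskgroup_script("mount", ["DATA", "FRA"]), end="")
```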
The check function determines the status of the ASM disk groups specified in the ASMDG
MNP. When the ASMDG MNP is in maintenance mode, ASM disk group status checking is paused.
Otherwise, in a continuous loop driven by a configurable timer, the check function monitors the
status of the ASM disk groups specified in that ASMDG MNP. If one or more ASM disk groups are
in a dismounted state, the check function will report failure—the ASM diskgroup is dismounted