Use of Serviceguard Extension for RAC Toolkit with Oracle RAC 10g Release 2 or later, March 2009
2. Next, the SGeRAC package manager shuts down Oracle Clusterware via the Oracle Clusterware
MNP, followed by the storage needed by Oracle Clusterware (if that storage is managed by CFS,
this means subsequently shutting down the mount point and disk group MNPs). It can do this
because the dependent RAC database instance MNP is already down. Before shutting itself down,
Oracle Clusterware shuts down the ASM instance, if configured, and then the node applications.
3. Lastly, SGeRAC itself shuts down.
Note that the stack can be brought up or down manually, package by package, by using
cmrunpkg/cmhaltpkg in the proper dependency order. To disable (partially or wholly) automatic
startup of the stack when a node joins the cluster, set the AUTO_RUN attribute to NO on the
packages that should not be started automatically.
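As a rough illustration of the manual, package-by-package procedure, the sketch below prints (rather than executes) the cmrunpkg and cmhaltpkg commands for the two toolkit MNPs in dependency order. The package names rac-db-mnp and crs-mnp and the helper stack_commands are hypothetical, and the storage packages are deliberately left out; substitute the names configured in your own cluster.

```shell
#!/bin/sh
# Hypothetical package names; replace with the MNP names used in your cluster.
DB_MNP="${DB_MNP:-rac-db-mnp}"   # RAC database instance MNP (depends on OC_MNP)
OC_MNP="${OC_MNP:-crs-mnp}"      # Oracle Clusterware MNP

# Print, without running them, the commands that bring the two MNPs
# up or down in dependency order.
stack_commands() {
    case "$1" in
        up)    # start the dependency first, then its dependent
            echo "cmrunpkg $OC_MNP"
            echo "cmrunpkg $DB_MNP"
            ;;
        down)  # halt the dependent first, then its dependency
            echo "cmhaltpkg $DB_MNP"
            echo "cmhaltpkg $OC_MNP"
            ;;
        *)
            echo "usage: stack_commands up|down" >&2
            return 1
            ;;
    esac
}
```

Calling stack_commands down lists the database instance MNP before the Oracle Clusterware MNP, mirroring the shutdown order described above; the printed commands can then be reviewed and run by hand.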
How Serviceguard Extension for RAC starts, stops and checks Oracle
Clusterware
Having discussed how the toolkit manages the overall control flow of the combined stack during
startup and shutdown, we will now discuss how the toolkit interacts with Oracle Clusterware and RAC
database instances. We begin with the toolkit interaction with Oracle Clusterware.
The MNP for Oracle Clusterware provides start and stop functions for Oracle Clusterware and has a
service for checking the status of Oracle Clusterware.
The start function starts Oracle Clusterware using crsctl start crs. To ensure successful startup of
Oracle Clusterware, the function then runs crsctl check every 10 seconds until the command output
indicates that the CSS, CRS, and EVM daemons are healthy. If Oracle Clusterware does not start up
successfully, the start function keeps looping until the package start timer expires, at which point
SGeRAC fails the instance of the Oracle Clusterware MNP on that node.
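The start logic described above can be sketched roughly as follows. This is not the toolkit's actual script: the function and variable names are hypothetical, the full command form crsctl check crs is assumed, and the health test assumes the "appears healthy" wording that Oracle Clusterware 10g Release 2 prints for each daemon. The command name is overridable so the sketch can be exercised without an Oracle installation.

```shell
#!/bin/sh
# Overridable settings (names are assumptions made for this sketch).
CRSCTL="${CRSCTL:-crsctl}"
CHECK_INTERVAL="${CHECK_INTERVAL:-10}"   # seconds between health checks

# Succeed only if the check output reports CSS, CRS and EVM as healthy.
# Assumes the 10g R2 output wording "<DAEMON> appears healthy".
crs_is_healthy() {
    out=$("$CRSCTL" check crs 2>&1)
    for daemon in CSS CRS EVM; do
        case "$out" in
            *"$daemon appears healthy"*) ;;   # this daemon is fine, keep going
            *) return 1 ;;                    # at least one daemon is not healthy
        esac
    done
    return 0
}

# Start Oracle Clusterware, then poll until all three daemons are healthy.
# The loop has no retry limit of its own: the package start timer is what
# bounds it, as described in the text.
start_oracle_clusterware() {
    "$CRSCTL" start crs
    until crs_is_healthy; do
        sleep "$CHECK_INTERVAL"
    done
}
```

Leaving the loop unbounded keeps the timeout policy in one place, the package configuration, rather than duplicating it inside the script.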
The stop function stops Oracle Clusterware using crsctl stop crs. Then, every 10 seconds, it runs ps
until the command output indicates that the processes called evmd.bin, crsd.bin, and ocssd.bin no
longer exist.
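A minimal sketch of this stop sequence is shown below. The function and variable names are hypothetical, and ps -ef is an assumed choice of ps options; the text above only says that ps is used. The commands are overridable so the logic can be tried without a running cluster.

```shell
#!/bin/sh
# Overridable settings (names are assumptions made for this sketch).
CRSCTL="${CRSCTL:-crsctl}"
PS_CMD="${PS_CMD:-ps -ef}"
CHECK_INTERVAL="${CHECK_INTERVAL:-10}"   # seconds between ps checks

# True while any of the three daemon processes still appears in the ps output.
# The grep process does not match itself, because its own command line
# contains "ocssd)\.bin" rather than the literal text "ocssd.bin".
crs_daemons_running() {
    $PS_CMD | grep -E '(evmd|crsd|ocssd)\.bin' >/dev/null
}

# Ask Oracle Clusterware to stop, then wait until its daemons are gone.
stop_oracle_clusterware() {
    "$CRSCTL" stop crs
    while crs_daemons_running; do
        sleep "$CHECK_INTERVAL"
    done
}
```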
The check function runs ps to determine the process ID of the ocssd.bin process. Then, in a
continuous loop driven by a configurable timer, it uses kill -s 0 to check whether this process still
exists. The other daemons are restarted by Oracle Clusterware, so they are not checked.
When the Oracle Clusterware MNP is in maintenance mode, the check function pauses Oracle
Clusterware health checking. Otherwise, if the check function finds that the process has died,
Oracle Clusterware has either failed or been shut down inappropriately, that is, without using
cmhaltpkg. The service that invokes the function fails at this point, and the SGeRAC package
manager fails the corresponding Oracle Clusterware MNP instance.
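The check logic, including the maintenance-mode pause, might look roughly like the sketch below. All names here are hypothetical. In particular, signalling maintenance mode through a marker file (MAINT_FLAG_FILE) is an assumption made for illustration, as are the ps -ef options and the 60-second default interval. The sketch also assumes it runs as a user permitted to signal ocssd.bin, since kill -s 0 fails for processes the caller may not signal.

```shell
#!/bin/sh
# Overridable settings (all names and defaults are assumptions).
PS_CMD="${PS_CMD:-ps -ef}"
CHECK_INTERVAL="${CHECK_INTERVAL:-60}"               # hypothetical default, in seconds
MAINT_FLAG_FILE="${MAINT_FLAG_FILE:-/tmp/oc.maint}"  # hypothetical maintenance marker

# Print the PID of ocssd.bin (second column of ps -ef output), if any.
find_ocssd_pid() {
    $PS_CMD | awk '/ocssd\.bin/ { print $2; exit }'
}

# One probe: returns 0 if the process exists or checking is paused for
# maintenance, and non-zero if the process has gone away.
probe_ocssd() {
    [ -f "$MAINT_FLAG_FILE" ] && return 0
    kill -s 0 "$1" 2>/dev/null   # signal 0 tests existence without signalling
}

# Loop while the process is alive; returning non-zero is what would make
# the monitoring service, and hence the MNP instance, fail.
monitor_ocssd() {
    pid=$(find_ocssd_pid)
    [ -n "$pid" ] || return 1
    while probe_ocssd "$pid"; do
        sleep "$CHECK_INTERVAL"
    done
    return 1
}
```

Looking up the PID once and then probing it with signal 0 keeps each loop iteration cheap compared with rerunning ps every time.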
How Serviceguard Extension for RAC Toolkit starts, stops and checks the
RAC database instance
Next we discuss the toolkit interaction with the RAC database.
The MNP for the RAC database instance provides start and stop functions for the RAC database
instance and has a service for checking the status of the RAC database instance.
The start function executes su to the Oracle software owner user id. It then determines the Oracle
instance ID [8] on the current node for the specified database using srvctl status database. Then it starts
the corresponding RAC database instance using srvctl start instance. If an Oracle Clusterware
[8] By determining the Oracle instance ID dynamically, we avoid having to specify it in a configuration file. This has the advantage of allowing the
toolkit configuration to be the same on all nodes.