User's Guide
failed but not halted, the Site Controller Package fails over to a remote site node to perform a site
failover.
Before starting the complex-workload packages configured at the remote site, the Site Controller
Package ensures that it is safe to do so. The failed complex-workload packages might not have
halted cleanly, leaving stray processes and resources. In such scenarios, it is not safe to start the
identical complex workload configuration on the remote site. As a result, when it starts on the
remote site node, the Site Controller Package checks whether all instances of the failed active
packages have halted cleanly. The Site Controller Package checks the last_halt_failed flag for each
instance of the workload packages. The flag is set to yes for an instance whose halt script execution
resulted in an error. If even one instance of any of the failed workload's packages did not halt
successfully, the Site Controller Package aborts the site failover. In these circumstances, the Site
Controller Package halts and its state is displayed as failed on the remote site node. Before the
Site Controller Package and the complex workload configuration can be restarted, the nodes on the
site must be cleaned manually.
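The safety check described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the PackageInstance record and both function names are hypothetical, and the last_halt_failed flag (yes or no) is modeled as a boolean.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PackageInstance:
    """Hypothetical record for one instance of a workload package."""
    package: str
    node: str
    last_halt_failed: bool  # True models last_halt_failed = yes


def unclean_instances(instances: List[PackageInstance]) -> List[PackageInstance]:
    """Return the instances whose halt script execution ended in an error."""
    return [inst for inst in instances if inst.last_halt_failed]


def site_failover_is_safe(instances: List[PackageInstance]) -> bool:
    """Failover may proceed only if every instance halted cleanly."""
    return not unclean_instances(instances)
```

In this sketch a single instance with the flag set is enough to block the failover, and unclean_instances identifies the nodes that would need manual cleaning.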
After ensuring a clean halt for all instances of the failed complex-workload packages, the Site
Controller Package performs the following steps to activate the corresponding passive complex
workload configured in its current site:
1. Closes the Site Safety Latch for the failed complex-workload package nodes.
2. Waits for all packages configured as part of the failed complex workload to halt
successfully.
3. Deports the CVM disk groups used by the database on the failed site.
4. Prepares the replicated data storage on the current site using the Metrocluster environment
file on the node where it is starting.
5. Imports the CVM disk groups used by the database in the current site.
6. Opens the Site Safety Latch in the current site.
7. Starts the complex-workload packages configured for the database in the current site.
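The ordering of the seven steps above matters, because storage is deported from the failed site before it is imported on the current one. The following sketch only illustrates that ordering. The step labels paraphrase the list and are not product identifiers, and stopping at the first failing step is an assumption of this sketch, not documented behavior.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Illustrative labels for the seven steps, in the order given in the list.
FAILOVER_STEPS: List[str] = [
    "close_latch_on_failed_site",
    "wait_for_failed_packages_to_halt",
    "deport_cvm_disk_groups_on_failed_site",
    "prepare_replicated_storage",
    "import_cvm_disk_groups_on_current_site",
    "open_latch_on_current_site",
    "start_workload_packages",
]


def run_site_failover(
    actions: Dict[str, Callable[[], bool]]
) -> Tuple[List[str], Optional[str]]:
    """Run each step in order; return (completed steps, failed step or None).

    `actions` maps every step label to a callable that returns True on
    success. Stopping at the first failure is an assumption of this sketch.
    """
    completed: List[str] = []
    for step in FAILOVER_STEPS:
        if not actions[step]():
            return completed, step
        completed.append(step)
    return completed, None
```

A caller would supply one callable per step. If the storage preparation callable returns False, the sketch reports the first three steps as completed and never reaches the import step.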
For the Site Controller Package to successfully start the remote complex-workload package
configuration, the packages in the remote configuration must have node switching enabled on
their configured nodes. If the Site Controller Package fails to start after successfully preparing
the storage on a site, it sets the Site Safety Latch to a transient state, which is displayed as
INTERMEDIATE. When the Site Safety Latch is in the INTERMEDIATE state, the corresponding Site
Controller Package can be restarted only after cleaning the site where it previously failed to start.
For more information on cleaning the site, see “Cleaning the Site to Restart
the Site Controller Package” (page 405).
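The restart rule for the INTERMEDIATE state can be sketched as a small state model. Only the INTERMEDIATE label comes from the text. The CLOSED and OPEN labels, the function names, and the behavior when the start fails before storage preparation are assumptions made for illustration.

```python
from enum import Enum


class LatchState(Enum):
    CLOSED = "CLOSED"              # assumed label for a closed latch
    INTERMEDIATE = "INTERMEDIATE"  # label displayed according to the text
    OPEN = "OPEN"                  # assumed label for an open latch


def latch_after_failed_start(storage_prepared: bool) -> LatchState:
    """Latch state after the Site Controller Package fails to start.

    From the text: a failure after successful storage preparation leaves
    the latch in INTERMEDIATE. Assumption: an earlier failure leaves it closed.
    """
    return LatchState.INTERMEDIATE if storage_prepared else LatchState.CLOSED


def can_restart(latch: LatchState, site_cleaned: bool) -> bool:
    """From INTERMEDIATE, a restart requires the site to have been cleaned.

    Other restart conditions are not modeled in this sketch.
    """
    if latch is LatchState.INTERMEDIATE:
        return site_cleaned
    return True
```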
Node Failure and Rejoining the Cluster
When a node in a cluster fails, all multi-node package (MNP) instances running on the failed
node also fail. Failover-type packages fail over to the next available adoptive node. If no
other adoptive node is configured and available in the cluster, the failover package fails and is
halted.
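The adoptive-node rule for failover packages can be sketched as follows. The function name and the representation of a package's configured nodes as an ordered list are assumptions made for illustration.

```python
from typing import List, Optional, Set


def next_adoptive_node(
    configured_nodes: List[str], failed_node: str, up_nodes: Set[str]
) -> Optional[str]:
    """Return the first configured node, other than the failed one, that is up.

    None models the case where no adoptive node is configured and
    available, so the failover package fails and is halted.
    """
    for node in configured_nodes:
        if node != failed_node and node in up_nodes:
            return node
    return None
```

For example, with configured nodes ["node1", "node2", "node3"], a failure of node1, and only node3 up, the sketch selects node3. With no other node up it returns None.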
When a node in the Metrocluster environment is restarted, the active complex-workload packages
on the node are halted before the node restarts. Once the node has restarted and rejoined the cluster,
the active complex-workload package instances on the site that have the auto_run flag set to yes
start automatically. If the complex workload's packages have the auto_run flag set to no, these
instances must be started manually on the restarted node.
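The effect of the auto_run flag when a node rejoins can be sketched as a simple partition of package instances. The function name and the dictionary input are hypothetical, and the flag is modeled as a boolean (True for yes).

```python
from typing import Dict, List, Tuple


def split_by_auto_run(auto_run_flags: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Partition package names into (start automatically, start manually).

    `auto_run_flags` maps a package name to its auto_run setting.
    """
    automatic = sorted(name for name, flag in auto_run_flags.items() if flag)
    manual = sorted(name for name, flag in auto_run_flags.items() if not flag)
    return automatic, manual
```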
When the node on which the Site Controller Package is running is restarted, the Site Controller
Package fails over to the next available adoptive node. Depending on the site of the adoptive node
on which the Site Controller Package starts and on the status of the active complex-workload
packages, the Site Controller Package performs a site failover, if necessary.
392 Designing a Disaster Recovery Solution Using Site Aware Disaster Tolerant Architecture