HP-UX Operating System: Fault Tolerant System Administration HP-UX version 11.00.
Notice

The information contained in this document is subject to change without notice. UNLESS EXPRESSLY SET FORTH IN A WRITTEN AGREEMENT SIGNED BY AN AUTHORIZED REPRESENTATIVE OF STRATUS TECHNOLOGIES, STRATUS MAKES NO WARRANTY OR REPRESENTATION OF ANY KIND WITH RESPECT TO THE INFORMATION CONTAINED HEREIN, INCLUDING WARRANTY OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Contents

Preface
Revision Information xiii
Audience xiii
Notation Conventions xiii
Product Documentation xvi
Online Documentation xvii
Notes Files xvii
Man Pages xvii
Related Documentation xviii
Ordering Documentation xix
Commenting on This Guide xix
Customer Assistance Center (CAC) xix
Dump Configuration Decisions and Dump Space Issues 5-42
Dump Space Needed for Full System Dumps 5-44
Dump Space Needed for Selective Dumps 5-44
Configuring save_mcore 5-45
Using save_mcore for Full and Selective Dumps 5-45
Configuring a Dump Device for savecrash 5-47
Configuring a Dump Device into the Kernel 5-47
Using SAM to Configure a Dump Device 5-47
Using Commands to Configure a Dump Device 5-48
Modifying Run-Time Dump Device Definitions 5-49
Defining Entries in the fstab File 5-49
Using crashconf to Specify a Dump Device
7. Remote STREAMS Environment 7-1
Configuration Overview 7-1
Configuring the Host 7-3
Creating the orsdinfo File 7-3
Updating the RSD Configuration 7-5
Customizing the orsdinfo File 7-6
Defining the Location for the Firmware 7-6
Downloading Firmware 7-7
Downloading New Firmware 7-7
Downloading Firmware to a Card 7-8
Setting and Getting Card Properties 7-8
Adding or Moving a Card 7-9
Appendix A.
Preface

The HP-UX Operating System: Fault Tolerant System Administration (R1004H) guide describes how to administer the fault tolerant services that monitor and protect Continuum systems.

Revision Information

This manual has been revised to reflect support for Continuum systems using suitcases with the PA-8600 CPU modules, additional PCI card and storage device models, company and platform name changes, and miscellaneous corrections to existing text.
Notation Conventions ■ The following font conventions apply both to general text and to text in displays: – Monospace represents text that would appear on your screen (such as commands and system responses, functions, code fragments, file names, directories, prompt signs, messages). For example, Broadcast Message from ... – Monospace bold represents user input in screen displays. For example, ls -a – Monospace italic represents variables in commands for which the user must supply an actual value.
■ Ellipses (…) indicate that you can enter more than one instance of an argument on a single command line. For example, cb [–s] [–j] [–l length] [–V] [file …]
■ A right-arrow (>) on a sample screen indicates the cursor position. For example, >install - Installs Package
■ A name followed by a section number in parentheses refers to a man page for a command, file, or type of software.
WARNING Warning notices alert the reader to conditions that are potentially hazardous to people. These hazards can cause personal injury if the warnings are ignored.
DANGER Danger notices alert the reader to conditions that are potentially lethal or extremely hazardous to people.
Online Documentation

When you install the HP-UX operating system software, the following online documentation is installed:
■ notes files
■ manual (man) pages

Notes Files

The /usr/share/doc/RelNotes.fts file contains the final information about this product. The /usr/share/doc/known_problems.fts file documents the known problems and problem-avoidance strategies. The /usr/share/doc/fixed_list.fts file lists the bugs that were fixed in this release.
Related Documentation

In addition to the operating system manuals, the following documentation contains information related to administering a Continuum system running the HP-UX operating system:
■ The Continuum Series 400 and 400-CO: Site Planning Guide (R454) provides a system overview, site requirements (for example, electrical and environmental requirements), cabling and connection information, equipment specification sheets, and site layout models that can assist in your site preparation.
Ordering Documentation

HP-UX operating system documentation is provided on CD-ROM (except for Managing Systems and Workgroups (B2355-90157), which is available as a separate printed manual). You can order a documentation CD-ROM or other printed documentation in either of the following ways:
■ Call the CAC (see “Customer Assistance Center (CAC)”).
■ If your system is connected to the Remote Service Network (RSN), add a call using the Site Call System (SCS).
1 Getting Started

This chapter provides you with information about using this manual and describes continuous-availability administration and fault-tolerant design.

Using This Manual

Stratus versions of the HP-UX operating system have been enhanced for use with the Continuum fault tolerant hardware, communication adapters, peripherals, and associated software.
For many of your system administration tasks, you can refer to the standard HP-UX operating system manuals provided by Hewlett-Packard. Table 1-1 lists administrative tasks and where to find the information.

Table 1-1. Where to Find Information
Continuous Availability Administration

This section describes a Continuum system’s unique continuous-availability architecture and provides an overview of the special tasks system administrators must perform to support and monitor this architecture.
Console Controller

Continuum systems do not include a control panel or buttons to execute machine management commands. All such actions are controlled through the system console, which is connected to the console controller. The console controller serves the following purposes:
■ The console controller implements a console command interface that allows you to initiate certain actions, such as a shutdown or main bus reset.
Fault Tolerant Design

Continuum systems are fault tolerant; that is, they continue operating even if major components fail. Continuum systems provide both hardware and software features that maximize system availability.
■ Continuum systems contain multiple fans and environmental monitoring features. Power and air flow information is collected automatically and corrective actions are initiated as necessary.

Continuous Availability Software

The fault tolerant software features include the following:
■ Stratus provides a layer of software fault tolerant services with the standard HP-UX operating system. These services constantly monitor for and respond to hardware problems.
board resources (for example, SCSI ports on I/O controllers) or software configuration of board resources (for example, using RNI to configure dual Ethernet ports).
■ buses—In Continuum Series 400/400-CO systems, the suitcases and PCI bridge cards are cross-wired on the main bus to provide fault tolerance. The combination of error detection, retry logic, and bus switching ensures that all bus transactions are fault tolerant.
2 Setting Up the System

A system administrator’s job is to provide and support computer services for a group of users.
Installing a System

Continuum systems are installed by Stratus representatives who can guide you in setting up your system. Nevertheless, all administrators should expect to allocate time to site planning and installation.
1. Prepare your site prior to system delivery.
■ creating file systems
■ configuring mail and print services
■ setting up NFS services
■ setting up network services
■ backing up and restoring data
■ setting up a workgroup

See the Managing Systems and Workgroups (B2355-90157) for detailed information about administering a system running the HP-UX operating system. (Hewlett-Packard offers additional manuals that describe how to set up and manage networking and other services.)
■ Modify boot parameters as necessary. The system installs with a default set of boot parameters in the /stand/conf file. If conditions warrant, you can modify those parameters, for example, to specify a new root device. See Chapter 3, “Starting and Stopping the System,” and the conf(4) man page for more information.
■ Configure logical LAN interfaces, if necessary.
Maintaining a System

An active system requires regular monitoring and periodic maintenance to ensure proper security, adequate capability, and optimal performance. The following are guidelines for maintaining a healthy system:
■ Set up a regular schedule for backing up (copying) the data on your system. Decide how often you must back up various data objects (full file systems, partial file systems, data partitions, and so on) to ensure that lost data can always be retrieved.
Tracking and Fixing System Problems

An important function of a system administrator is to identify and fix problems that occur in the hardware, software, or network while the system is in normal use. Continuum systems are designed specifically for continuous availability, so you should experience fewer system problems than with other systems running the HP-UX operating system.
3 Starting and Stopping the System

This chapter provides an overview of the boot process and describes the following tasks:
■ configuring the boot environment
■ booting the system
■ shutting down the system
■ dealing with power failures
■ managing flash cards

Overview of the Boot Process

Bringing the system from power up to a point where users can log in is the process of booting.
Figure 3-1. Boot Process

(The figure traces the boot flow: from power-on or a reset_bus, the CPU PROM either autoboots using the path partition or, if you press a key at the “Hit any key...” prompt, stops at the PROM: prompt with optional commands; the primary boot loader follows, optionally stopping at the lynx$ prompt; the secondary boot loader then runs, optionally stopping at the ISL> prompt after the “ISL: Hit any key...” prompt; boot messages lead to the login prompt.)
Once the system powers up (or you enter a reset_bus from the console command menu), the following steps occur:
1. The CPU PROM begins the boot sequence, and the system displays various messages (for example, copyright, model type, memory size, and board revision) and the following prompt:
Hit any key to enter manual boot mode, else wait for autoboot
2. To enter manual boot mode, press a key before the timeout expires; otherwise, the system autoboots using the boot path stored in the path partition.
NOTE Before you power up the computer, turn on the console, terminals, and any other peripherals and peripheral buses that are attached to the computer. If you do not turn on the peripherals first, the system will not be able to configure the bus or peripherals. When the peripherals are on and have completed their self-check tests, turn on the computer.
To change the boot path or disable autoboot, do the following:
1. Log in as root.
2. Determine which console controller is on standby. To do this, enter
ftsmaint ls 1/0
ftsmaint ls 1/1
The Status field shows Online for the online board and Online Standby for the standby board (if both boards are functioning properly).
NOTE You must specify the standby console controller for any PROM-burning commands. You will get an error if you specify the online console controller.
Configuring the Boot Environment a. Edit the /stand/bootpath file and enter appropriate entries for the boot device(s). Each line presents one boot device, and you can enter up to four lines. The system searches for a boot device in the order entered in the file. The following are sample entries: 2 0 0 0 3 0 0 0 b. Update the path partition with the information from the /stand/bootpath file.
the LIF kernel file (kernel), and some logical SCSI buses (lsm#). Although the file you select during installation as the default CONF file is adequate in many settings, you might need to modify the CONF parameters if:
■ You reconfigure your system and want to specify an alternate root device.
■ You add RNI support and need to configure logical LAN interfaces (see the HP-UX Operating System: LAN Configuration Guide (R1011H) and the HP-UX Operating System: RNI (R1006H)).
save_mcore_dumps_only=1
disk_sys_type=euroac
lsm0=0/2/7/1,0/3/7/1:id0=7,id1=6,tm0=0,tp0=1,tm1=0,tp1=1
lsm1=0/2/7/2,0/3/7/2:id0=7,id1=6,tm0=0,tp0=1,tm1=0,tp1=1
lsm2=0/2/7/0:id0=7,tm0=1,tp0=1
lsm3=0/3/7/0:id0=7,tm0=1,tp0=1
■ The following is a sample of the CONF_EURDC file for a Continuum Series 400-CO system with the DC powered Eurologic disk enclosure:
rootdev=disc(14/0/0.0.
4. Remove the current CONF file. To do this, enter
flifrm flashcard:CONF
5. Copy the updated /stand/conf file to the CONF file. To do this, enter
flifcp /stand/conf flashcard:CONF
6. Reboot the system to activate the new settings. To do this, enter
shutdown -r
See “Flash Card Utility Commands” later in this chapter for a complete list of commands that you can use to check or manipulate LIF files.
CPU PROM Commands

Table 3-2 lists the CPU PROM commands you can enter at the PROM: prompt.

Table 3-2. CPU PROM Commands
boot location – Starts the boot process; location is the physical location of the boot device (see “Manually Booting Your System”).
list_boards – Lists the boards on the main system bus.
display addr bytes – Displays current memory. addr is the starting memory address and bytes is the memory size (number of bytes) to display.
Primary Bootloader Commands

Table 3-3 lists the primary bootloader commands you can enter at the lynx$ prompt. See the lynx(1M) man page for more information.

Table 3-3. Primary Bootloader Commands
boot [options], go [options] – Loads an object file from the LIF file system on the flash card or boot disk and transfers control to the loaded image.
The boot command has several options. The command syntax is as follows:
boot [-F] [-lq] [-P number] [-M number] [-lm] [-s file] [-a[C|R|S|D] devicefile] [-f number] [-i string]
Table 3-4 lists the boot command options.

Table 3-4. Options to the boot Command
-F – Use with the SwitchOver/UX software. Ignore any locks on the boot disk. This option should be used only when it is known that the processor holding the lock is no longer running.
Table 3-4. Options to the boot Command (Continued)
-a [C|R|S|D] devicefile – Accept a new location as specified by devicefile and pass it to the loaded image. If that image is a kernel, the kernel erases its current I/O configuration and uses the specified devicefile. If the C, R, S, or D option is specified, the kernel configures the devicefile as the console, root, swap, or dump device, respectively. The -a option can be repeated multiple times.
Table 3-5. Boot Environment Variables (Continued)
dpt1port – Specifies the location of single-port SCSI controller cards. The dpt1port parameter accepts a comma-separated list of hardware locations in the form x/y, where x is the bus number and y is the slot number. For example, dpt1port=2/6,3/6 specifies that there are single-port SCSI controller cards in slot 6 of PCI bays 2 and 3.
dumpdev – Specifies the dump device for the system.
Table 3-5. Boot Environment Variables (Continued)
rootdev – Specifies the root device for the system. The rootdev parameter is a devicefile specification. See “Modifying CONF Variables” for the format of devicefile.
swapdev – Specifies the swap device for the system. The swapdev parameter has the form (v/w/x.y.z;n), where v/w/x.y.z specifies the hardware path to the swap device and n is the minor number (n is always 0). The default is (;).
Booting the System

Your choice of how to boot the system depends on the state of the machine. In general, there are three states from which you need to initiate the boot process, as described in Table 3-7.

Table 3-7. Booting Options
no power – If the system is not powered because the power source was interrupted (or if this is the initial power-on), regaining power initiates the boot process.
Conditions might require that you reboot in a special way, such as in single-user mode or with an alternate kernel. Table 3-8 provides guidelines to consider before rebooting.

Table 3-8. Booting Sources
In single-user state – if you forgot the root password, or if /etc/passwd or /etc/inittab is corrupt.
With an alternate kernel – if the system does not boot after reconfiguring the kernel, or the default kernel returns the error “Cannot open or execute.”
When the console is in command mode, it displays a menu similar to the following:
help ......... displays command list.
shutdown ..... begin orderly system shutdown.
restart_cpu .. force CPU into kernel dump/debug mode.
reset_bus .... send reset to system.
hpmc_reset ... send HPMC to cpus.
history ...... display switch closure history.
quit, q ...... exit the front panel command loop.
. ............ display firmware version.
2.
Table 3-9. Console Commands (Continued)
hpmc_reset – Issues a high priority machine check (HPMC) to all CPUs on all CPU/memory boards in the system. This command first flushes the caches to preserve dump information and then (based on an internal flag value) either invokes a “warm” reset (that is, reboots the system, saving current memory and registers) or simply returns to the HP-UX operating system.
2. The system displays a PROM: prompt. At this prompt, invoke the primary bootloader. To do this, enter
PROM: boot location
location is the boot device location. Enter a flash card location from which to boot. For example, to boot from the flash card in card-cage 2, enter
PROM: boot 2
For a list of PROM commands, enter help at the PROM: prompt. For more information, see “CPU PROM Commands.”
3.
NOTE The file system used during recovery is /stand/flash/INSTALLFS. Configuration information available at boot is stored in the first 8KB of this file. The INSTALL kernel used during installation is /stand/vmunix. For more information, see the make_boot_image(1M), make_recovery(1M), instl_adm(1M), instl_adm(4), and ignite(5) man pages.
The following sections describe the procedures for creating the boot image and doing a recovery.
If the root disk is very large, you should use make_recovery without the -A option to back up the core operating system, and use your regular backup procedure to back up other files. You can also customize exactly which files are put on the recovery tape by using the -p and -r options.
5. Remove the tape. Label it as the recovery tape for that system and date it.
NOTE The recovery tape should be updated whenever your system changes.
5. Set the kernel environment variable to INSTALL. Enter the following command:
kernel=INSTALL
NOTE The ls command can be used when booting from flash card to see the contents of the card.
6. To continue the boot, enter the following command:
lynx$ go
The secondary boot loader will be loaded. Let the boot process continue without interruption until you see the ISL prompt.
7.
Using SAM

To shut down the system using SAM, do the following:
1. Log in as root.
2. Invoke SAM. To do this, enter
sam
3. Select the Routine Tasks icon or menu option.
4. Select the System Shutdown icon or menu option.
5. Select the type of shutdown you want:
– Halt the system
– Reboot (restart) the system
– Go to single-user state
6. In the Time Before Shutdown control box, enter the number of minutes before shutdown will begin and select OK.
7.
Changing to Single-User State

To change to a single-user state, do the following:
1. Change to the / (root) directory. To do this, enter
cd /
2. Shut down the system. To do this, enter
shutdown
The system prompts you to send a message informing users how much time they have to end their sessions and when to log off.
3. At the prompt for sending a message, enter y.
4. Enter a message.
5. When you finish entering the message, press Return and then Ctrl-D.
System shutdown time has arrived
Jul 20 16:48:03 automount[457]: exiting
Jul 20 16:48:03.17 [FTS,c0] (0/0) ftsarg = 401!
Jul 20 16:48:09.43 [FTS,c0] (0/0) ftsarg = 401!
sync’ing disks (0 buffers to flush):
0 buffers not flushed
0 buffers still dirty
Stratus Continuum Series 400, Version 46.0
Built: Mon Aug 11 10:30:58 EDT 1998
(c) Copyright 1995-1998 Stratus Computer, Inc.
The following example shows the messages displayed when the system is halted from a multiuser state:
# shutdown -h
SHUTDOWN PROGRAM
01/27/98 14:43:52 PDT
Waiting a grace period of 60 seconds for users to log out.
Do not turn off the power or press reset during this time.
Broadcast message from root (console) Tue Jan 27 14:44:52 ...
SYSTEM BEING BROUGHT DOWN NOW ! ! !
Do you want to continue? (You must respond with ‘y’ or ‘n’.)
The -r option causes the system to enter single-user state and reboot immediately.
CAUTION Do not execute shutdown -r from the single-user run level. If you are in single-user state, you must reboot using the reboot command. For more information, see the reboot(1M) man page.

Designating Shutdown Authorization

By default, only the super-user can use the shutdown command. You can give other users permission to use shutdown by listing their user names in the /etc/shutdown.allow file.
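For example, a minimal /etc/shutdown.allow might look like the following sketch. The host and user names here are hypothetical; check the shutdown(1M) man page for the exact entry syntax supported by your release:

# /etc/shutdown.allow - users permitted to run shutdown
# system     user
myhost       operator
myhost       jsmith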
Dealing with Power Failures Dealing with Power Failures Continuum systems provide power failure protection when connected to an approved UPS through the console controller’s auxiliary port (configured to support a UPS). If an external power failure occurs, the UPS notifies the system of the power failure and switches to battery power. When the system receives the power failure report from the UPS, it waits for the specified grace period. The system continues to function normally during the grace period.
Dealing with Power Failures Configuring the Power Failure Grace Period The power failure grace period is the number of seconds that the system waits after a power failure occurs before it begins an orderly shutdown of the system. If power is restored within the time specified by the grace period, the system does not shut down. The default grace period is 60 seconds.
Managing Flash Cards Configuring the UPS Port You can configure the console controller auxiliary port to support a UPS. See Chapter 3, “Configuring Serial Ports for Terminals and Modems,” in the HP-UX Operating System: Peripherals Configuration (R1001H) for more information. Managing Flash Cards Continuum Series 400/400-CO systems use a device called a flash card to perform the primary boot functions. The flash card contains the primary bootloader, a configuration file, and the secondary bootloader.
Managing Flash Cards You can copy new configuration files and bootloaders to the LIF section using the flifcp and flifrm commands. The size of the files varies depending on your configuration. You can view the size and order of the files using the flifls command. The example in Figure 3-3 lists the LIF files that were used to boot the system.
Table 3-11 describes the flash card utilities. For more information, see the procedures later in this chapter and the corresponding man pages.

Table 3-11. Flash Card Utilities
flashboot – Copies data from a file on disk to the bootloader area on the flash card. Use this command to copy the bootloader to the flash card. The installation image is stored at /stand/flash/lynx.obj.
flashcp – Copies data from one flash card to another.
Creating a New Flash Card

To initialize a new flash card with the Stratus flash image, copy an installation flash image from the system to the flash card. To do this, use the following procedure:
1. Check that the installation flash image has been installed. To do this, enter
swlist | grep Flash-Contents
ls /stand/flash/ramdisk0
2. If /stand/flash/ramdisk0 does not exist, do the following:
a. Determine the CD-ROM device file name.
4 Mirroring Data

This chapter provides information about mirroring data, mirroring root and swap disks, and setting up I/O channel separation.
NOTE The Mirror Disk/HP-UX operating system software is included on Continuum systems running the HP-UX operating system; you do not need to purchase it separately.

Introduction to Mirroring Data

This chapter describes the recommended configuration for mirroring data on Continuum systems.
Introduction to Mirroring Data ■ A physical volume group is a set of physical volumes, or disks, within a volume group. ■ A logical volume is a unit of usable disk space divided into sequential logical extents. Logical volumes can be used for swap, dump, raw data, or file systems. ■ A logical extent is a portion of a logical volume mapped to a physical extent. ■ A physical extent is an addressable unit on a physical volume.
Sample Mirror Configuration

Figure 4-1 shows a possible mirror configuration for six disks, three on each logical SCSI bus (that is, “A” disks and “B” disks on separate logical SCSI buses), divided into two physical volume groups.

Figure 4-1. Sample Mirror Configuration (the figure maps logical volume characteristics such as contiguous, noncontiguous, double mirror, and no mirror across disks 1A–3A and 1B–3B in two physical volume groups within one volume group)
Introduction to Mirroring Data ■ Mirrored logical volumes should use PVG-strict allocation to allocate physical extents. ■ If you use single-initiated SCSI buses, make sure that you mirror disks controlled by a single-initiated SCSI bus with disks controlled by a SCSI bus attached to a controller port of a PCI card in the other card-cage. This strategy will ensure that a logical volume can still be accessed in the event of disk failure or SCSI bus failure.
dynamic to choose parallel when the physical write operation is synchronous or sequential when the physical write operation is asynchronous.
■ Mirror Write Cache—Keeps a log of writes that are not yet mirrored, and uses the log at recovery. Performance is slower during regular use to update the log, but recovery time is faster. Use when fast recovery of the data is essential. Turn off for mirrored swap space that is also used as a dump.
4. Add an AUTO file in the boot LIF area. To do this, enter
mkboot -a “hpux (14/0/1.0.0;0)/stand/vmunix” /dev/rdsk/address
5. Define the boot volume (typically lvol1), which must be the first logical volume on the physical volume. To do this, enter
lvlnboot -b lvol1 /dev/vg00
This takes effect on the next system boot.
NOTE The procedure in this section creates a mirror copy of the primary swap logical volume (typically lvol2).
8. Verify that the logical volumes have been created as you intended.
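One way to check is with the lvlnboot command; the following is a sketch (output varies by configuration, and -v is the standard HP-UX option for displaying the boot, root, swap, and dump volumes defined for a volume group):

lvlnboot -v /dev/vg00
(displays the boot, root, swap, and dump logical volumes recorded for vg00)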
Setting Up I/O Channel Separation Setting Up I/O Channel Separation Stratus recommends that you use I/O channel separation for the physical volumes within a volume group to maintain logical volume mirroring across different SCSI buses. Doing this is important because if a site does not set up I/O separation, the site could perform strict mirroring but still not be fully duplexed, as the mirroring could occur on two different physical volumes but on the same SCSI bus.
4. Extend the volume group to include the second physical volume group, lsb1. To do this, enter
vgextend -g lsb1 vgdata /dev/dsk/c1t2d0 /dev/dsk/c1t3d0
This statement adds a second physical volume group called lsb1 to the volume group vgdata. lsb1 contains two disks on logical SCSI bus 1, c1t2d0 and c1t3d0.
5. Create logical volumes with strict physical volume group allocation, as in the sketch that follows this step.
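A minimal sketch of such a command follows. The size, mirror count, and volume name are hypothetical; -s g requests PVG-strict allocation and -m 1 requests one mirror copy, so the mirror is placed in a different physical volume group than the original:

lvcreate -L 500 -m 1 -s g -n datavol vgdata
(-L 500 creates a 500 MB logical volume; -m 1 maintains one mirror copy; -s g forces mirrors onto different physical volume groups; -n datavol names the volume)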
5 Administering Fault Tolerant Hardware

This chapter describes the duties related to fault-tolerant hardware administration. It provides information about physical and logical hardware configurations, how to determine component status, and how to manage hardware devices and MTBF statistics. In addition, it provides information about error notification and troubleshooting.

Fault Tolerant Hardware Administration

Continuum systems are designed for maximum serviceability.
Using Hardware Utilities ■ log changes in the device’s status ■ display the device’s state on demand During normal operation, the system periodically checks each hardware path. If a device is not operating, is missing, or is the wrong model number for that hardware path’s definition, the system logs messages in the system log file and, if configured, sends a message to the console.
Physical Hardware Configuration A hardware path specifies the addresses of the hardware devices leading to a device. It consists of a numerical string of hardware addresses, notated sequentially from the bus address to the device address. You can use the ftsmaint ls command to display the hardware paths of all hardware devices in your system. You can also use the standard ioscan command to display hardware paths.
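For example, both of the following commands list devices with their hardware paths. ftsmaint ls with no arguments is used elsewhere in this chapter; the -f and -n flags are standard ioscan options, shown here as a sketch (check the ioscan(1M) man page for your release):

ftsmaint ls
(lists all devices; paths appear in the H/W Path column)
ioscan -fn
(full ioscan listing, including the associated device file names)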
Figure 5-2 shows the hardware path for the console controller bus.

Figure 5-2. Console Controller Hardware Path (the console controller bus nexus, RECCBUS, sits at address 1 on the main system bus, with the two RECC adapters at 1/0 and 1/1)

The top-level address for a category of logical or physical devices is referred to as a nexus.
Table 5-1. Hardware Categories (Continued)
PCI Nexus – Refers to the K138 PCI bridge card and its associated resources. (LSM for SCSI ports or LNM for LAN ports is the corresponding logical nexus.)
Logical Device Addresses
LMERC Nexus – Refers to the CPU, memory, and console controller port resources. (PMERC for CPU/memory or RECCBUS for console ports is the corresponding physical nexus.)
(Figure: I/O subsystem hardware paths. The two PCI bridges (card-cages) sit at second-level addresses 2 and 3 on the main system bus; SLOT interfaces 0 through 7 in each card-cage hold SCSI, LAN, T1/E1, and flash/PCMCIA devices. For example, the flash card in card-cage 2 is at 0/2/0/0.0, and SCSI ports appear at paths such as 0/2/7/1 and 0/3/7/1.)
Physical Hardware Configuration CPU, Memory, and Console Controller Paths The CPU and memory constitute one physical nexus (PMERC) while the console controllers constitute a separate physical nexus (RECCBUS), but the resources for both (such as processors or tty devices) are treated as part of the same logical nexus (see “Logical CPU/Memory Configuration”). The CPU, memory, and console controllers are housed in a single suitcase.
Physical Hardware Configuration I/O Subsystem Paths The I/O subsystem addressing convention is as follows: ■ The first-level address, 0, identifies the main system bus nexus (GBUS). ■ The second-level address identifies the I/O subsystem nexus (PCI, HSC, or PKIO). Possible addresses are 2 and 3, which correspond to the two card-cages. ■ The third-level address identifies the SLOT interface, which corresponds to the PCI slot number (0–7).
Logical Hardware Configuration Logical Hardware Configuration The system maps many physical hardware addresses to logical hardware devices.
Logical Hardware Configuration Logical Cabinet Configuration Cabinet components—such as CDC or ACU units, fans, and power supplies—do not have true physical addresses. However, they are treated as pseudo devices and given logical addresses for reporting purposes. The logical cabinet addressing convention is as follows: ■ The first-level address, 12, is the logical cabinet nexus (CAB). ■ The second-level address identifies the specific cabinet number.
(Sample listing of logical cabinet devices 12/0/0 through 12/0/17: ACUs 0 and 1, two disk trays, three tray-0 fans, three tray-1 fans, two PCI power supplies, two tray-0 PSUs, two tray-1 PSUs, and two rectifiers, all in the CLAIMED state.)
Logical Hardware Configuration Logical LAN Manager Configuration The logical LAN manager subsystem addressing convention is as follows: ■ The first-level address, 13, is the logical LAN manager nexus (LNM). ■ The second-level address is a constant, 0. ■ The third-level address identifies a specific adapter (port). Figure 5-5 illustrates a sample configuration for a system with three logical Ethernet (LAN) ports.
Logical Hardware Configuration Logical SCSI Manager Configuration The logical SCSI manager has two primary purposes: to serve as a generalized host bus adapter driver front-end and to implement the concept of a logical SCSI bus. A logical SCSI bus is one that is mapped independently from the actual hardware addresses.
(Figure: logical SCSI manager configuration. The LSM nexus at address 14 on the main system bus holds lsm adapters that map transparently to devices such as disks at 14/0/1.0.0 and 14/0/2.0.0, a CD-ROM at 14/0/3.0.0, and SCSI IDs 0 through 15 on each logical bus.)
(Continuation of a device listing: a SEAGATE ST32550W disk, model d80200, at 14/0/1.3.0; an LSM adapter at 14/0/2; and a SONY CD-ROM CDU-7, model d85500, at 14/0/2.4.0, all CLAIMED and Online.)

Defining a Logical SCSI Bus

At boot, the logical SCSI manager creates the logical SCSI buses defined in the CONF file (in the LIF on the flash card or boot disk). The default CONF file provides definitions for the standard logical SCSI buses in a system.
lsm3=0/3/7/0:id0=7,tm0=1,tp0=1
lsm4=0/2/3/0,0/3/3/0:id0=15,id1=14,tm0=0,tp0=1,tm1=0,tp1=1
lsm5=0/2/3/1:id0=15,tm0=1,tp0=1
NOTE To maintain fault tolerance across both buses and cards, use one port from a SCSI controller (U501) in each card-cage.
Figure 5-7 describes each component of a logical SCSI bus definition; an annotated example follows.
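Because Figure 5-7 is not reproduced in this extract, the following annotated sketch breaks down one dual-initiated definition. The field roles are taken from the surrounding text (the paths before the colon name the adapters, with the second path being the standby); the exact meaning of the tm#/tp# port flags is not defined here and is left as an open parameter:

lsm4=0/2/3/0,0/3/3/0:id0=15,id1=14,tm0=0,tp0=1,tm1=0,tp1=1
(0/2/3/0 is the primary SCSI adapter hardware path and 0/3/3/0 the standby, one per card-cage; id0 and id1 set the SCSI initiator ID used on each path; tm# and tp# are per-port parameters whose settings differ between the single- and dual-initiated samples above)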
Logical Hardware Configuration The following guidelines apply to logical SCSI bus definitions: ■ Logical SCSI buses must be named lsm0 to lsm15. ■ Physical hardware paths must be occupied by a SCSI adapter card (for example, U501). The second physical hardware path is the standby device. ■ The adapter card that is used for standby in one logical SCSI bus cannot be used as the primary card in another logical SCSI bus.
Logical Hardware Configuration Figure 5-4 shows a system with a StorageWorks disk enclosure, dual-initiated SCSI buses (14/0/0 and 14/0/1), 16 disk drives on those buses (the disks are labeled 0/0 through 1/7; the first number specifies the SCSI bus [0 or 1] and the second number specifies the SCSI ID [0 through 7]), and the single-initiated SCSI buses (14/0/2 and 14/0/3).
Figure 5-5 shows a system with a Eurologic disk enclosure, dual-initiated SCSI buses (14/0/0 and 14/0/1), 14 disk drives and four PSUs on those buses, and the single-initiated SCSI buses (14/0/2 and 14/0/3).

Figure 5-5. (Eurologic disk enclosure configuration: U501 cards in card-cages 2 and 3 provide the SCSI ports for buses 14/0/0 through 14/0/3, with the disk slots and PSUs housed in two enclosure trays)
■ For flash cards, type is rflash, x is the instance number of the flash card (either 2 or 3), and y and z are always zero (0). Flash cards also use the form c#a#d# instead of c#t#d#. Note that flash cards are not SCSI devices and use physical, not logical, hardware paths.
Table 5-6 shows the device file names and corresponding hardware paths for sample disk, CD-ROM, tape, and flash card devices.

Table 5-6.
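Table 5-6 itself is not reproduced in this extract; the following sketch of the naming pattern uses hypothetical devices, with the flash-card path following the physical addressing shown earlier in this chapter:

/dev/dsk/c0t3d0      14/0/0.3.0   (disk on logical SCSI bus 0, SCSI ID 3, LUN 0)
/dev/rflash/c2a0d0   0/2/0/0.0    (flash card in card-cage 2; physical hardware path)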
Figure 5-8 illustrates the logical CPU/memory configuration.

Figure 5-8. Logical CPU/Memory Configuration (the LMERC nexus at address 15 maps transparently to the processors at 15/0/0 and 15/0/1, memory at 15/1/0, and the console and tty devices at 15/2/0 through 15/2/2)

Determining Component Status
Software State

The system creates a node for each hardware device that is either installed or listed in the /stand/ioconfig file. A device can be in one of the software states shown in Table 5-7.

Table 5-7. Software States
UNCLAIMED – Initialization state, or hardware exists and no software is associated with the node.
CLAIMED – The driver recognizes the device.
ERROR – The device is recognized, but it is in an error state.
Determining Component Status A device is initially created in the UNCLAIMED state when it is detected at boot time or when information about the device is found in the /stand/ioconfig file. The following state transitions can occur: ■ UNCLAIMED to CLAIMED – A driver recognizes the device and claims it. ■ CLAIMED to CLAIMED – A driver reports a soft error on the device and the soft error weight or threshold values are still acceptable.
Hardware Status

In addition to a software state, each hardware device has a particular hardware status. The status values are shown in Table 5-8.

Table 5-8. Hardware Status
Online – The device is actively working.
Online Standby – The device is not logically active, but it is operational. The ftsmaint switch or ftsmaint sync command can be used to change the device status to Online.
Managing Hardware Devices Fault Code MTBF MTBF Threshold Weight. Soft Errors Min. Number Samples : : : : : Infinity 1440 Seconds 1 6 Managing Hardware Devices The system adds CRUs and FRUs to the system at boot time by scanning the existing hardware devices and configuring the system accordingly. When the system is running, you can use ftsmaint commands to enable or disable hardware devices. When removing a CRU, you must replace it with another device of the same type.
Managing Hardware Devices Hardware paths are in the H/W Path column. 2. Set the component into blink mode. To do this, enter ftsmaint blinkstart hw_path hw_path is the hardware path determined in step 1. This causes the component’s status lights to begin blinking, which verifies that the status lights are operational. For example, the following commands blink the status lights in suitcase 0, slot 0 in card-cage 3, and all occupied slots in card-cage 3, respectively.
Managing Hardware Devices Disabling a Hardware Device The system administrator can manually take a device out of service and place it in the ERROR state. To do this, enter ftsmaint disable hw_path hw_path is the hardware path of the device you want to disable. CAUTION Disabling a device might cause unexpected problems. Contact the CAC before disabling a device.
Managing MTBF Statistics hw_path is the hardware path of the device. If the device does not change to CLAIMED, call the CAC for further assistance. For more information about contacting the CAC, see the Preface of this manual. Managing MTBF Statistics The system maintains statistics on the mean time between failures (MTBF) for each hardware device in the system.
Managing MTBF Statistics – If the MTBF is less than the threshold, the system takes the device out of service and places it in the ERROR state. – If the MTBF is greater than the threshold, the system takes no further action and continues to monitor the device for errors. Displaying MTBF Information You can use the ftsmaint ls hw_path command to display the current MTBF information for a device.
Managing MTBF Statistics NOTE Clearing the MTBF does not bring the device back into service automatically. If the device that you cleared is in the ERROR state, you must correct the state using the ftsmaint reset and enable commands. (See “Correcting the Error State” for more information.) Changing the MTBF Threshold The MTBF threshold is expressed in seconds. If a device’s MTBF falls beneath this threshold, the system takes the device out of service and changes the device state to ERROR.
Error Notification ■ If you set min_samples to 0, the system does not calculate MTBF, but considers the device to have exceeded the MTBF threshold at the first failure. ■ If you set min_samples to a value greater than 6, the system sets it to 6. To clear all the error information recorded for a device, enter ftsmaint clear hw_path hw_path is the hardware path of the device. NOTE The default numsamp value for suitcases is either 0 (for PA 7100-based suitcases) or 6 (for PA 8000-based suitcases).
Error Notification Remote Service Network The Remote Service Network (RSN) software running on your system collects hardware faults and significant events. The RSN allows trained Customer Assistance Center (CAC) personnel to analyze and correct problems remotely. For information about configuring the RSN, see Chapter 6, “Remote Service Network.” Status Lights Status lights are provided for almost all devices.
Monitoring and Troubleshooting Console and syslog Messages Each time a significant event occurs, the syslog message logging facility enters an error message into the system log, /var/adm/syslog/syslog.log. Depending upon the severity of the error and the phase of system operation, the same message might also be configured to display on the console. For more information, see the syslog(3C) and syslogd(1M) man pages.
– pwck and grpck for password and group file inconsistency information
– who and whodo for current user information
– netstat, uustat, lanscan, ping, and ifconfig for network services information
– ypcat, ypmatch, ypwhich, and yppoll for Network Information Service (NIS) information
– df and du for disk and volume information

Modifying System Resources

After you analyze the system status, you can use various tools to manipulate your system.
Fault Codes

The fault tolerant services return fault codes when certain events occur. The ftsmaint ls command displays fault codes in the FCode (short format) or Fault Code (long format) field. Table 5-9 lists and describes the fault codes.

Table 5-9. Fault Codes
2FLT (Both ACUs Faulted) – Both ACUs are faulted.
ADROK (Cabinet Address Frozen) – The cabinet address is frozen.
Table 5-9. Fault Codes (Continued)
CABCFG (Cabinet Configuration Incorrect) – The cabinet contains an illegal configuration.
CABDCD (Cabinet DC Distribution Unit Fault) – A DC distribution unit faulted.
CABFAN (Broken Cabinet Fan) – A cabinet fan failed.
CABFLT (Cabinet Fault Detected) – A component in the cabinet faulted.
CABFLT (Cabinet Fault Light On) – The cabinet fault light is on.
Table 5-9. Fault Codes (Continued)
CHARGE (Charging Battery) – A battery CRU/FRU is charging. To leave this state, the battery needs to be permanently bad or fully charged.
DSKFAN (Disk Fan Faulted/Missing) – The disk fan either faulted or is missing.
ENC OK (SCSI Peripheral Enclosure OK) – The SCSI peripheral enclosure is OK.
ENCFLT (SCSI Peripheral Enclosure Fault) – A device in the tape/disk enclosure faulted.
Table 5-9. Fault Codes (Continued)
IPSFlt (IOA Chassis Power Supply Fault) – An I/O Adapter power supply fault was detected.
IS (In Service) – The CRU/FRU is in service.
LITEOK (Cabinet Fault Light OK) – The cabinet fault light is OK.
MISSNG (Missing replaceable unit) – The ACU is missing, electrically undetectable, removed, or deleted.
MTBF (Below MTBF Threshold) – The CRU/FRU’s rate of transient and hard failures became too great.
Table 5-9. Fault Codes (Continued)
PWR (Breaker Tripped) – The circuit breaker for the PCIB power supply tripped.
REGDIF (ACU Registers Differ) – A comparison of the registers on both ACUs showed a difference.
SOFT (Soft Error) – The driver reported a transient error. A transient error occurs when a hardware fault is detected, but the problem is corrected by the system. Look at the syslog for related error messages.
Saving Memory Dumps Saving Memory Dumps The dump process provides a method of capturing a “snapshot” of what your system was doing at the time of a panic. When the system panics, it tries to save the image of physical memory, or certain portions of it. The system automatically dumps memory when a panic occurs. You can also save a dump manually in the event of a system hang. A system dump occurs when the kernel encounters a significant error that causes a system panic. If the kernel panics, a dump occurs.
Saving Memory Dumps thus, enhances system availability. (Selective save_mcore also supports a 64 bit kernel and dumps on systems with a greater than 4 GB memory size.) By default, save_mcore will attempt to save a dump to the file system you have specified in the file /etc/rc.config.d/savecrash, except in the following instances: ■ – You changed the save_mcore_dumps_only=1 (the default) parameter in the conf file to save_mcore_dumps_only=0.
Saving Memory Dumps Table 5-10. Dump Configuration Decisions Consideration Dump Level: Full Dump, Selective Dump, or No Dump compressed save vs.
Saving Memory Dumps Dump Space Needed for Full System Dumps The amount of dump space you need to define is based on the size of the system’s physical memory. NOTE During the startup sequence, save_mcore is invoked automatically. If sufficient space is not available in /var/adm/crash to hold a file equal to the size of physical memory, dumping will fail, leaving the system simplexed. At this time, you can run save_mcore manually and then use the ftsmaint sync command to duplex the system.
Saving Memory Dumps Multiply the number of pages listed in Total pages included in dump by the page size (4 KB), and add 25% for a margin of safety to give you an estimate of how much dump space to provide. For example, (6208 x 4KB) x 1.25 = approximately 30MB of space needed. Configuring save_mcore You can configure save_mcore through the /etc/rc.config.d/savecrash file. Both dump utilities, save_mcore and savecrash, share the configuration file /etc/rc.config.d/savecrash.
Table 5-11. save_mcore Options and Parameter
-v – Enables additional progress messages and diagnostics.
-n – Skip saving kernel modules.
-z – Compress all physical memory image files and kernel module files in the dump directory.
-Z – Do not compress any files in the dump directory.
-f – Generate a byte-for-byte full dump. All of memory is written to one output file. In this mode, dirname/crash.n is the actual output file instead of a directory.
Saving Memory Dumps Configuring a Dump Device for savecrash You can configure a dump device into the kernel through the SAM interface or through HP-UX commands. You can also modify run-time dump device definitions though the fstab file and the crashconf utility. For more information, refer to Managing Systems and Workgroups (B2355-90157).
Saving Memory Dumps NOTE The order of the devices in the list is important. Directories are used in reverse order from the way they appear in the list. The last device in the list is used as the first dump device. 4. Follow the SAM procedure for building a new kernel. 5. When the time is appropriate, boot your system from the new kernel file to activate your new dump device definitions.
– If you want to configure the kernel without any dump devices, use the following dump statement in the system file:
dump none
NOTE If you omit any dump statements from the system file, the kernel will use the primary paging device (swap device) as the dump device.
2. After editing the system file, build a new kernel file using the config command.
3.
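Complementing the dump none statement shown above, the conventional system-file statement for dumping to logical volumes is sketched below (verify against the system file description for your release before relying on it):

dump lvol
(directs the kernel to dump to the logical volume or volumes reserved for dumping, such as those defined with lvlnboot -d)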
Using crashconf to Specify a Dump Device

You can use crashconf to directly specify the devices to be configured. Table 5-12 describes how to use the crashconf command to add to, remove, or redefine dump devices.

Table 5-12.
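Table 5-12 is not reproduced in this extract; the following sketch shows typical invocations, with the option behavior stated as an assumption (verify with the crashconf(1M) man page):

crashconf /dev/dsk/c0t1d0
(adds a device to the current run-time dump configuration)
crashconf -r /dev/dsk/c0t1d0
(redefines the configuration so that only the listed device is used; assumed -r semantics)
crashconf -v
(displays the resulting dump configuration)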
Saving Memory Dumps Saving a Dump After a System Hang Using save_mcore from an offline CPU, you can create a core dump of the operating system after a system hang. The Continuum system can be configured to reboot in simplexed state after a system crash or hang (that is, one CPU/memory module is kept offline, with its memory contents intact). You can then obtain the dump from the offline module. If the dump is successfully retrieved, the system will be reduplexed.
Saving Memory Dumps Preventing the Loss of a Dump To prevent losing a dump after system interruption, if you configured the system to use savecrash as the default dump utility, or if a crash occurs when the system is in simplex mode, you need to do the following: ■ configure the primary and secondary swap partitions with the Mirror Write Cache option disabled and Mirror Consistency Recovery option disabled.
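For logical volumes, both options can be turned off with lvchange. The following is a sketch using the customary vg00 primary swap volume; substitute your own swap volumes:

lvchange -M n -c n /dev/vg00/lvol2
(-M n disables the Mirror Write Cache; -c n disables Mirror Consistency Recovery)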
6 Remote Service Network

The Remote Service Network (RSN) is a highly secure worldwide network that Stratus uses to monitor its customers’ fault tolerant systems. Your system contains RSN software that regularly polls your system for the status of the hardware. If the RSN software detects a fault or system event, it automatically sends a message to a Stratus HUB system. The HUB system is usually located at the Customer Assistance Center (CAC) nearest to your site.
System (R1021H) for information on the Site Call System, the recommended RSN interface.
■ remote access to your system by CAC personnel (dial-in)—A Continuum system provides two special logins that the CAC can use to dial in to your system to diagnose problems and perform data transfer functions. The logins, sracs and sracsx, are subject to validation by the system administrator at your site. You use the validate_hub command to validate an incoming call.
Figure 6-1. RSN Software Components (the figure shows the flow among the RSN components on your system — rsnadmin, mntreq, the RSN queue of call and mail files, rsndb, rsntrans, rsnd, and rsngetty — and the async modem link to the Stratus HUB, including received files and the CAC login path)
Using the RSN Software Using the RSN Software This section describes various tasks that you can perform using the RSN software. NOTE RSN commands are located in /usr/stratus/rsn/bin. Configuring the RSN You must install and initialize the RSN modem and configure the RSN software before you can perform the tasks described in this section.
Using the RSN Software Starting the RSN Software You can activate RSN communications using the rsnon command. The rsnon command interactively prompts you to set rsndbs, rsngetty, and rsn_monitor to respawn in /etc/inittab and uncomments the rsntrans line in the/var/spool/cron/crontabs/sracs file. The following is a sample rsnon session: # rsnon ****************************************************************** ****************************************************************** 1.
Checking Your RSN Setup

You can use the rsncheck command to display the configuration of your RSN software and flag any errors.
Using the RSN Software Stopping the RSN Software When you are building a new system or making significant changes to an existing system, you might want to “turn off” the RSN software. To stop the RSN communication daemons rsngetty and rsndbs, use the rsnoff command. The rsnoff command sets rsngetty and rsndbs to off in /etc/inittab and disables rsntrans in /var/spool/cron/crontabs/sracs. The following is a sample rsnoff session. The -a option stops the rsn_monitor and rsnd daemons. # rsnoff -a 1.
Using the RSN Software Sending Mail to the HUB The mntreq command is an interactive utility that lets you communicate with the supporting Stratus HUB. mntreq provides three subcommands, addcall, updatecall, and mail. For information about using the addcall and updatecall subcommands, see the mntreq(1M) man page. NOTE To use the mntreq command, the directory /var/stratus/rsn/queues/mntreq.d must exist. If it does not, an error message will appear when you try to use mntreq.
Using the RSN Software Validating Incoming Calls To verify that an incoming telephone call to your site originates from the HUB, you can request that the caller supply the code for your site. You use the validate_hub command to determine the unique three-digit code for your site on a particular date. The following shows sample output of the validate_hub command: # validate_hub Site_id is smith_co Validation code on 97-11-19 is 642 For more information, see the validate_hub(1M) man page.
Using the RSN Software Cancelling an RSN Request To cancel a queued RSN request, use the cancel_rsn_req command. You can cancel a specific job or all pending jobs. Non-super-users can cancel their own jobs; the super-user can cancel other user’s jobs as well. The following example cancels a specific job. You can get the job number using list_rsn_req, as shown in the previous section.
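The example itself is not reproduced in this extract; the invocation would look something like the following sketch (the job number is hypothetical, and the exact argument syntax should be checked against the cancel_rsn_req(1M) man page):

cancel_rsn_req 42
(cancels queued RSN job number 42, obtained from list_rsn_req)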
RSN Command Summary RSN Command Summary Table 6-1 lists all the commands you can use to manage RSN. All of these commands are in the /usr/stratus/rsn/bin directory. See the corresponding man pages for additional information. Table 6-1. RSN Commands Command Function cancel_rsn_req Cancels an RSN request. list_rsn_cfg Lists RSN configuration information. list_rsn_req Selectively lists all RSN jobs queued to be sent to the HUB.
RSN Files and Directories RSN Files and Directories The following sections provide information on files and directories necessary to configure the RSN software. Output and Status Files The /etc/stratus/rsn directory contains various output and status files. Table 6-2 describes the files located in the /etc/stratus/rsn directory. Table 6-2.
Communication Queues

The /var/stratus/rsn/queues directory contains files and subdirectories used by RSNCP when it communicates with the HUB. These files include TM files, LCK files, C. files, D. files, and Z. files. Table 6-3 describes the files and subdirectories located in the /var/stratus/rsn/queues directory.

Table 6-3. Contents of /var/stratus/rsn/queues
core* – Core files (if any) from the rsnd daemon.
Table 6-3. Contents of /var/stratus/rsn/queues (Continued)
logs/rsnlog.date – Contains a log of all file transfer activity between the HUB and the site.
logs/comm.date – Logs all low-level RSN modem activity.
logs/rsngetty.out – Contains a log of all rsngetty activity. rsngetty monitors the /dev/ttyd2p0 port.
Other RSN-Related Files

In addition to the files described earlier, the RSN software also uses certain RSN-related files in other locations. Table 6-4 lists the path names and RSN-related functions of those files.

Table 6-4. RSN-Related Files in Other Locations
/var/spool/cron/crontabs/sracs – Contains entries for rsntrans and rsncleanup to service any pending RSN work periodically and to clean up any log files, respectively.
7 Remote STREAMS Environment

In HP-UX version 11.00.03, the Remote STREAMS Environment (RSE) is provided as part of the kernel, and the software package is named ORSE. The following sections describe RSE.
Remote STREAMS Environment STREAMS driver instance and a remote communications adapter STREAMS instance from the file /etc/orse/orsdinfo. NOTE Prior to running an RSE application, first ensure that information in the orsdinfo file is current, then run the otelrsd utility. Figure 7-1 illustrates a configuration with four remote Streams.
Configuring the Host

Configuring the host for HP-UX version 11.00.03 includes the following tasks:

■ Creating or customizing the /etc/orse/orsdinfo file to reflect your system configuration
■ Updating the ORSD configuration
■ Defining the HP-UX version 11.00.03 firmware and the physical hardware path to the adapter cards in the /etc/lucent/opersonality.conf file
■ Killing and restarting the daemons

NOTE
The opersonality.conf file works together with the odownload.conf file.
The following is the template of the orsdinfo file that is installed with the operating system:

# The file format is :
#
# Flag    - Currently 0 or 1. 1 => CLONEOPEN.
# DrvName - is the name of the Driver in the firmware we want to open.
# PCIMin  - The firmware driver minor number
#
# Optional Data
#
# SM_ERR  - If a 1 is entered here, a M_ERROR will
#           be sent upstream when ERRORS occur.
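As an illustration only, a hypothetical entry might look like the following. The field order (Flag, DrvName, PCIMin, then optional data) is inferred from the template comments above, and the driver name mydrv is invented; consult the template on your system for the authoritative layout:

# Clone open (Flag=1) of firmware driver "mydrv" at firmware minor number 0
1 mydrv 0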
Updating the RSD Configuration

The otelrsd utility reads remote STREAMS driver (RSD) information from the orsdinfo file, creates any needed device nodes, and updates the RSD configuration. It makes two passes when reading the orsdinfo file:

■ The first pass checks both the format of the orsdinfo input file and the value of each field. otelrsd prints an error message and exits immediately if an error is found during the first pass.
Customizing the orsdinfo File

RSE passes data from the kernel to the communications adapters. The /etc/orse/orsdinfo file defines the mapping between instances of the HP-UX operating system device and instances of a remote communications adapter STREAMS device. To configure RSE for your system, customize the orsdinfo file to reflect your system configuration. After editing orsdinfo, run the otelrsd command to activate the changes.
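A minimal sketch of the edit-and-activate sequence follows; it assumes otelrsd is in root's command path (substitute the full path from your installation if it is not):

# vi /etc/orse/orsdinfo     (edit the mapping entries for your adapters)
# otelrsd                   (validate the file and update the RSD configuration)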
NOTE
You cannot change a personality; however, new entries can be added. After adding an entry, run the ftsftnprop command and then the odownloadd -rescan command to set the new parameters.

Downloading Firmware

This section describes the following procedures:

■ Downloading New Firmware
■ Adding or Moving Cards

Downloading New Firmware

To download new firmware to the communications adapter, either reboot your system or issue the following commands:
1.
Downloading Firmware to a Card

The /sbin/orsericload utility is a top-level wrapper script for downloading configuration files. It is called by odownloadd with all the arguments taken from the opersonality.conf and odownload.conf files. After the orsericload utility has finished downloading the files, it calls tfinal_init. The syntax for the orsericload command is as follows:

orsericload [-r] [-p card_#] [-c config] [-x tcxbinfo]

Table 7-2.
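As an illustration of the syntax above, a hedged invocation follows; the card number and configuration file are borrowed from the sample output later in this section and may not match your hardware (see Table 7-2 for the option descriptions):

# /sbin/orsericload -p 14 -c /etc/lucent/orseconfg1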
Adding or Moving a Card

When a new card is added or moved on the system, install the HP-UX version 11.00.03 driver on the card using the following procedure:
1.
Using /etc/lucent/ocardinfo.template1 params file
+ [ 14 -ne 0 -a /etc/lucent/orseconfg1 != 0 ]
+ /sbin/tomcat/cxbparams -v -f /etc/lucent/ocardinfo.template1 -s 14
Begin processing cxbinfo file....
End of processing cxbinfo file....
+ [ 0 -ne 0 ]
+ grep -v ^# /etc/lucent/orseconfg1
Begin processing cxbinfo file....
End of processing cxbinfo file...
Process ID = 0x05010003
[Info] Loading card 12 /sbin/tomcat/rpq_gdb.rel -O -D3 ...
/etc/lucent/rpq_ll.card14.out created successfully
/etc/lucent/rpq_ll.rel successfully loaded on card 14
Process ID = 0x05010003
[Info] Loading card 14 /sbin/tomcat/rpq_gdb.rel -O -D3 ...
/sbin/tomcat/rpq_gdb.card14.out created successfully
/sbin/tomcat/rpq_gdb.rel successfully loaded on card 14
Process Name = rpq_gdb.rel
Process ID = 0x05010004
[Info] Loading card 14 /sbin/tomcat/rpq_wdog.
Read Krib, PMCB = 0x287b0
Proc Table Starts at 0x2d000
Proc table for rpq_wdog.rel starts at 0x2d334
Code Base for rpq_wdog.
A Stratus Value-Added Features

This appendix discusses the following Stratus value-added features:

■ new and customized software
■ new and customized commands

New and Customized Software

This appendix describes the commands and features of the HP-UX operating system that are either unique to Stratus or modified from the base release to support Continuum systems.

NOTE
The HP-UX version 11.00.03 operating system runs as a 64-bit operating system. In general, the HP-UX version 11.00.
Console Interface

Continuum systems provide a system console interface through which you can execute machine management commands. A set of console commands allows you to quickly control important machine actions. To access the console command interface, you must connect a terminal to the console controller.
Mean-Time-Between-Failures Administration

Continuum systems automatically maintain MTBF statistics for many system components. You can access this information at any time and can reconfigure MTBF parameters, which affects how the fault tolerant services (FTS) software subsystem responds to component problems. For information about configuring MTBF thresholds and managing fault tolerance, see "Managing MTBF Statistics" in Chapter 5, "Administering Fault Tolerant Hardware."
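As a quick check, the ftsmaint ls command shown elsewhere in this guide reports per-component status, including MTBF statistics. A hedged sketch follows; the hardware path 0/3/7/1 is borrowed from the Appendix B examples and may not exist on your system:

ftsmaint ls 0/3/7/1

Chapter 5 contains the authoritative procedures for displaying and changing MTBF parameters.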
New and Customized Commands

after installation, as well as Stratus's recommendations for disk mirroring, see Chapter 4, "Mirroring Data." For information about mirroring the root disk during installation, see the HP-UX Operating System: Installation and Update (R1002H). For general information about disk mirroring on an HP-UX operating system, see the Managing Systems and Workgroups (B2355-90157).
B Updating PROM Code

This appendix describes how to update the different PROM codes and download I/O firmware.

Updating PROM Code

All new or replacement boards come with the latest PROM code already installed. However, occasionally circumstances might require that you update the PROM on new hardware. In addition, Stratus releases revisions to PROM code periodically that must be copied to (or burned on) your existing boards.
Table B-1. PROM Code File Naming Conventions

PROM Code File Type   Naming Convention
CPU/memory            GNMNSccVV.V.xxx
                      GNMM or GNMN is the model number, G2X2 for PA-8500 and PA-8600.
                      S is the submodel compatibility number (0–9).
                      cc is the source code identifier: fw is firmware.
                      VV is the major revision number (0–99).
                      V is the minor revision number (0–9).
                      xxx is the file type (raw or bin).
                      For example: G2X20fw7.0.bin
console controller    EMMMMSccVV.Vrom.
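Reading the example file name against this convention: in G2X20fw7.0.bin, G2X2 is the model number, 0 is the submodel compatibility number, fw identifies the firmware source code, 7 is the major revision number, 0 is the minor revision number, and bin is the file type.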
Updating CPU/Memory PROM Code

If a Stratus representative instructs you to update PROM code on duplexed CPU/memory boards inside a CPU board, use the following procedure to do so. Verify with the representative that you have selected the correct PROM code file to burn before starting this procedure.

CAUTION
If your boards are not duplexed, you will disrupt access to the system. Contact the CAC for assistance.

1.
For more information about PROM-code file naming conventions, see Table B-1.

4. When the prompt returns, switch the status of both CPU boards (that is, activate the standby CPU board and put the active CPU board on standby) by entering

ftsmaint switch hw_path

hw_path is the path of the CPU board to be brought online.
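For example, if the standby CPU board were at hardware path 0/1 (a hypothetical path; use the path reported for your system by ftsmaint ls), you would enter

ftsmaint switch 0/1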
8. Periodically check the status of the CPU board being duplexed (see step 5). The update is complete when both CPU boards have a status of Online Duplexed and both show a single green light.

9.

Updating Console Controller PROM Code
2. Update the PROM code on the standby console controller for the desired partition by entering

ftsmaint burnprom -F partition -f prom_code hw_path

partition is the partition to be burned, prom_code is the path name of the PROM code file, and hw_path is the path name of the standby console controller. For example, to burn the online partition, you would enter the command

ftsmaint burnprom -F online -f E5940on21.
Do not proceed until the status of both console controllers is correct. (During the transition, a console controller is listed as offline; do not proceed until it is listed as Online Standby.)

6. Update the PROM code on the console controller that is now on standby (that is, repeat step 2 and step 3 for the other console controller). Once these commands are complete, both console controllers will be updated with the same PROM code.

7.

Updating U501 SCSI Adapter Card PROM Code
2. Notify users of any external devices or single-initiated logical SCSI buses attached to both SCSI adapter cards that service will be disrupted. Disconnect the cables from both ports.

3. Determine which (if any) of the cards you plan to update contain resources (ports) on standby duplexed status by entering

ftsmaint ls hw_path | grep -e Status -e Partner

hw_path is the hardware path determined in step 1.
7. Restart duplexing between the standby resource and its partner by entering

ftsmaint sync hw_path

hw_path is the hardware path of the standby resource. For example, to restart duplexing for 0/3/7/1, you would enter the command

ftsmaint sync 0/3/7/1

NOTE
Invoking ftsmaint sync on a single resource also restarts (as appropriate) duplexing for other resources (ports) on that card. Therefore, it is not necessary to repeat this command for the other resources.

8.
11. Check the status of the newly updated card and verify the current (updated) PROM code version by entering the following command for both the resource and its partner

ftsmaint ls hw_path

When the status becomes Online Standby Duplexed, the card has resumed duplex mode.

Downloading I/O Card Firmware

When the operating system boots or an I/O card is added, Continuum systems can automatically download firmware into the card(s) as necessary.
Index A activating a new kernel, 3-27 addhardware command, 5-2 adding dump devices, 5-50 addressing logical hardware paths, 5-9 physical hardware paths, 5-3 administrative tasks finding information about, 1-2 standard command paths, 1-1 alternate kernel, booting, 3-13 architecture fault tolerant hardware, 1-6 fault tolerant software, 1-7 autoboot, 3-4 autoboot, enabling and disabling, 3-4 B backups cross-reference, 1-3 bad block relocation, 4-4 bay see card-cage boot methods, 3-16 boot parameters specifyi
Index console controller, 1-5 burning PROM code on, B-5 features of, 1-5 offline partition, B-6 online partition, B-6 path partition, 3-5 console messages, 5-34 contiguous allocation, 4-4 contiguous extents, 4-2 continuous availability architecture, 1-4 software, 1-7 Continuum Series 400 physical components, 1-4 control panel, 1-5 conventions, notation, xiii core dump see dump CRU, 1-6, 5-1 Customer Assistance Center see CAC Customer Service login, 6-2 customer-replaceable unit (CRU), 1-4 customer-replaceab
Index FRU, 1-6, 5-1 ftsmaint command burning PROM code console controller, B-5 CPU/memory board, B-3 online, offline, diag partitions, B-5 path partition, 3-5 SCSI adapter card, B-7 changing MTBF fault limit, 5-31 determining hardware paths with, 5-3 displaying MTBF statistics, 5-30 enabling hardware, 5-28 grace period, power failure, 3-30 guidelines for maintaining system, 2-5 I/O subsystem addresses adapter or bridge, 5-8 device-specific service, 5-8 I/O subsystem nexus (PCI, HSC, or PKIO), 5-8 main sys
Index logical CPU/memory addresses individual resources, 5-21 logical CPU/memory nexus (LMERC), 5-21 resource type, 5-21 logical devices, 5-4 logical extent, 4-2 logical hardware addressing, 5-9 logical hardware categories logical cabinet, 5-9 logical communications I/O, 5-9 logical CPU/memory, 5-9 logical LAN manager (LNM), 5-9 logical SCSI manager (LSM), 5-9 Logical Interchange Format (LIF) volume, 3-6 logical LAN manager addresses, 5-12 logical LAN manager nexus (LNM), 5-12 specific adapter (port), 5-12
Index online partition, B-6 outgoing RSN files hub_pickup directory, 6-13 mail, 6-14 P pair and spare architecture, 1-6 parallel scheduling, disk mirroring, 4-4 path names, administrative commands, 1-1 path partition, 3-5 PCI bay see card-cage PCI bridge card (K138), 5-5 PCMCIA, 3-31 peripheral component interconnect (PCI), 1-4 permissions shutdown, 3-28 physical addresses console controller (RECC), 5-7 console controller bus nexus (RECCBUS), 5-7 CPU/memory nexus (PMERC), 5-7 main system bus nexus (GBUS),
Index rsndbs command, 6-4 rsngetty command, 6-2, 6-4 rsnoff command, 6-7, 6-11 rsnon command, 6-4, 6-5, 6-11 rsnport, 6-11 rsntrans command, 6-2, 6-4 rsntry command, 6-9, 6-11 run-level single-user mode, 3-9 S SAM disk mirroring options, 4-4 /sbin/ftsftnprop, 7-8 scheduling, disk mirroring, 4-4 SCSI adapter card, updating PROM, B-7 SCSI devices, 5-18 SCSI I/O controller (U501), 5-5 secondary bootloader, 3-4 self-checking diagnostics, 1-6 separation, I/O channel, 4-8 sequential scheduling, disk mirroring, 4