SPARC Enterprise™ T5140 and T5240 Servers Administration Guide
Manual Code C120-E498-03EN Part No.
Copyright © 2009 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. FUJITSU LIMITED provided technical input and review on portions of this material. Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws and international treaties.
Contents
Preface
Communicating With the System
  ILOM Overview
  ▼ Log In to ILOM
  ▼ Log In to the System Console
  ▼ Display the ok Prompt
  ▼ Display the ILOM -> Prompt
  ▼ Use a Local Graphics Monitor
Performing Common Tasks
  ▼ Power On the System
  ▼ Power Off the System
  ▼ Reset the System
  ▼ Update the Firmware
Managing Disks
  Hardware RAID Support
  Creating Hardware RAID Volumes
  ▼ Create a Hardware Mirrored Volume
  ▼ Create a Hardware Mirrored Volume of the Default Boot Device
  ▼ Delete a Hardware RAID Volume
  ▼ Hot-Plug a Mirrored Disk
  ▼ Hot-Plug a Nonmirrored Disk
  Disk Slot Numbers
Managing Devices
  ▼ Unconfigure a Device Manually
  ▼ Reconfigure a Device Manually
  Devices and Device Identifiers
  SPARC Enterprise T5x40 Device Tree
  Multipathing Software
Handling Faults
  Discovering Faults
  ▼ Discover Faults Using ILOM
  ▼ Discover Faults Using POST
  ▼ Locate the System
  Bypassing Minor Faults
  Automatic System Recovery
Index
viii SPARC Enterprise T5140 and T5240 Servers Administration Guide • July 2009
Preface
This manual is for experienced system administrators. It includes general descriptive information about the SPARC Enterprise™ T5140 and T5240 servers, and detailed instructions for configuring and administering the servers. To use the information in this document, you must have a working knowledge of computer network concepts and terms, and advanced familiarity with the Solaris™ Operating System (Solaris OS).
Structure and Contents of This Manual
This manual is organized as described below:
■ Communicating With the System – Describes basic procedures to communicate with the system.
■ Performing Common Tasks – Describes basic procedures for common tasks such as powering on and off the system.
■ Managing Disks – Describes how to configure and manage RAID disk volumes using the SPARC Enterprise T5140 and T5240 servers’ on-board serial attached SCSI (SAS) disk controller, and how to hot-plug a disk.
Related Documentation The latest versions of all the SPARC Enterprise Series manuals are available at the following Web sites: Global Site (http://www.fujitsu.com/sparcenterprise/manual/) Japanese Site (http://primeserver.fujitsu.
Title – Description – Manual Code
SPARC Enterprise T5140 and T5240 Servers Service Manual – How to run diagnostics to troubleshoot the server, and how to remove and replace parts in the server – C120-E497
SPARC Enterprise T5140 and T5240 Servers Administration Guide – How to perform administrative tasks that are specific to the servers – C120-E498
Integrated Lights Out Manager 2.0 User’s Guide – Information that is common to all platforms managed by Integrated Lights Out Manager (ILOM) 2.
Note – Product Notes are available on the website only. Please check for the most recent updates for your product.
UNIX Commands
This document might not contain information on basic UNIX® commands and procedures such as shutting down the system, booting the system, and configuring devices. Refer to the following for this information:
■ Software documentation that you received with your system
■ Solaris™ Operating System documentation, which is at (http://docs.sun.
Prompt Notations The following prompt notations are used in this manual.
Communicating With the System This section includes information on low-level communication with the system, using the Integrated Lights Out Manager (ILOM) tool and the system console.
Related Information ■ “Log In to ILOM” on page 2 ■ Integrated Lights Out Manager (ILOM) 2.0 Documentation ■ Integrated Lights Out Manager (ILOM) 2.0 Supplement for SPARC Enterprise T5140 and T5240 Servers ■ Integrated Lights Out Manager (ILOM) 3.0 Documentation ■ Integrated Lights Out Manager (ILOM) 3.0 Supplement for SPARC Enterprise T5140 and T5240 Servers ▼ Log In to ILOM This procedure assumes the default configuration of the service processor as described in your server’s installation guide.
Related Information
■ “ILOM Overview” on page 1
■ “Log In to the System Console” on page 3
▼ Log In to the System Console
1. Log in to ILOM. See “Log In to ILOM” on page 2.
2. To access the system console from ILOM, type:
-> start /SP/console
Are you sure you want to start /SP/console (y/n) ? y
Serial console started. To stop, type #.
. . .
You are logged in to the system console. Perform tasks as needed.
Note – If the Solaris OS is not running, the system displays the ok prompt.
Caution – When possible, reach the ok prompt by performing a graceful shutdown of the OS. Any other method used might result in the loss of system state data.
System State – What To Do
OS Running and Responsive – Shut down the system using one of these methods:
• From a shell or command tool window, issue an appropriate command (for example, the shutdown or init 0 command) as described in Solaris system administration documentation.
■ Log in to ILOM through an SSH connection. See “Log In to ILOM” on page 2. Related Information ■ “ILOM Overview” on page 1 ▼ Use a Local Graphics Monitor Though it is not recommended, the system console can be redirected to the graphics frame buffer. You cannot use a local graphics monitor to perform initial system installation, nor can you use a local graphics monitor to view power-on self-test (POST) messages.
Note – There are many other system configuration variables. Although these variables do not affect which hardware device is used to access the system console, some of the variables affect which diagnostic tests the system runs and which messages the system displays at its console. For details, refer to the service manual for your server. 8.
Performing Common Tasks
This section includes procedures for some common tasks performed on the servers.
■ “Power On the System” on page 7
■ “Power Off the System” on page 8
■ “Reset the System” on page 8
■ “Update the Firmware” on page 9
▼ Power On the System
1. Log in to ILOM. See “Log In to ILOM” on page 2.
2. At the ILOM -> prompt, type:
-> start /SYS
Are you sure you want to start /SYS (y/n) ? y
Starting /SYS
->
Note – To force a power-on sequence, use the start -script /SYS command.
▼ Power Off the System
1. Shut down the Solaris OS. At the Solaris prompt, type:
# shutdown -g0 -i0 -y
# svc.startd: The system is coming down. Please wait.
svc.startd: 91 system services are now being stopped.
Jun 12 19:46:57 wgs41-58 syslogd: going down on signal 15
svc.startd: The system is down.
syncing file systems...done
Program terminated
r)eboot o)k prompt, h)alt?
2. Switch from the system console prompt to the service processor console prompt. Type:
ok #.
->
3.
● To reset the system, from the Solaris prompt, type: # shutdown -g0 -i6 -y Related Information ■ “Power Off the System” on page 8 ■ “Power On the System” on page 7 ▼ Update the Firmware 1. Ensure that the ILOM service processor network management port is configured. See the server’s installation guide for instructions. 2. Open an SSH session to connect to the service processor. % ssh root@xxx.xxx.xxx.xxx ... Are you sure you want to continue connecting (yes/no) ? yes ...
5. Type the load command with the path to the new flash image. The load command updates the service processor flash image and the host firmware. The load command requires the following information: ■ IP address of a TFTP server on the network that can access the flash image ■ Full path name to the flash image that the IP address can access The command usage is as follows: load [-script] -source tftp://xxx.xxx.xx.
Cleaning /tmp /var/run /var/lock. Identifying DOC Device Type(G3/G4/H3) ... OK Configuring network interfaces....Internet Systems Consortium DHCP Client V3.0.1 Copyright 2007 Internet Systems Consortium All rights reserved. For info, please visit http://www.isc.org/products/DHCP eth0: config: auto-negotiation on, 100FDX, 100HDX, 10FDX, 10HDX. Listening on LPF/eth0/00:14:4f:3f:8c:af Sending on LPF/eth0/00:14:4f:3f:8c:af Sending on Socket/fallback DHCPDISCOVER on eth0 to 255.255.255.
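The load usage line in step 5 is truncated in this copy. The following is a minimal sketch that assumes the source URL takes the form tftp://<server IPv4 address><full path name>, with the path appended directly after the address. The helper assembles the command string and rejects a malformed address before you type the command at the -> prompt. The function name `build_load_command` and the image path are hypothetical. 192.0.2.10 is a documentation-only address.

```python
import ipaddress


def build_load_command(tftp_server: str, image_path: str, script: bool = False) -> str:
    """Assemble an ILOM 'load' command line (hypothetical helper).

    Assumes the source URL is tftp://<IPv4 address><full path name>.
    """
    # Raises ValueError (AddressValueError) if this is not a valid IPv4 address.
    address = ipaddress.IPv4Address(tftp_server)
    # The load command needs the full path name of the flash image.
    if not image_path.startswith("/"):
        raise ValueError("image_path must be a full path name starting with '/'")
    parts = ["load"]
    if script:
        # Optional flag from the usage line: load [-script] -source ...
        parts.append("-script")
    parts.append(f"-source tftp://{address}{image_path}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical server address and image path, for illustration only.
    print(build_load_command("192.0.2.10", "/firmware/image.pkg"))
```

The sketch only builds text. It does not contact the service processor, so compare the result against the complete usage line in your ILOM documentation before you rely on it.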
Managing Disks
This section describes how to configure and manage RAID disk volumes using the SPARC Enterprise T5140 and T5240 servers’ on-board serial attached SCSI (SAS) disk controller, and how to hot-plug a disk.
Related Information ■ “Creating Hardware RAID Volumes” on page 14 ■ “Delete a Hardware RAID Volume” on page 25 Creating Hardware RAID Volumes Caution – Creating RAID volumes using the on-board disk controller destroys all data on member disks.
▼ Create a Hardware Mirrored Volume
1. Verify which hard drive corresponds with which logical device name and physical device name, using the raidctl command:
# raidctl
Controller: 1
        Disk: 0.0.0
        Disk: 0.1.0
        Disk: 0.2.0
        Disk: 0.3.0
        Disk: 0.4.0
        Disk: 0.5.0
        Disk: 0.6.0
        Disk: 0.7.0
See “Disk Slot Numbers” on page 35.
The preceding example indicates that no RAID volume exists. In the following example, a RAID volume exists:
# raidctl
Controller: 1
        Volume:c1t0d0
        Disk: 0.0.0
        Disk: 0.1.0
        Disk: 0.2.0
        Disk: 0.3.0
        Disk: 0.4.0
        Disk: 0.5.0
        Disk: 0.
■ FAILED – Indicates that the volume should be deleted and reinitialized. This failure can occur when any member disk in an IS volume is lost, or when both disks are lost in an IM volume.
The Disk Status column displays the status of each physical disk. Each member disk might be GOOD, indicating that it is online and functioning properly, or it might be FAILED, indicating that the disk has hardware or configuration issues that need to be addressed.
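The two raidctl listings in step 1 differ only in whether a Volume: entry appears. This minimal sketch pulls the Controller:, Volume: and Disk: entries out of that listing so a script can tell whether a RAID volume already exists. You can use the same check to confirm that a volume has been deleted. The helper names are hypothetical. The sketch assumes only the entry formats shown in the examples, and it works whether the entries are on separate lines or run together.

```python
import re


def parse_raidctl_summary(text: str) -> dict:
    """Collect controllers, volumes, and disks from plain `raidctl` output.

    Matches each 'Controller:', 'Volume:' and 'Disk:' entry, so it does not
    depend on line breaks or indentation.
    """
    return {
        "controllers": [int(c) for c in re.findall(r"Controller:\s*(\d+)", text)],
        "volumes": re.findall(r"Volume:\s*(\S+)", text),
        "disks": re.findall(r"Disk:\s*(\d+\.\d+\.\d+)", text),
    }


def raid_volume_exists(text: str) -> bool:
    """True if the listing contains at least one 'Volume:' entry."""
    return bool(parse_raidctl_summary(text)["volumes"])


# Shortened versions of the two listings shown in step 1.
no_volume = "Controller: 1\n        Disk: 0.0.0\n        Disk: 0.1.0\n"
with_volume = "Controller: 1\n        Volume:c1t0d0\n        Disk: 0.0.0\n        Disk: 0.1.0\n"
print(raid_volume_exists(no_volume), raid_volume_exists(with_volume))  # False True
```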
2. Type the following command:
# raidctl -c primary secondary
The creation of the RAID volume is interactive, by default. For example:
# raidctl -c c1t0d0 c1t1d0
Creating RAID volume c1t0d0 will destroy all data on member disks, proceed (yes/no)? yes
...
Volume c1t0d0 is created successfully!
#
As an alternative, you can use the -f option to force the creation if you are sure of the member disks and sure that the data on both member disks can be lost.
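Because raidctl -c destroys all data on the member disks, it can help to assemble and check the argument list before running it. This minimal sketch builds the argument list for the forms used in this section: the mirrored form with no -r, the striped form with -r 0, and the optional -f flag. The helper name is hypothetical. The disk-name pattern and the two-disk limit for a mirror are assumptions taken from the examples in this section (primary and secondary, c1t0d0 and c1t1d0).

```python
import re
from typing import List, Optional, Sequence

# Logical disk names such as c1t0d0 (controller, target, disk).
DISK_NAME = re.compile(r"c\d+t\d+d\d+")


def raidctl_create_argv(disks: Sequence[str], level: Optional[int] = None,
                        force: bool = False) -> List[str]:
    """Build an argument list for `raidctl -c` (hypothetical helper).

    level=None omits -r, as in the mirrored example; level=0 adds '-r 0'
    for a striped volume. force=True adds -f to skip the confirmation.
    """
    if len(disks) < 2:
        raise ValueError("a RAID volume needs at least two member disks")
    for disk in disks:
        if not DISK_NAME.fullmatch(disk):
            raise ValueError(f"not a logical disk name: {disk!r}")
    if level == 1 and len(disks) != 2:
        raise ValueError("a mirrored volume takes exactly a primary and a secondary disk")
    argv = ["raidctl", "-c"]
    if force:
        argv.append("-f")
    if level is not None:
        argv += ["-r", str(level)]
    argv.extend(disks)
    return argv


print(" ".join(raidctl_create_argv(["c1t0d0", "c1t1d0"])))  # raidctl -c c1t0d0 c1t1d0
```

On a Solaris host, you could pass the resulting list to Python's `subprocess.run(argv, check=True)`. Without -f, raidctl still asks for confirmation interactively.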
3. To check the status of the RAID mirror, type the following command:
# raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t0d0                  136.6G  N/A     SYNC     OFF    RAID1
                0.0.0   136.6G          GOOD
                0.1.0   136.6G          GOOD
The preceding example indicates that the RAID mirror is still resynchronizing with the backup drive. The following example shows that the RAID mirror is synchronized and online.
▼ Create a Hardware Mirrored Volume of the Default Boot Device Due to the volume initialization that occurs on the disk controller when a new volume is created, the volume must be configured and labeled using the format(1M) utility prior to use with the Solaris Operating System (see “Configure a Hardware RAID Volume for the Solaris OS” on page 22). Because of this limitation, raidctl(1M) blocks the creation of a hardware RAID volume if any of the member disks currently have a file system mounted.
4. Install the volume with the Solaris OS using any supported method. The hardware RAID volume c1t0d0 appears as a disk to the Solaris installation program. Note – The logical device names might appear differently on your system, depending on the number and type of add-on disk controllers installed.
2. Type the following command:
# raidctl -c -r 0 disk1 disk2 ...
The creation of the RAID volume is interactive, by default.
        Disk: 0.5.0
        Disk: 0.6.0
        Disk: 0.7.0
4. To check the status of a RAID striped volume, type the following command:
# raidctl -l c1t3d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t3d0                  N/A     64K     OPTIMAL  OFF    RAID0
                0.3.0   N/A             GOOD
                0.4.0   N/A             GOOD
                0.5.0   N/A             GOOD
The example shows that the RAID striped volume is online and functioning. Under RAID 0 (disk striping), there is no replication of data across drives.
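Striping without replication affects both capacity and risk. Every member contributes space, and losing any one member disk fails the whole volume. A mirror trades half the raw space for a second copy of the data. The following simplified model shows the capacity side of that trade-off. It is a sketch that assumes the usual RAID rules and ignores any space the controller reserves for metadata. The function name is hypothetical.

```python
from typing import Sequence


def usable_capacity(level: str, member_sizes: Sequence[float]) -> float:
    """Approximate usable size of a hardware RAID volume (simplified model).

    Ignores controller metadata. RAID0 stripes across all members;
    RAID1 keeps the same data on two members.
    """
    if len(member_sizes) < 2:
        raise ValueError("a RAID volume needs at least two member disks")
    smallest = min(member_sizes)
    if level == "RAID0":
        # Each member holds distinct stripes, so all members add capacity.
        return smallest * len(member_sizes)
    if level == "RAID1":
        # Both members hold identical data, so only one copy's worth is usable.
        return smallest
    raise ValueError(f"unsupported RAID level: {level}")


print(usable_capacity("RAID1", [136.6, 136.6]))  # 136.6
```

The RAID1 result matches the 136.6G mirror shown earlier in this section. For RAID0, the extra space comes with no redundancy.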
1. Start the format utility: # format The format utility might generate messages about corruption of the current label on the volume, which you are going to change. You can safely ignore these messages. 2. Select the disk name that represents the RAID volume that you have configured. In this example, c1t2d0 is the logical name of the volume. # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c1t0d0
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
3. Type the type command at the format prompt, then select 0 (zero) to autoconfigure the volume. For example:
format> type
AVAILABLE DRIVE TYPES:
        0. Auto configure
        1. Quantum ProDrive 80S
        2. Quantum ProDrive 105S
        3. CDC Wren IV 94171-344
        4. SUN0104
        5. SUN0207
        6. SUN0327
        7. SUN0340
        8. SUN0424
        9. SUN0535
        10. SUN0669
        11. SUN1.0G
        12. SUN1.05
        13. SUN1.3G
        14. SUN2.1G
        15. SUN2.9G
        16.
5. Write the new label to the disk using the label command:
format> label
Ready to label disk, continue? yes
6. Verify that the new label has been written by printing the disk list using the disk command:
format> disk
AVAILABLE DISK SELECTIONS:
       0. c1t0d0
          /pci@0/pci@0/pci@2/scsi@0/sd@0,0
       1. c1t1d0
          /pci@0/pci@0/pci@2/scsi@0/sd@1,0
       2.
2. To determine the name of the RAID volume, type: # raidctl Controller: 1 Volume:c1t0d0 Disk: 0.0.0 Disk: 0.1.0 ... In this example, the RAID volume is c1t0d0. Note – The logical device names might appear differently on your system, depending on the number and type of add-on disk controllers installed.
3. To delete the volume, type the following command: # raidctl -d mirrored-volume For example: # raidctl -d c1t0d0 Deleting RAID volume c1t0d0 will destroy all data it contains, proceed (yes/no)? yes /pci@0/pci@0/pci@2/scsi@0 (mpt0): Volume 0 deleted. /pci@0/pci@0/pci@2/scsi@0 (mpt0): Physical disk 0 deleted. /pci@0/pci@0/pci@2/scsi@0 (mpt0): Physical disk 1 deleted.
4. To confirm that you have deleted the RAID array, type the following command: # raidctl For example: # raidctl Controller: 1 Disk: 0.0.0 Disk: 0.1.0 ... For more information, see the raidctl(1M) man page. Related Information ■ “Disk Slot Numbers” on page 35 ■ “Hot-Plug a Mirrored Disk” on page 28 ■ “Hot-Plug a Nonmirrored Disk” on page 30 ■ “Creating Hardware RAID Volumes” on page 14 ▼ Hot-Plug a Mirrored Disk 1.
2. To confirm a failed disk, type the following command: # raidctl -l volume If the Disk Status is FAILED, then the drive can be removed and a new drive inserted. Upon insertion, the new disk should be GOOD and the volume should be SYNC. For example: # raidctl -l c1t0d0 Volume Size Stripe Status Cache RAID Sub Size Level Disk ---------------------------------------------------------------c1t0d0 136.6G N/A DEGRADED OFF RAID1 0.0.0 136.6G GOOD 0.1.0 136.
5. To check the status of a RAID rebuild, type the following command:
# raidctl -l volume
For example:
# raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t0d0                  136.6G  N/A     SYNC     OFF    RAID1
                0.0.0   136.6G          GOOD
                0.1.0   136.6G          GOOD
This example indicates that RAID volume c1t0d0 is resynchronizing.
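The hot-plug steps above come down to reading the raidctl -l output. If any disk is FAILED, replace it. While the volume is SYNC, wait. When the volume is OPTIMAL, the mirror is synchronized. The following minimal sketch parses one volume's raidctl -l output and suggests the next step. The helper names are hypothetical. The parser assumes the layout shown in this section: a header, a dashed rule, a six-field volume row, and three fields (disk, size, status) for every member disk. Any other layout raises ValueError instead of being guessed at.

```python
import re


def parse_raidctl_volume(text: str) -> dict:
    """Parse `raidctl -l <volume>` output for a single volume (sketch)."""
    # Keep only what follows the dashed rule under the header.
    body = re.split(r"-{10,}", text, maxsplit=1)[-1]
    tokens = body.split()
    if len(tokens) < 6 or (len(tokens) - 6) % 3:
        raise ValueError("unexpected raidctl -l layout")
    name, size, stripe, status, cache, level = tokens[:6]
    # Remaining tokens come in (disk, size, status) triples.
    disks = {tokens[i]: tokens[i + 2] for i in range(6, len(tokens), 3)}
    return {"volume": name, "size": size, "stripe": stripe, "status": status,
            "cache": cache, "level": level, "disks": disks}


def next_step(info: dict) -> str:
    """Suggest the next hot-plug action for a mirrored volume."""
    failed = sorted(d for d, s in info["disks"].items() if s == "FAILED")
    if failed:
        return "replace failed disk(s): " + ", ".join(failed)
    if info["status"] == "SYNC":
        return "resynchronizing: re-run raidctl -l later"
    if info["status"] == "OPTIMAL":
        return "mirror synchronized and online"
    return "check volume status: " + info["status"]


sample = (
    "Volume Size Stripe Status Cache RAID Sub Size Level Disk\n"
    "----------------------------------------------------------------\n"
    "c1t0d0 136.6G N/A SYNC OFF RAID1\n"
    "    0.0.0 136.6G GOOD\n"
    "    0.1.0 136.6G GOOD\n"
)
print(next_step(parse_raidctl_volume(sample)))  # resynchronizing: re-run raidctl -l later
```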
2. Type the following command:
# cfgadm -al
For example:
# cfgadm -al
Ap_Id
c1
c1::dsk/c1t0d0
c1::dsk/c1t1d0
c1::dsk/c1t2d0
c1::dsk/c1t3d0
c1::dsk/c1t4d0
c1::dsk/c1t5d0
c1::dsk/c1t6d0
c1::dsk/c1t7d0
usb0/1
usb0/2
usb0/3
usb1/1
usb1/2
usb2/1
usb2/2
usb2/3
usb2/4
usb2/4.1
usb2/4.2
usb2/4.3
usb2/4.
Note – The logical device names might appear differently on your system, depending on the number and type of add-on disk controllers installed. The -al options return the status of all SCSI devices, including buses and USB devices. In this example, no USB devices are connected to the system.
4. Verify that the device has been removed from the device tree. Type the following command: # cfgadm -al Ap_Id c1 c1::dsk/c1t0d0 c1::dsk/c1t1d0 c1::dsk/c1t2d0 c1::dsk/c1t3d0 c1::dsk/c1t4d0 c1::dsk/c1t5d0 c1::dsk/c1t6d0 c1::dsk/c1t7d0 usb0/1 usb0/2 usb0/3 usb1/1 usb1/2 usb2/1 usb2/2 usb2/3 usb2/4 usb2/4.1 usb2/4.2 usb2/4.3 usb2/4.
7. Configure the new hard drive. Type the following command:
# cfgadm -c configure Ap_Id
For example:
# cfgadm -c configure c1::dsk/c1t3d0
The green Activity LED flashes as the new disk at c1t3d0 is added to the device tree.
8. Verify that the new hard drive is in the device tree.
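Steps 4 and 8 both come down to reading the Occupant column that cfgadm -al reports for the disk's attachment point. The listing in this section shows only the Ap_Id column. This minimal sketch therefore assumes cfgadm's usual five whitespace-separated columns: Ap_Id, Type, Receptacle, Occupant and Condition. The helper names and the sample rows in the usage check are hypothetical.

```python
from typing import Dict


def parse_cfgadm(text: str) -> Dict[str, Dict[str, str]]:
    """Index `cfgadm -al` rows by attachment point ID (Ap_Id).

    Assumes five whitespace-separated columns: Ap_Id, Type, Receptacle,
    Occupant, Condition, preceded by a header row.
    """
    rows: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a five-column row.
        if len(fields) != 5 or fields[0] == "Ap_Id":
            continue
        ap_id, ap_type, receptacle, occupant, condition = fields
        rows[ap_id] = {"type": ap_type, "receptacle": receptacle,
                       "occupant": occupant, "condition": condition}
    return rows


def disk_is_configured(text: str, ap_id: str) -> bool:
    """True if ap_id is listed and its Occupant column reads 'configured'."""
    row = parse_cfgadm(text).get(ap_id)
    return row is not None and row["occupant"] == "configured"
```

After step 4, you would expect `disk_is_configured(listing, "c1::dsk/c1t3d0")` to return False. After step 8, it should return True.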
Related Information ■ “Disk Slot Numbers” on page 35 ■ “Hot-Plug a Mirrored Disk” on page 28 Disk Slot Numbers To perform a disk hot-plug procedure, you must know the physical or logical device name for the drive that you want to install or remove. If your system encounters a disk error, often you can find messages about failing or failed disks in the system console. This information is also logged in the /var/adm/messages files.
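The examples in this guide use three naming schemes for the same drive. The raidctl disk IDs (Disk 0.N.0) line up with logical names cNtNd0 (volume c1t0d0 built from disks 0.0.0 and 0.1.0, and volume c1t3d0 built from 0.3.0 through 0.5.0). The format listing pairs c1t0d0 and c1t1d0 with sd@0,0 and sd@1,0. The sketch below encodes that pattern as a hypothetical lookup for slots 0 through 7. The pattern is inferred from these examples only. Logical device names might differ on your system, so confirm them with raidctl, format or cfgadm before you hot-plug a drive.

```python
from typing import Dict


def slot_names(slot: int) -> Dict[str, str]:
    """Map a disk slot to the names seen in this guide's examples.

    Hypothetical mapping inferred from the raidctl and format listings
    (Disk 0.N.0, c1tNd0, .../sd@N,0); verify on the actual system.
    """
    if not 0 <= slot <= 7:
        raise ValueError("slot outside the 0-7 range shown in the raidctl listing")
    return {
        "raidctl_disk": f"0.{slot}.0",
        "logical": f"c1t{slot}d0",
        "physical": f"/pci@0/pci@0/pci@2/scsi@0/sd@{slot},0",
    }


print(slot_names(3)["logical"])  # c1t3d0
```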
Managing Devices This section contains information about managing devices in the servers, and the multipathing software supported.
▼ Reconfigure a Device Manually The ILOM firmware provides a set Device-Identifier component_state=enabled command, which enables you to reconfigure system devices manually. Use this command to mark the specified device as enabled. 1. “Log In to ILOM” on page 2. 2.
Device Identifiers (Continued) – Devices (Continued)
/SYS/USBBD/USBnumber – USB ports (2-3, located on front of chassis)
/SYS/TTYA – DB9 serial port
/SYS/MB/CMPn/MRn/BRbranch_number/CHchannel_number/Ddimm_number – CMP (0-1), Riser (0-1), Branch (0-1), Channel (0-1), DIMM (0-3)
Related Information
■ “Unconfigure a Device Manually” on page 37
■ “Reconfigure a Device Manually” on page 38
SPARC Enterprise T5x40 Device Tree
The following table shows the correspondence of the SPARC Enterprise T5140 and T5240 ser
Device (as Indicated on Chassis Label) – Solaris OS Device Tree
USB 0 (rear), USB 1.x – /pci@400/pci@0/pci@1/pci@0/usb@0/storage@1†
USB 0 (rear), USB 2.0 – /pci@400/pci@0/pci@1/pci@0/usb@0,2/storage@1
USB 1 (rear) USB 1.x USB 2.
■ Sun StorageTek™ Traffic Manager is an architecture fully integrated within the Solaris OS (beginning with the Solaris 8 release) that enables I/O devices to be accessed through multiple host controller interfaces from a single instance of the I/O device. Related Information ■ For instructions on how to configure and administer Solaris IP Network Multipathing, consult the IP Network Multipathing Administration Guide provided with your specific Solaris release.
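The DIMM identifiers in the Device Identifiers table are built from five numbered components. This minimal sketch formats one identifier from those numbers, checks each number against the ranges listed in the table, and builds the set Device-Identifier component_state= command used in “Reconfigure a Device Manually”. The helper names are hypothetical. The sketch assumes that each number is appended directly to its prefix (for example, BR0 and CH1). It also assumes that "disabled" is the value used to unconfigure a device. Not every in-range combination necessarily corresponds to a slot that exists on your server.

```python
# Ranges from the Device Identifiers table for DIMM identifiers.
DIMM_RANGES = {
    "cmp": range(2),      # CMP (0-1)
    "riser": range(2),    # Riser (0-1)
    "branch": range(2),   # Branch (0-1)
    "channel": range(2),  # Channel (0-1)
    "dimm": range(4),     # DIMM (0-3)
}


def dimm_identifier(cmp: int, riser: int, branch: int, channel: int, dimm: int) -> str:
    """Format a DIMM device identifier, rejecting out-of-range numbers."""
    values = {"cmp": cmp, "riser": riser, "branch": branch,
              "channel": channel, "dimm": dimm}
    for name, value in values.items():
        if value not in DIMM_RANGES[name]:
            raise ValueError(f"{name} {value} is outside {DIMM_RANGES[name]}")
    # Assumes each number is appended directly to its prefix (BR0, CH1, ...).
    return f"/SYS/MB/CMP{cmp}/MR{riser}/BR{branch}/CH{channel}/D{dimm}"


def component_state_command(identifier: str, enabled: bool = True) -> str:
    """Build the ILOM 'set <identifier> component_state=...' command."""
    state = "enabled" if enabled else "disabled"
    return f"set {identifier} component_state={state}"


print(component_state_command(dimm_identifier(0, 1, 0, 1, 3)))
# set /SYS/MB/CMP0/MR1/BR0/CH1/D3 component_state=enabled
```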
Handling Faults The SPARC Enterprise T5140 and T5240 servers provide many ways to find faults, including LEDs, ILOM and POST. For specific information about LEDs, and additional troubleshooting information, refer to the service manual for your server. ■ “Discovering Faults” on page 43 ■ “Bypassing Minor Faults” on page 46 ■ “Clear a Fault” on page 49 Discovering Faults This section contains information about finding system faults using pre-OS tools, including ILOM and POST.
▼ Discover Faults Using ILOM ● Type: -> show /SP/faultmgmt This command displays the fault ID, the faulted FRU device, and the fault message to standard output. The show /SP/faultmgmt command also displays POST results. For example: -> show /SP/faultmgmt /SP/faultmgmt Targets: 0 (/SYS/PS1) Properties: Commands: cd show -> For more information about the show /SP/faultmgmt command, refer to the ILOM guide and the ILOM supplement for your server.
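In the example output, each faulted component appears under Targets: as an index followed by its /SYS path in parentheses. This minimal sketch extracts those pairs so that a script can list the faulted FRUs. The function name is hypothetical. It assumes that the targets sit between the Targets: and Properties: labels, as in the example, and it returns an empty list when no target is shown.

```python
import re
from typing import List, Tuple


def faulted_components(text: str) -> List[Tuple[int, str]]:
    """Extract (index, component) pairs from `show /SP/faultmgmt` output.

    Assumes targets are listed as '<index> (<component path>)' between
    the 'Targets:' and 'Properties:' labels.
    """
    section = re.search(r"Targets:(.*?)Properties:", text, re.DOTALL)
    if section is None:
        return []
    pairs = re.findall(r"(\d+)\s+\((/SYS/[^)\s]+)\)", section.group(1))
    return [(int(index), path) for index, path in pairs]


sample = """/SP/faultmgmt
    Targets:
        0 (/SYS/PS1)
    Properties:
    Commands:
        cd
        show
"""
print(faulted_components(sample))  # [(0, '/SYS/PS1')]
```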
2. At the ILOM -> prompt, type: -> set /SYS keyswitch_state=diag The system is set to run full POST diagnostics on system reset. 3. To return to your normal diagnostic settings after running POST, type: -> set /SYS keyswitch_state=normal Related Information ■ “Discover Faults Using ILOM” on page 44 ■ “Locate the System” on page 45 ■ “Clear a Fault” on page 49 ■ “Bypassing Minor Faults” on page 46 ▼ Locate the System 1.
Bypassing Minor Faults This section includes information about configuring your server to automatically recover from minor faults. ■ “Automatic System Recovery” on page 46 ■ “Enable ASR” on page 47 ■ “Disable ASR” on page 47 ■ “View Information on Components Affected by ASR” on page 48 Automatic System Recovery The system provides for Automatic System Recovery (ASR) from failures in memory modules or PCI cards.
▼ Enable ASR
1. At the -> prompt, type:
-> set /HOST/diag mode=normal
-> set /HOST/diag level=max
-> set /HOST/diag trigger=power-on-reset
2. At the ok prompt, type:
ok setenv auto-boot? true
ok setenv auto-boot-on-error? true
Note – For more information about OpenBoot configuration variables, refer to the service manual for your server.
3.
2. To cause the parameter changes to take effect, type: ok reset-all The system permanently stores the parameter change. After you disable the ASR feature, it is not activated again until you re-enable it.
▼ Clear a Fault ● At the -> prompt, type: -> set /SYS/component clear_fault_action=true Setting clear_fault_action to true clears the fault at the component and all levels below it in the /SYS tree.
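Because clear_fault_action=true clears the fault at the named component and at every level below it, the path you choose determines how many faults are cleared. This minimal sketch models that rule as a path-prefix match over the /SYS tree. It is useful for checking which faulted components a given command would cover. The function name and the example paths are hypothetical. The sketch does not send anything to ILOM.

```python
from typing import Iterable, List


def cleared_by(target: str, faulted: Iterable[str]) -> List[str]:
    """Return the faulted /SYS paths that clearing `target` would cover.

    Models the rule that clear_fault_action=true clears the fault at the
    component and at all levels below it in the /SYS tree.
    """
    target = target.rstrip("/")
    # Match the component itself or anything beneath it (prefix plus '/').
    return [path for path in faulted
            if path == target or path.startswith(target + "/")]


faults = ["/SYS/MB/CMP0/MR0/BR0/CH0/D0", "/SYS/PS1"]
print(cleared_by("/SYS/MB", faults))  # ['/SYS/MB/CMP0/MR0/BR0/CH0/D0']
```

Matching on the component path followed by "/" keeps a sibling with a similar name, such as a hypothetical /SYS/MBX, from being counted under /SYS/MB.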
Managing Logical Domains Software SPARC Enterprise servers support the Logical Domains (LDoms) software that is used to create and manage logical domains. The software comprises LDoms-enabling code in the Solaris OS, LDoms-enabling code in System Firmware, and the Logical Domains Manager, which is the command-line interface. See your LDoms documentation for the latest information.
A logical domain is a discrete logical grouping with its own operating system, resources, and identity within a single computer system. Application software can run in logical domains. Each logical domain can be created, destroyed, reconfigured, and rebooted independently. Logical domains can perform several roles, as shown in the following table.
OpenBoot Configuration Variables
This section describes the OpenBoot configuration variables stored on the system configuration card (SCC).
OpenBoot Configuration Variables on the SCC
TABLE 1 describes the OpenBoot firmware configuration variables stored in non-volatile memory on the system.
TABLE 1 OpenBoot Configuration Variables Stored on the System Configuration Card (Continued)
Variable – Possible Values – Default Value – Description
ttya-rts-dtr-off – true, false – false – If true, the operating system does not assert RTS (request-to-send) and DTR (data-terminal-ready) on the serial management port.
ttya-ignore-cd – true, false – true – If true, the operating system ignores carrier-detect on the serial management port.
TABLE 1 OpenBoot Configuration Variables Stored on the System Configuration Card (Continued)
Variable – Possible Values – Default Value – Description
diag-switch? – true, false – false – If true, OpenBoot verbosity is set to maximum. If false, OpenBoot verbosity is set to minimum.
error-reset-recovery – boot, sync, none – boot – Command to execute following a system reset generated by an error.
network-boot-arguments – [protocol, ] [key=value, ] – none – Arguments to be used by the PROM for network booting.
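For variables with a fixed set of possible values, TABLE 1 can serve as a checklist before you change a setting at the ok prompt. This minimal sketch encodes the fixed-value rows from the table rows in this section. It rejects values outside the listed possibilities and reports which settings differ from their defaults. The helper names are hypothetical. network-boot-arguments is omitted because its values are free-form.

```python
from typing import Dict, Set, Tuple

# (possible values, default value) for fixed-value rows of TABLE 1.
OBP_VARIABLES: Dict[str, Tuple[Set[str], str]] = {
    "ttya-rts-dtr-off": ({"true", "false"}, "false"),
    "ttya-ignore-cd": ({"true", "false"}, "true"),
    "diag-switch?": ({"true", "false"}, "false"),
    "error-reset-recovery": ({"boot", "sync", "none"}, "boot"),
}


def setenv_command(name: str, value: str) -> str:
    """Return an ok-prompt setenv line after checking the value."""
    if name not in OBP_VARIABLES:
        raise KeyError(f"not covered by this table: {name}")
    allowed, _default = OBP_VARIABLES[name]
    if value not in allowed:
        raise ValueError(f"{name} accepts {sorted(allowed)}, not {value!r}")
    return f"setenv {name} {value}"


def changed_from_default(settings: Dict[str, str]) -> Dict[str, str]:
    """Keep only known variables whose value differs from the default."""
    return {name: value for name, value in settings.items()
            if name in OBP_VARIABLES and value != OBP_VARIABLES[name][1]}


print(setenv_command("error-reset-recovery", "sync"))  # setenv error-reset-recovery sync
```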
Index Symbols -> commands set /SYS/LOCATE, 45 show /SYS/LOCATE, 45 -> prompt about, 1 -> prompt ways to access, 4 A Activity (disk drive LED), 34 Automatic System Recovery, 46 Automatic System Recovery (ASR) about, 46 disabling, 47 Automatic System Recovery, enable, 47 Automatic System Recovery, view affected components, 48 C cables, keyboard and mouse, 5 cfgadm (Solaris command), 31 cfgadm install_device (Solaris command), cautions against using, 32 cfgadm remove_device (Solaris command), cautions agains
prompt, 4 ILOM commands set /SYS/LOCATE, 45 ILOM overview, 1 ILOM, log in, 2 ILOM, log in to the system console, 3 init (Solaris command), 4 input-device (OpenBoot configuration variable), 5 P PCI graphics card connecting graphics monitor to, 5 frame buffers, 5 physical device name (disk drive), 35 POST diagnostics, run, 44 power off, 8 power on, 7 R K keyboard, attaching, 5 L LDoms (Logical Domains Software), 51 LDoms configuration, 52 LDoms overview, 51 LEDs Activity (disk drive LED), 34 OK-to-Remove (