Sun StorEdge™ Availability Suite 3.2 Software Troubleshooting Guide
Sun Microsystems, Inc.
www.sun.com
Part No. 817-3752-10
December 2003, Revision 51
Submit comments about this document at: http://www.sun.
Copyright© 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in this product. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.
Contents
Preface v
1. Point-in-Time Copy Software Troubleshooting Tips 1
   Troubleshooting Checklist 1
   Checking Log Files 2
   Improving Performance 2
   Safeguarding the VTOC Information 3
2. Remote Mirror Software Troubleshooting Tips 5
   Troubleshooting Checklist 6
   Troubleshooting Log Files and Services 6
   Checking Log Files 7
   Checking the /etc/nsswitch.
   Correcting Common User Errors 13
   Enabled Software on Only One Host 13
   Volumes Are Inaccessible 13
   Wrong Volume Set Name Specified 14
   Accommodating Memory Requirements 16
3. Error Messages 19
Preface
Sun StorEdge Availability Suite 3.2 Software Troubleshooting Guide helps users solve common problems that might arise when using the Sun StorEdge™ Availability Suite 3.2 software.
Before You Read This Book
To use the information in this document, you must have thorough knowledge of the topics discussed in these books:
■ Sun StorEdge Availability Suite 3.2 Point-in-Time Copy Software Administration and Operations Guide
■ Sun StorEdge Availability Suite 3.
How This Book Is Organized
This book includes the following chapters:
Chapter 1 helps to solve problems associated with the point-in-time copy software.
Chapter 2 helps to solve problems associated with the remote mirror software.
Chapter 3 provides an alphabetical list of error messages from all sources associated with the Sun StorEdge Availability Suite software.
Shell Prompts
Shell                                    Prompt
C shell                                  machine-name%
C shell superuser                        machine-name#
Bourne shell and Korn shell              $
Bourne shell and Korn shell superuser    #
Typographic Conventions
Typeface     Meaning                                       Examples
AaBbCc123    The names of commands, files, and             Edit your .login file.
             directories; on-screen computer output        Use ls -a to list all files.
                                                           % You have mail.
Related Documentation
Application                   Title                                                                Part Number
Man pages                     sndradm, iiadm, dsstat, kstat, svadm                                 N/A
Latest release information    Sun StorEdge Availability Suite 3.2 Software Release Notes           817-2782
                              Sun Cluster 3.0 and Sun StorEdge Software Release Note Supplement    816-5128
                              Sun StorEdge Availability Suite 3.2 Software Installation Guide      817-2783
                              SunATM 3.0 Installation and User’s Guide
                              SunATM 4.
Accessing Sun Documentation
You can view, print, or purchase a broad selection of Sun documentation, including localized versions, at:
http://www.sun.com/documentation
Contacting Sun Technical Support
If you have technical questions about this product that are not answered in this document, go to:
http://www.sun.com/service/contacting
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:
http://www.
CHAPTER 1
Point-in-Time Copy Software Troubleshooting Tips
This chapter describes how to avoid or troubleshoot problems that might occur when using the point-in-time copy software. The chapter includes the following topics:
■ “Troubleshooting Checklist” on page 1
■ “Checking Log Files” on page 2
■ “Improving Performance” on page 2
■ “Safeguarding the VTOC Information” on page 3
Troubleshooting Checklist
This table shows the troubleshooting checklist and related sections.
Checking Log Files
You can check the status of the point-in-time copy software by examining two system log files:
■ /var/opt/SUNWesm/ds.log
The /var/opt/SUNWesm/ds.log file contains timestamped messages about the point-in-time copy software, including error messages and informational messages.
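Each entry in ds.log follows a simple layout. It begins with a timestamp, followed by a component tag such as ii: or sndr:, and then the message text. The sample under “Checking Log Files” in Chapter 2 shows this layout. The minimal Python sketch below assumes that every entry follows this layout and filters the file down to the point-in-time copy (ii) entries. The regular expression and the function names are illustrative assumptions and are not part of the software.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumed ds.log line layout, based on the samples in this guide:
#   "Aug 20 19:13:55 ii: iiboot resume cluster tag"
LOG_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<tag>\w+):\s+(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    stamp: str    # timestamp text, e.g. "Aug 20 19:13:55"
    tag: str      # component tag, e.g. "ii", "sndr", "scm"
    message: str  # remainder of the line


def parse_ds_log(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield parsed entries; lines that do not match the assumed layout are skipped."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match:
            yield LogEntry(match["stamp"], match["tag"], match["message"])


def entries_for(lines: Iterable[str], tag: str) -> List[LogEntry]:
    """Return only the entries written by one component, such as "ii"."""
    return [entry for entry in parse_ds_log(lines) if entry.tag == tag]
```

For example, if you pass an open /var/opt/SUNWesm/ds.log file object and the tag "ii" to entries_for, it returns only the point-in-time copy messages. To see the remote mirror messages instead, use the tag "sndr".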
Safeguarding the VTOC Information
Caution – When creating shadow volume sets, do not create shadow or bitmap volumes using partitions that include cylinder 0. Data loss might occur.
The Solaris system administrator must be knowledgeable about the volume table of contents (VTOC) that the Solaris operating system creates on raw devices. The creation and updating of a physical disk’s VTOC is a standard function of the Solaris operating system.
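To follow the caution above, you can check a disk's partition map before you choose shadow or bitmap volumes. On Solaris, prtvtoc(1M) prints this map. The Python sketch below assumes the usual prtvtoc layout:

■ Comment lines begin with *.
■ The header includes a sectors/cylinder figure.
■ Each data line reads partition, tag, flags, first sector, sector count, last sector.

Under those assumptions, the sketch reports every partition whose first sector lies inside cylinder 0. The function name is hypothetical.

```python
import re
from typing import List


def partitions_on_cylinder_zero(prtvtoc_output: str) -> List[int]:
    """Return partition numbers whose first sector falls inside cylinder 0.

    Assumed prtvtoc(1M) layout: '*' comment lines (one of which reports
    'N sectors/cylinder') followed by data lines of the form
    'partition tag flags first_sector sector_count last_sector [mount]'.
    """
    sectors_per_cylinder = None
    risky = []
    for line in prtvtoc_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("*"):
            found = re.search(r"(\d+)\s+sectors/cylinder", stripped)
            if found:
                sectors_per_cylinder = int(found.group(1))
            continue
        fields = stripped.split()
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        first_sector = int(fields[3])
        # Without geometry, only a partition starting at sector 0 is known to hold cylinder 0.
        limit = sectors_per_cylinder or 1
        if first_sector < limit:
            risky.append(int(fields[0]))
    return risky
```

As input, pass the text printed by a command such as prtvtoc /dev/rdsk/c1t1d0s2. The device name here is only an example. Do not use the returned partitions for shadow or bitmap volumes.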
CHAPTER 2
Remote Mirror Software Troubleshooting Tips
This chapter describes how to avoid or troubleshoot problems that might occur when using the remote mirror software. The chapter includes the following topics:
■ “Troubleshooting Checklist” on page 6
■ “Troubleshooting Log Files and Services” on page 6
■ “Checking the Integrity of the Link” on page 10
■ “Correcting Common User Errors” on page 13
Note – The Sun StorEdge Availability Suite 3.
Troubleshooting Checklist
This table shows the troubleshooting checklist and related sections.
TABLE 2-1 Troubleshooting Checklist
Step                                               For Instructions
1. Check for installation errors.                  Sun StorEdge Availability Suite 3.2 Software Installation Guide
2. Check that /dev/rdc is created after reboot.    “Checking That the rdc Service Is Running” on page 8
                                                   “If the /dev/rdc Link Is Not Created” on page 9
3. Check that the sndrd daemon is running.         Sun StorEdge Availability Suite 3.
Checking Log Files
Check the following files to troubleshoot problems:
■ /var/opt/SUNWesm/ds.log
The /var/opt/SUNWesm/ds.log file contains timestamped messages about the software. For example:
Aug 20 19:13:55 scm: scmadm cache enable succeeded
Aug 20 19:13:55 ii: iiboot resume cluster tag
Aug 20 19:13:58 sndr: sndrboot -r first.atm /dev/vx/rdsk/rootdg/vol5 /dev/vx/rdsk/rootdg/bm6 second.atm /dev/vx/rdsk/rootdg/vol7 /dev/vx/rdsk/rootdg/bm7 Successful
Aug 20 19:13:58 sndr: sndrboot -r first.
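In the sample, each sndrboot entry lists the members of a remote mirror volume set in this order:

1. Primary host
2. Primary volume
3. Primary bitmap
4. Secondary host
5. Secondary volume
6. Secondary bitmap

A status word follows these six fields. The Python sketch below assumes that order holds for every entry and splits the message text of an sndr: entry into named fields. It also derives the shost:svol set name described under “Wrong Volume Set Name Specified.” The field and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class SndrBootRecord(NamedTuple):
    primary_host: str
    primary_volume: str
    primary_bitmap: str
    secondary_host: str
    secondary_volume: str
    secondary_bitmap: str
    status: str

    @property
    def set_name(self) -> str:
        # Remote mirror volume sets are named shost:svol.
        return f"{self.secondary_host}:{self.secondary_volume}"


def parse_sndrboot(message: str) -> Optional[SndrBootRecord]:
    """Parse the message part of an 'sndr:' ds.log entry.

    Assumed layout (from the sample in this guide):
    sndrboot -r phost pvol pbitmap shost svol sbitmap Status
    """
    words = message.split()
    if len(words) < 9 or words[0] != "sndrboot" or not words[1].startswith("-"):
        return None
    return SndrBootRecord(*words[2:8], status=" ".join(words[8:]))
```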
Checking the /etc/nsswitch.conf File
If entries in the /etc/nsswitch.conf file are not configured correctly, you might encounter these problems:
■ If the hosts: entry is incorrect, volume sets do not resume after a reboot.
■ If the services: entry is incorrect, the rdc service might not activate and no data is replicated.
Note – The services port number must be the same on all interconnected remote mirror host systems.
When the hosts: and services: entries are included in the /etc/nsswitch.
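You can check both conditions in this section from the file contents. The Python sketch below does two things, and both rest on assumptions:

■ It reports whether the hosts: and services: lines list the local files source. Treat this requirement as an assumption, and confirm the recommended settings in the Sun StorEdge Availability Suite 3.2 Software Installation Guide.
■ It compares the rdc port defined in the /etc/services text collected from each host. The Note above requires these ports to match.

The function names are hypothetical.

```python
from typing import Dict, List, Optional


def nsswitch_sources(conf_text: str) -> Dict[str, List[str]]:
    """Map each database (hosts, services, ...) to its lookup sources.

    '#' comments and bracketed criteria such as [NOTFOUND=return] are ignored.
    """
    sources = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        database, rest = line.split(":", 1)
        sources[database.strip()] = [w for w in rest.split() if not w.startswith("[")]
    return sources


def missing_files_source(conf_text: str) -> List[str]:
    """hosts/services databases that do not list 'files' (assumed requirement)."""
    sources = nsswitch_sources(conf_text)
    return [db for db in ("hosts", "services") if "files" not in sources.get(db, [])]


def rdc_port(services_text: str) -> Optional[str]:
    """Return the port/protocol field of the rdc entry in /etc/services text, or None."""
    for raw in services_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "rdc":
            return fields[1]
    return None


def ports_consistent(services_by_host: Dict[str, str]) -> bool:
    """True when every host defines rdc, and all hosts use the same port."""
    ports = {rdc_port(text) for text in services_by_host.values()}
    return len(ports) == 1 and None not in ports
```

For example, ports_consistent({"hostA": text_a, "hostB": text_b}) returns False in two cases: when the two files define different rdc ports, and when either file lacks an rdc entry. The names hostA, hostB, text_a and text_b are placeholders.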
In the rpcinfo command:
■ -T tcp specifies the transport that the service uses.
■ hostname is the name of the machine where the service is running.
If the service is not running, this message is displayed:
rpcinfo: RPC: Program not registered
If you see this message, the /etc/nsswitch.conf services: entry might be incorrectly configured. See “Checking the /etc/nsswitch.conf File” on page 8.
■ netstat
This message shows that the service is running:
# netstat -a | grep rdc
      *.rdc                *.*
      *.rdc                *.*
      *.
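You can test the two signals described here mechanically: the rpcinfo error message and an rdc entry in the netstat listing. The Python sketch below takes the command output as text, so it makes no assumption about how you run the commands. It does assume that the local address is the first column of each netstat -a line, as in the sample. The function names are hypothetical.

```python
def rdc_registered(rpcinfo_output: str) -> bool:
    """False if rpcinfo printed the 'Program not registered' error quoted in this guide."""
    return "Program not registered" not in rpcinfo_output


def rdc_listening(netstat_output: str) -> bool:
    """True if any netstat -a line has a local address ending in '.rdc'."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].endswith(".rdc"):
            return True
    return False
```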
■ The /etc/name_to_major file is missing an entry for the /dev/rdc pseudo-link. This example shows a valid entry. The major number that follows rdc varies from system to system.
# grep rdc /etc/name_to_major
rdc 239
■ The /usr/kernel/drv/rdc.conf file is incomplete. This example shows a valid entry:
# grep pseudo /usr/kernel/drv/rdc.conf
name="rdc" parent="pseudo";
Checking the Integrity of the Link
After you determine that the rdc service is ready, check the integrity of the TCP/IP link.
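You can check the two files described above in the same way. The Python sketch below runs two checks:

■ It looks for an rdc line with a numeric major number in name_to_major.
■ It looks for an entry in rdc.conf that sets both name="rdc" and parent="pseudo".

The sketch assumes the usual driver.conf syntax, in which # starts a comment and ; ends an entry. The function names are hypothetical.

```python
import re
from typing import Optional


def rdc_major_number(name_to_major: str) -> Optional[int]:
    """Return the major number assigned to rdc, or None if the entry is missing."""
    for line in name_to_major.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "rdc" and fields[1].isdigit():
            return int(fields[1])
    return None


def rdc_conf_valid(rdc_conf: str) -> bool:
    """True if one rdc.conf entry declares both name="rdc" and parent="pseudo"."""
    # Drop '#' comments, then examine each ';'-terminated entry separately.
    text = "\n".join(line.split("#", 1)[0] for line in rdc_conf.splitlines())
    for entry in text.split(";"):
        has_name = re.search(r'\bname\s*=\s*"rdc"', entry)
        has_parent = re.search(r'\bparent\s*=\s*"pseudo"', entry)
        if has_name and has_parent:
            return True
    return False
```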
Testing with ifconfig
Use the ifconfig command to make sure that the network interface is configured and running correctly. This example output shows all the interfaces that are configured and running:
# ifconfig -a
ba0: flags=1000843 mtu 9180 index 1
        inet 192.9.201.10 netmask ffffff00 broadcast 192.9.201.255
        ether 8:0:20:af:8e:d0
lo0: flags=1000849 mtu 8232 index 2
        inet 127.0.0.
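The flags= value in ifconfig output is a hexadecimal bit mask. The Python sketch below decodes it and confirms that UP and RUNNING are both set. It uses a subset of the flag values from the Solaris <net/if.h> header; these values are an assumption, so check them against your system's header. For the ba0 line in the example, 0x1000843 decodes to UP, BROADCAST, RUNNING, MULTICAST and IPv4.

The sketch also computes the broadcast address implied by the inet address and the hexadecimal netmask. This value should match the broadcast field. The function names are hypothetical.

```python
import ipaddress
from typing import List

# Subset of interface flag bits, assumed from the Solaris <net/if.h> header.
IFF_BITS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",
    0x800: "MULTICAST",
    0x1000000: "IPv4",
    0x2000000: "IPv6",
}


def decode_flags(hex_flags: str) -> List[str]:
    """Translate an ifconfig 'flags=' hex value into flag names (known bits only)."""
    value = int(hex_flags, 16)
    return [name for bit, name in sorted(IFF_BITS.items()) if value & bit]


def is_up_and_running(hex_flags: str) -> bool:
    names = decode_flags(hex_flags)
    return "UP" in names and "RUNNING" in names


def expected_broadcast(inet: str, hex_netmask: str) -> str:
    """Broadcast address implied by an address and ifconfig's hex netmask."""
    netmask = str(ipaddress.IPv4Address(int(hex_netmask, 16)))
    network = ipaddress.IPv4Network(f"{inet}/{netmask}", strict=False)
    return str(network.broadcast_address)
```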
In the first example, the command is issued from the primary host nws822 to the secondary host nws350. The network interface is hme0 and the port used by the rdc service is reported.
Correcting Common User Errors
This section describes common user errors that occur when using the software.
■ “Enabled Software on Only One Host” on page 13
■ “Volumes Are Inaccessible” on page 13
■ “Wrong Volume Set Name Specified” on page 14
Enabled Software on Only One Host
New users sometimes forget to issue the sndradm -e enable command on both the primary host and the secondary host.
This example shows the newfs -N command completing successfully:
# newfs -N /dev/vx/rdsk/rootdg/tony0
/dev/vx/rdsk/rootdg/tony0: 2048000 sectors in 1000 cylinders of 32 tracks, 64 sectors
        1000.0MB in 63 cyl groups (16 c/g, 16.
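The numbers in the newfs -N output should agree with one another:

■ 1000 cylinders × 32 tracks × 64 sectors gives 2,048,000 sectors.
■ At an assumed 512 bytes per sector, that is 1,048,576,000 bytes. With a megabyte of 1,048,576 bytes, this is 1000.0 MB.
■ 1000 cylinders at 16 cylinders per group need 63 cylinder groups, and the last group is partial.

The Python sketch below repeats that arithmetic so you can compare it against the output for a different volume. The function name is hypothetical.

```python
import math
from typing import Tuple

SECTOR_BYTES = 512  # assumed sector size


def newfs_geometry(cylinders: int, tracks_per_cyl: int,
                   sectors_per_track: int, cyl_per_group: int) -> Tuple[int, float, int]:
    """Return (total sectors, size in MB as newfs prints it, number of cylinder groups)."""
    sectors = cylinders * tracks_per_cyl * sectors_per_track
    megabytes = sectors * SECTOR_BYTES / (1024 * 1024)
    groups = math.ceil(cylinders / cyl_per_group)  # the last group may be partial
    return sectors, round(megabytes, 1), groups
```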
For example, this command updates the volume on the secondary host calamari from the primary host volume:
# sndradm -un calamari:/dev/vx/rdsk/rootdg/tony1
To display the correct volume set name, use the sndradm -p command on the primary host. See “To Find the Volume Set Name” on page 16.
Using the dsstat Command Incorrectly
An administrator might use the dsstat(1M) command instead of sndradm -p to find the volume set name.
▼ To Find the Volume Set Name
1. If you are unsure of the volume set name, type the following command from the primary host:
# sndradm -p
/dev/vx/rdsk/rootdg/tony1 -> calamari:/dev/vx/rdsk/rootdg/tony1
Running Startup Script Out of Order
The scripts to configure the network interface must run before the Availability Suite’s startup script.
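The volume set name is the secondary host and the secondary volume joined by a colon. You can therefore read it directly from sndradm -p output. The Python sketch below assumes the pvol -> shost:svol layout shown in the example and builds the set name for each line. The class and function names are hypothetical.

```python
from typing import List, NamedTuple


class SetMapping(NamedTuple):
    primary_volume: str
    secondary_host: str
    secondary_volume: str

    @property
    def set_name(self) -> str:
        # The volume set name is shost:svol.
        return f"{self.secondary_host}:{self.secondary_volume}"


def parse_sndradm_p(output: str) -> List[SetMapping]:
    """Parse 'pvol -> shost:svol' lines in the layout shown in this guide."""
    sets = []
    for line in output.splitlines():
        primary, arrow, rest = line.partition("->")
        fields = rest.split()
        if not arrow or not primary.strip() or not fields:
            continue
        host, sep, volume = fields[0].partition(":")
        if sep and host and volume:
            sets.append(SetMapping(primary.strip(), host, volume))
    return sets
```

For the example output, the sketch yields calamari:/dev/vx/rdsk/rootdg/tony1. Pass this name to commands such as sndradm -un.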
The order of write operations must be maintained within a group. Therefore, these out-of-order requests must be stored in memory on the secondary host until the missing request arrives and completes. The secondary host can store up to a hard-coded limit of 64 requests per group. When 64 requests are stored, the primary host stalls and cannot issue any more requests. This limit applies only to the number of outstanding requests, not to the size of their payloads.
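The following Python sketch is a toy model of this behavior, not the kernel implementation. It assumes that each request in a group carries a consecutive sequence number, and it works as follows:

■ Requests that arrive early are held.
■ Held requests are written in order once the gap is filled.
■ The model reports a stall once 64 requests are held.

Payload size is ignored, because the limit counts requests only. The class name is hypothetical.

```python
class GroupReorderBuffer:
    """Toy model of one group's out-of-order request store on the secondary host."""

    LIMIT = 64  # per-group limit described in this guide

    def __init__(self) -> None:
        self.next_seq = 0   # next sequence number that can be written in order
        self.held = {}      # seq -> payload, waiting for an earlier missing request
        self.applied = []   # sequence numbers written, in order

    def stalled(self) -> bool:
        """True when the primary would have to stop issuing requests."""
        return len(self.held) >= self.LIMIT

    def receive(self, seq: int, payload: bytes = b"") -> None:
        if seq < self.next_seq:
            return  # duplicate of a request that was already written
        if seq != self.next_seq:
            if self.stalled():
                raise RuntimeError("primary should have stalled before sending this request")
            self.held[seq] = payload  # payload size does not count toward the limit
            return
        self.applied.append(seq)
        self.next_seq += 1
        # Write any held requests that are now next in order.
        while self.next_seq in self.held:
            self.held.pop(self.next_seq)
            self.applied.append(self.next_seq)
            self.next_seq += 1
```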
CHAPTER 3 Error Messages Error messages during the installation process are described in the Sun StorEdge Availability Suite 3.2 Software Installation Guide. Solaris error messages related to the Sun StorEdge Availability Suite software are described in .... TABLE 3-1 lists Sun StorEdge Availability Suite 3.2 error messages in alphabetical order. The error messages come from the following sources: ■ PITC: From the point-in-time copy software.
TABLE 3-1 Error Messages for the Sun StorEdge Availability Suite 3.2 Software (Continued)
Error Message | From | Meaning
Abort failed | PITC | iiadm could not abort a copy or update operation on a set. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
    DSW_EEMPTY: No set was specified.
    DSW_ENOTFOUND: The specified set does not exist.
Bitmap reconfig failed %s:%s | Kernel | A request to reconfigure the bitmap on the local host has failed. This can happen for two reasons:
    • The old bitmap cannot be read to obtain needed information.
    • The new bitmap cannot be reserved because the volume is not accessible or is already in use.
    Verify that the new bitmap volume is accessible and is not already in use.
cannot find SNDR set : in config | RM | Remote mirror set cannot be found in the configuration database. The set is not configured. Check the entry for errors.
Cannot reconfig %s:%s to %s:%s, Must be in logging mode | Kernel | An operation has been requested that requires the remote mirror set to be in logging mode.
Cannot enable %s:%s ==> %s:%s, secondary in use in another set | Kernel | A set being enabled or resumed has a secondary volume that another remote mirror set already uses as its secondary volume. A volume can be the secondary volume of only one remote mirror set.
Copy failed | PITC | A copy or update operation could not be initiated. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
    DSW_EEMPTY: No set was specified on the command line.
    DSW_ENOTFOUND: The specified set could not be found in the kernel.
Create overflow failed | PITC | An overflow volume could not be initialized. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
    DSW_EEMPTY: No overflow volume was specified.
    DSW_EINUSE: The specified volume is already being used by the point-in-time copy software in another capacity.
Disable pending on diskq %s, try again later | Kernel | A request to disable the disk queue is already in progress. Verify that the previous request has completed successfully. If it has, this request is no longer valid. If it has not, wait for the previous request to finish before you try again to disable the disk queue.
disk queue volume must not match any primary SNDR volume or bitmap | RM | The disk queue volume specified for the reconfiguration operation is already in use by the remote mirror software as a data volume or bitmap volume.
don't understand shadow type | PITC | The iiadm -e command expected the shadow type dep (dependent) or ind (independent).
Enable failed | PITC | Could not enable the volume. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
    DSW_ESHUTDOWN: The kernel module is in the process of shutting down the point-in-time copy software. No new sets can be enabled.
    DSW_EEMPTY: One of the volume names (master, shadow, bitmap) is blank.
Failed to detach overflow volume | PITC | iiadm had a problem detaching the overflow volume from a set. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
    DSW_EEMPTY: No set was specified from which to detach the volume.
    DSW_ENOTFOUND: The set to detach from does not exist.
hostname tag exceeds CFG_MAX_BUF | PITC | Because CFG_MAX_BUF is 1k, this message is not expected to be reported.
Import failed | PITC | Could not import the shadow volume. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
Join failed | PITC | Could not join the shadow volume back to the set. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
    DSW_EEMPTY: A volume was missing on the command line.
    DSW_ENOTFOUND: The set could not be found in the kernel.
Memory allocation failure | PITC | iiadm ran out of memory.
Must be super-user to execute | Kernel | The user issued a remote mirror command but does not have superuser privileges. All remote mirror commands require superuser privileges.
must specify full set details for enable command | RM | The user attempted to enable a set using the shost:svol format.
Overflow list access failure | PITC | iiadm could not get a list of overflow volumes from the kernel. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
Reverse sync needed, cannot sync %s:%s ==> %s:%s | Kernel | The user requested a forward sync operation for a remote mirror set that needs a reverse sync. This occurs when a previous reverse sync did not complete successfully or when the primary volume was damaged and had to be replaced. Issue a reverse sync for the set.
Shadow group %s is suspended | PITC | The user attempted to perform a copy or update operation on a group with one or more suspended sets. The %s parameter identifies the first suspended set found in the group.
Shadow group suspended | PITC | The user attempted to perform a copy or update operation on a suspended set.
SNDR set does not have a disk queue | RM | A queue remove operation or a queue replace operation was attempted on a set that does not have a disk queue attached.
SNDR: The volume ’’ has been configured previously as ’’. Re-enter command with the latter name. | RM | The user attempted to enable a set in which the volume was already enabled, but with a different name.
The volume %s is already in use | Kernel | The data volume in the remote mirror set is already in use as a bitmap volume or a disk queue volume. Use a different data volume.
Too many volumes given for update | PITC | iiadm ran out of memory.
Unable to access bitmap | PITC | During an enable operation, iiadm tried to validate the bitmap device but could not access it.
unable to determine hostname: | RM | Could not determine the host name of the system.
unable to determine IP addresses for either host or host | RM | The IP address for either the primary host or the secondary host could not be determined.
unable to obtain unique set id for : | RM | Lookup of the set ID in the configuration database failed for this set.
Unable to open bitmap file | RM | The volume specified for the bitmap could not be opened. The volume might not exist or might already be in use by another program.
Update failed | PITC | One or more volumes in a group copy or update command failed. Possible errors:
    EFAULT: The kernel module tried to read out of bounds. File a bug against iiadm.
    ENOMEM: The kernel module ran out of memory.
    EINVAL: The user is performing a shadow-to-master copy, but the group contains two or more shadow volumes of the same master.
Volumes are not in same disk group | PITC | iiadm detected that the master, shadow, and bitmap volumes are not all in the same cluster device group, as required by the point-in-time copy software.
volume "" is not part of a disk group, please specify resource ctag | RM | The volume is not being managed by Sun Cluster.
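This table lends itself to two small lookups, and the Python sketch below shows both:

■ Translating one of the possible error codes listed for PITC messages. The explanations in the dictionary are summarized from the entries above.
■ Recognizing a kernel message whose %s placeholders have already been filled in with host and volume names. The template list is only a sample.

How a script captures the code or message, for example from iiadm output or a log file, is an assumption left open here. All names in the sketch are hypothetical.

```python
import re
from typing import Optional, Tuple

# Explanations summarized from the "Possible errors" lists of the PITC entries.
IIADM_ERRORS = {
    "EFAULT": "The kernel module tried to read out of bounds. File a bug against iiadm.",
    "ENOMEM": "The kernel module ran out of memory.",
    "DSW_EEMPTY": "A required set or volume name was not specified.",
    "DSW_ENOTFOUND": "The specified set does not exist.",
    "DSW_EINUSE": "The volume is already used by the point-in-time copy software in another capacity.",
    "DSW_ESHUTDOWN": "The point-in-time copy software is shutting down; no new sets can be enabled.",
}


def explain(code: str) -> str:
    """Return the explanation for an error code, or a fallback if the table does not list it."""
    key = code.strip().upper()
    return IIADM_ERRORS.get(key, f"{code.strip()}: not listed in TABLE 3-1")


# A sample of kernel message templates copied from TABLE 3-1.
TEMPLATES = [
    "Bitmap reconfig failed %s:%s",
    "Cannot reconfig %s:%s to %s:%s, Must be in logging mode",
    "Cannot enable %s:%s ==> %s:%s, secondary in use in another set",
    "Reverse sync needed, cannot sync %s:%s ==> %s:%s",
]


def template_regex(template: str):
    """Compile a printf-style template so that each %s matches a run of non-blank text."""
    pieces = [re.escape(piece) for piece in template.split("%s")]
    return re.compile("^" + r"(\S+?)".join(pieces) + "$")


def match_message(message: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (template, filled-in values) for the first template that matches, else None."""
    for template in TEMPLATES:
        found = template_regex(template).match(message.strip())
        if found:
            return template, found.groups()
    return None
```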
Related Error Messages
The Solaris configuration administration utility, cfgadm, reports an error when it is used on systems where the Sun StorEdge Availability Suite software is installed. The error occurs because a process does not suspend properly, which prevents the cfgadm operation from proceeding.