HP StorageWorks PolyServe Matrix Server event notification guide HP PolyServe Matrix Server 4.0.
Legal and notice information © Copyright 2007, 2010 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Contents
1 Overview
Event logs
View event logs
Event notifier services
Event messages
1 Overview
Matrix Server generates an event message when an error condition or failure occurs or when the status of the matrix changes. To provide an audit trail of matrix operations, a message is also generated when a user requests and is granted or denied authorization to perform a task. Event messages are logged and can be viewed either with the Matrix Event Viewer provided with the HP PolyServe Management Console or with command-line tools.
View event logs
The matrix log is not intended to be read directly. Instead, use these methods:
• The Matrix Event Viewer. This tool, available on the HP PolyServe Management Console, shows the messages in the matrix log on a particular server.
• The mx server viewevents command. This command is run from the command line and allows you to filter the events in the matrix log on the specified server. You can also limit the events displayed by the command to a particular time range.
Event messages
Each message is labeled with a unique identifier and provides the following information:
• A description of the event, including the time it occurred.
• The location of the event, such as the IP address of a server.
• The source of the event, such as a Matrix Server component.
• A severity level that indicates the relative importance of the event.
• A category such as Startup or Filesystem that can be used for sorting when viewing the matrix log.
• Audit Success. A user requested and was granted authorization to perform a task. • Audit Failure. A user requested and was denied authorization to perform a task.
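The fields listed above can be pictured as a simple record. The following Python sketch is illustrative only: the class name MatrixEvent, its field names, and the filter_events helper are hypothetical and are not part of Matrix Server or its command-line tools. It assumes events have already been exported into memory by some other means, and shows filtering by category and time range in the spirit of the viewing options described earlier.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MatrixEvent:
    """Hypothetical in-memory record; field names are illustrative only."""
    event_id: int      # unique message identifier, for example 109
    time: datetime     # when the event occurred
    description: str   # text of the message
    location: str      # for example, the IP address of a server
    source: str        # for example, a Matrix Server component
    severity: str      # for example, "Audit Success" or "Audit Failure"
    category: str      # for example, "Startup" or "Filesystem"


def filter_events(
    events: Iterable[MatrixEvent],
    category: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[MatrixEvent]:
    """Return events matching the category and time range, oldest first."""
    selected = [
        e for e in events
        if (category is None or e.category == category)
        and (start is None or e.time >= start)
        and (end is None or e.time <= end)
    ]
    return sorted(selected, key=lambda e: e.time)
```

Passing None for any argument leaves that criterion unrestricted, so filter_events(events) simply returns every event in time order.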
2 Event messages
This chapter lists alert messages and corrective actions for Matrix Server, HP PolyServe Software for Microsoft SQL Server, and HP PolyServe Software for Windows File Sharing.
ID Message and Corrective Action
101 License is invalid. Matrix Server will be terminated in hour(s) minute(s). Action. The ClusterPulse process has recognized a license violation. This message will be repeated every 15 minutes.
109 Communication to server is down. Action. A server configured in the matrix is not communicating with the rest of the matrix. If Matrix Server is stopped on the server, restart the product. Otherwise, check for network issues causing the problem or remove the server from the matrix.
110 Communication to server restored. Action. None. Alert 109 is resolved.
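Many entries in this chapter come in pairs: an alert, and a later message stating that the alert is resolved (for example, 110 resolves 109). The following Python sketch shows one way such pairs could be used to work out which alerts are still open after reading a sequence of event IDs. The RESOLVES table contains only a few pairs taken from this chapter, and the open_alerts function is hypothetical. As a simplification, it treats every ID that is not a resolution message as an alert and ignores which server reported it.

```python
from typing import Iterable, List

# Resolution ID -> alert ID it clears. A partial table taken from
# the "Alert N is resolved" entries in this chapter.
RESOLVES = {
    110: 109,
    4509: 4508,
    4573: 4572,
    13910: 13909,
    17002: 17001,
}


def open_alerts(event_ids: Iterable[int]) -> List[int]:
    """Replay event IDs in order and return the alerts left unresolved."""
    active: List[int] = []
    for event_id in event_ids:
        if event_id in RESOLVES:
            cleared = RESOLVES[event_id]
            if cleared in active:
                active.remove(cleared)
        elif event_id not in active:
            # Simplification: any non-resolution ID counts as an alert.
            active.append(event_id)
    return active
```

For instance, replaying 109, 4508, 110 leaves only 4508 open, because 110 clears 109.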
4509 Monitor process creation resolved. Action. None. Alert 4508 is resolved.
4510 Probe failed: monitor process creation failed. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server.
4516 Probe failed: monitor process timeout. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4524 Probe failed: Unsupported monitor type ''. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4530 Probe failed: Monitor type '' will terminate to load newer function. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).) Action.
4536 Probe failed: Gateway monitor address ' ' could not be resolved to an IP address. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).) Action.
4542 Probe failed: Partition could not be accessed. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).) Action. Verify that the monitor is configured correctly.
4548 Probe failed: Connect failed. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4554 Probe failed: Invalid reply from server: '' was expecting reply to contain integer result code. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action.
4560 Probe failed: Invalid reply from server: ' ' was expecting reply to contain '' The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action.
4566 Probe failed: Unable to open SNMP session. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4573 SNMP Get request timeout is resolved. Action. None. Alert 4572 is resolved.
4574 Probe failed: Filesystem could not be opened. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server.
4580 Probe failed: Filesystem directory creation failed. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).) Action. Verify that the monitor is configured correctly.
4586 Probe failed: Insufficient process memory. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).) Action. Verify that the monitor is configured correctly.
4592 Probe failed: . NIS service request timed out. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4598 Probe failed: . NIS server address is invalid. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4605 NT service access is resolved. Action. None. Alert 4604 is resolved.
4606 Probe failed: Socket operation requires a valid IP address. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server.
4612 Probe failed: Socket receive error: The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4618 Probe failed: Socket connection has timed out after seconds. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4624 Probe failed: DNS query failed. The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.) Action. Verify that the monitor is configured correctly.
4631 Probe feature license availability is resolved. Action. None. Alert 4630 is resolved.
4632 Probe failed: feature license is unavailable. The monitor feature license is unavailable. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server.
13905 Reboot ASAP as it stopped matrix network communication at date/time but attempts to exclude it from the SAN were unsuccessful! Rebooting it will allow normal matrix operation to continue. Alternatively, if the server cannot be rebooted, but can be confirmed to have no access to the SAN, run ‘mx server markdown ' to restore normal matrix operation. Action. Matrix Server cannot fence a server that is no longer communicating with the matrix.
13910 Inaccessible membership partition resolved. Action. None. Alert 13909 is resolved.
13911 DiskImportError: unable to access imported disk /psd. When the matrix configuration was imported to the server, it was not able to access the specified disk. The filesystems on the disk were not mounted on the server. Action.
13917 IllegalDisks: One or more imported disks has more than 31 partitions. Matrix Server only supports 31 partitions per disk. Some volume operations may fail on partitions higher than this limit. See the Administrator's Guide for further information. Action. As of Matrix Server 3.4, the maximum number of partitions on a disk or LUN is limited to 31. If you upgraded to Matrix Server 3.
13925 One or more membership partitions are too small to allow the mxds data store to be created. As a result, some Matrix Server functionality will not be available. To correct this problem, replace the affected membership partitions with larger partitions. Action. Open the Configure Matrix window and go to the Storage Settings tab, which shows the size of each membership partition.
13933 FenceAgent : no response to queries from . Fabric fencing is configured and a node has lost network access or the FibreChannel switch is unresponsive. Action. Possible causes of the error are:
• The server has experienced a total loss of network access to the FibreChannel switch(es).
• A FibreChannel switch has become unresponsive to SNMP requests.
• The dependent r/w community string has been changed or deleted.
17002 Singleton matrix takes control of SAN failure resolved. Action. None. Alert 17001 is resolved.
17003 This matrix unable to take control of SAN, because another matrix that includes NN.NN.NN.NN currently controls the SAN. Possibly a networking failure or misconfiguration has partitioned these servers from the servers that control the SAN, or possibly this matrix has been misconfigured to share membership partitions with another matrix. Action.
17008 This matrix takes control of SAN failure resolved. Action. None. Alert 17007 is resolved.
17009 This matrix unable to take control of SAN, because a majority of the membership partitions cannot be written or are corrupt, possibly due to a SAN hardware failure or misconfiguration and/or because servers have been excluded from the SAN. As a result, some or all filesystem operations may be paused throughout the matrix.
17015 Majority of membership partitions are unwritable, possibly due to a SAN or storage hardware failure. As a result, disk imports and deports cannot be done, and some servers may be unable to mount filesystems. In addition, Matrix Server’s ability to recover from a future server failure is compromised. Such a failure would leave Matrix Server no option but to pause some or all filesystems throughout the matrix to preserve filesystem integrity. Action.
17021 Operator error may have caused filesystem corruption! returned to the matrix without being rebooted, even though the operation verified that was down. It is recommended that all filesystems that had mounted be checked for corruption. This alert will display for 48 hours. Action. Run psfscheck on all filesystems that were mounted on the affected server. See the Matrix Server command reference guide for information about this command.
17029 NN.NN.NN.NN has lost a significant portion of its SAN access, including access to all the membership partitions, possibly due to a SAN hardware failure. Action. The specified server, which was the matrix ADM, is unable to write to any of the membership partitions. Ensure that the server can access the membership partitions and also has write access to them. Also check for hardware problems that can limit access to the partitions.
40501 Inaccessible volume failure resolved. Action. None. Alert 40500 is resolved.
40502 Failure to start NT service . Action. Check the matrix log and the Service event log for a possible cause of the failure. If the shared storage is mounted, check the ERRORLOG for the instance on the SAN. If you are unable to resolve this problem, contact HP Support.
40503 NT service start failure resolved. Action. None. Alert 40502 is resolved.
40511 Failure enabling metakey resolved. Action. None. Alert 40510 is resolved.
40512 Metakey does not exist . Action. There is a problem with the HP PolyServe Software for Microsoft SQL Server registry replicator. Contact HP Support for assistance.
40513 Metakey find failure resolved. Action. None. Alert 40512 is resolved.
40514 Cannot create Metakey . Action.
40522 Service failed to enter into maintenance mode. Action. Contact HP Support for assistance.
40523 Entered maintain mode. Action. None. Alert 40522 is resolved.
50000 configuration error: the share resource is not set. Action. A Matrix File Share failure occurred on the specified server. The file share may not be accessible on that server. Verify that the Matrix File Share is configured correctly.
50008 configuration error: Null users failure. Action. A Matrix File Share failure occurred on the specified server. The file share may not be accessible on that server. Verify that the Matrix File Share is configured correctly.
50009 configuration: a null users failure is resolved. Action. None. Alert 50008 is resolved.
50010 probe failed: out of memory. Action.
50018 Share name collision between the shared and monitored resources attributes. Action. A Matrix File Share failure occurred on the specified server. The file share may not be accessible on that server. Verify that the Matrix File Share is configured correctly.
50019 Share name collision between the shared and monitored resources attributes resolved. Action. None. Alert 50018 is resolved.
50020 Cannot access the shared resource: . Action.
50030 configuration error: the share resource is not set. Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share. If the Virtual File Server was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and Virtual File Server.
50037 configuration: a null max_users failure is resolved. Action. None. Alert 50036 is resolved.
50038 configuration error: null users failure. Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share. If the Virtual File Server was active on the affected server, it may have failed over to another server.
50044 Cannot add the shared resource . Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share. If the Virtual File Server was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and Virtual File Server.
50051 Failure accessing shared resource resolved. Action. None. Alert 50050 is resolved.
50052 Share of subdirectory failed. failure 1 of . Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share. If the Virtual File Server was active on the affected server, it may have failed over to another server.
A Support and other resources
HP technical support
For worldwide technical support information, see the HP support website: http://www.hp.