PolyServe Matrix Server Event Notification Guide PolyServe Matrix Server 3.
Copyright © 2004-2007 PolyServe, Inc. Use, reproduction and distribution of this document and the software it describes are subject to the terms of the software license agreement distributed with the product (“License Agreement”). Any use, reproduction, or distribution of this document or the described software not explicitly permitted pursuant to the License Agreement is strictly prohibited unless prior written permission from PolyServe has been received.
Contents
1 HP Technical Support
  HP Storage Website
  HP NAS Services Website
2 Overview
  Event Logs
  View Event Logs
  Event Notifier Services
1 HP Technical Support
Telephone numbers for worldwide technical support are listed on the following HP website: http://www.hp.com/support. From this website, select the country of origin. For example, the North American technical support number is 800-633-3600.
NOTE: For continuous quality improvement, calls may be recorded or monitored.
HP NAS Services Website
The HP NAS Services site allows you to choose from convenient HP Care Pack Services packages or implement a custom support solution delivered by HP ProLiant Storage Server specialists and/or our certified service partners. For more information, see us at http://www.hp.com/hps/storage/ns_nas.html.
2 Overview
Matrix Server generates an event message when an error condition or failure occurs or when the status of the matrix changes. To provide an audit trail of matrix operations, a message is also generated when a user requests and is granted or denied authorization to perform a task. Event messages are logged and can be viewed either with the Matrix Event Viewer provided with the PolyServe Management Console or with command-line tools.
The message is also sent to the Matrix Server mxlogd process, which takes these actions:
• Sends the message to the event notifier services configured on the server. If the message has been selected to trigger a notifier service, the appropriate action will take place (send an SNMP trap, send email, or run a script).
• Sends the message to all servers in the matrix. The servers, including the server where the event occurred, copy the message into their own matrix logs.
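The two mxlogd actions described above can be pictured with a small model. The following Python sketch is illustrative only and is not Matrix Server code: the names Event, Notifier, Server, and dispatch are hypothetical, and the real mxlogd interfaces are not described in this guide.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Event:
    event_id: int
    origin: str   # server where the event occurred
    text: str


@dataclass
class Notifier:
    name: str                        # hypothetical label, e.g. "snmp", "email", "script"
    selected_ids: Set[int]           # event IDs selected to trigger this service
    action: Callable[[Event], None]  # stand-in for sending a trap, email, or running a script


@dataclass
class Server:
    name: str
    matrix_log: List[Event] = field(default_factory=list)


def dispatch(event: Event, notifiers: List[Notifier], servers: List[Server]) -> List[str]:
    """Model both actions; return the names of the notifiers that fired."""
    triggered = []
    # Action 1: hand the event to each notifier that selected this event ID.
    for notifier in notifiers:
        if event.event_id in notifier.selected_ids:
            notifier.action(event)
            triggered.append(notifier.name)
    # Action 2: every server, including the origin, copies the event into its own log.
    for server in servers:
        server.matrix_log.append(event)
    return triggered
```

In this model a notifier fires only when the event ID is in its selected set, while the copy into each matrix log happens for every event regardless of notifier configuration.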
Event Viewer, and then click on Matrix Server to see the log messages. You can use the options on the Action menu to manipulate the event log. Note that the Windows event log on a particular server includes only the messages that were generated on that server.
Event Notifier Services
Matrix Server provides the following event notifier services:
• SNMP Notifier Service. This service sends SNMP notifications, or traps, to the configured SNMP targets when the selected events occur.
Alerts
Certain events called Alerts are tracked by a Matrix Server component. When the condition causing the event is resolved, Matrix Server closes the Alert. Event messages for Alerts are displayed on the Alerts pane on the PolyServe CFS Management Console and are also written to the event logs in the same manner as other event messages.
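The open-and-close lifecycle of an Alert can be sketched as a small tracker. This Python example is a minimal model under stated assumptions: the AlertTracker class and its method names are hypothetical, and keying an Alert by its ID and server is an assumption, not a description of how Matrix Server stores Alerts internally.

```python
class AlertTracker:
    """Keeps the set of Alerts that are currently open."""

    def __init__(self):
        # Assumption: an Alert is identified by (alert ID, server name).
        self._open = {}

    def raise_alert(self, alert_id, server, text):
        """Record an Alert as open when its event message arrives."""
        self._open[(alert_id, server)] = text

    def resolve(self, alert_id, server):
        """Close the Alert once its condition is resolved.

        Returns True if the Alert was open, False otherwise.
        """
        return self._open.pop((alert_id, server), None) is not None

    def open_alerts(self):
        """Return the open Alerts, as an Alerts pane might list them."""
        return sorted(self._open)
```

For example, raising Alert 4522 on a server and later resolving it (the condition reported by message 4523) leaves the tracker with no open Alerts for that server.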
3 Event Messages
This chapter lists alert messages and corrective actions for Matrix Server, MxFS for CIFS, and MxDB for SQL Server.
Alert Descriptions
The following table lists Alerts generated by Matrix Server, MxFS for CIFS, and MxDB for SQL Server.
ID Message and Corrective Action
101 License is invalid. Matrix Server will be terminated in hour(s) minute(s).
Action. The ClusterPulse process has recognized a license violation. This message will be repeated every 15 minutes.
107 Virtual host IP conflict. Network address is replying to pings.
Action. Determine which server owns the IP address assigned to the virtual host. If the server owning the address is configured in the matrix but Matrix Server is down, reboot the server to get the operating system to release the IP address. Otherwise, another device reachable on the network already owns the IP address.
4507 Device monitor script configuration resolved.
Action. None. Alert 4506 is resolved.
4508 Probe failed: monitor process creation failed.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server.
4514 Probe failed: monitor probe failed.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).)
Action.
4521 Virtual host address has been restored.
Action. None. Alert 4520 is resolved.
4522 Probe failed: virtual host address release failed.
The monitor probe failed on the specified server. Another attempt will be made to activate the virtual host.
4523 Virtual host address release resolved.
Action. None. Alert 4522 is resolved.
4524 Probe failed: Unsupported monitor type ''.
4528 Probe failed: Monitor type '' will terminate to load newer function.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4534 Probe failed: Monitor probe function is NULL.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).)
Action.
4540 Probe failed: Partition to monitor is not specified.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).)
Action.
4546 Probe failed: Invalid parameters.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action. Verify that the monitor is configured correctly.
4552 Probe failed: SSL_write error
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action. Verify that the monitor is configured correctly.
4558 Probe failed: Server replied with error code: ''
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4564 Probe failed: SNMP URL contains invalid OID.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4570 Probe failed: SNMP Get request failed.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4576 Probe failed: Filesystem fsync failed.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).)
Action.
4582 Probe failed: Filesystem write failed.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).)
Action.
4588 Probe failed: Filesystem is not mounted.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host(s) associated with the monitor. If the virtual host(s) were active on the affected server, they may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host(s).)
Action.
4594 Probe failed: . NIS service is not available.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4600 Probe failed: . NIS RPC service is unknown.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4606 Probe failed: Socket operation requires a valid IP address.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4612 Probe failed: Socket receive error:
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4618 Probe failed: Socket connection has timed out after seconds.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
4624 Probe failed: DNS query failed.
The monitor probe failed on the specified server. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action. Verify that the monitor is configured correctly.
4630 Probe failed: feature license is unavailable.
The monitor feature license is unavailable. The server may no longer be suitable for the virtual host associated with the monitor. If the virtual host was active on the affected server, it may have failed over to another server. (Whether failover occurs is dependent on the configuration of the monitor and the virtual host.)
Action.
13905 Reboot ASAP as it stopped matrix network communication at date/time but attempts to exclude it from the SAN were unsuccessful! Rebooting it will allow normal matrix operation to continue. Alternatively, if the server cannot be rebooted, but can be confirmed to have no access to the SAN, run ‘mx server markdown ’ to restore normal matrix operation.
Action.
13909 Membership partitions are corrupt or inaccessible, preventing SAN access.
Action. Determine the state of each membership partition. Open the Configure Matrix window and go to the Storage Settings tab, which shows the state of each partition. If a single membership partition is corrupt, use the Repair feature to resilver the partition while the matrix is online.
13915 Membership partition corrupt and must be repaired as soon as possible.
Action. Use the mx config mp repair command or the Repair button on the Storage Settings tab of the Configure Matrix Window to repair the partition while the matrix is online. If the matrix is offline, use the mx config mp repair command or resilver the partition with the mprepair utility.
13916 Membership Partition corruption resolved.
13923 is unable to join the matrix because the fencing information it provided does not appear to be valid for this matrix configuration. As a result, this server will not be allowed to mount filesystems. This problem may be due to a configuration error or fencing hardware problem.
Action. Check the fencing hardware and the fencing configuration for the server.
13924 Invalid fencing information from resolved.
Action.
13929 Membership Partition does not belong to this cluster; cannot use.
Action. Determine whether the specified membership partition belongs to this matrix. Use the Storage Settings tab on the Configure Matrix window or the mpdump command to see the membership partitions configured for the matrix. If the partition does belong to the matrix, first verify that no other matrix is using it.
17005 This matrix is unable to take control of SAN, because the servers are unable to perform fencing operations, possibly due to a networking or fencing hardware failure or misconfiguration. As a result, some or all filesystem operations may be paused throughout the matrix. In addition, filesystem mounts and unmounts and disk imports and deports can not be performed.
Action.
17010 This matrix takes control of SAN failure resolved.
Action. None. Alert 17009 is resolved.
17011 Singleton matrix unable to take control of SAN. Possibly this server has not been added to the matrix or has been deleted from the matrix, or possibly a network failure has partitioned this server from the rest of the matrix. As a result, some or all filesystem operations may be paused throughout the matrix.
17016 Inaccessible majority of membership partitions resolved.
Action. None. Alert 17015 is resolved.
17017 Membership partition is unwritable, possibly due to a SAN or storage hardware failure. If other membership partitions become inaccessible, Matrix Server’s ability to recover from a server failure will be compromised.
Action. None of the servers in the matrix can write to the specified membership partition.
17024 Stalled server waiting for filesystem locks resolved.
Action. None. Alert 17023 is resolved.
17025 Filesystem suspended.
Action. The filesystem has been suspended by the Matrix Server psfssuspend command or by a third-party application such as a backup utility. Writes to the filesystem will be blocked until the filesystem is resumed.
17031 has lost a significant portion of its SAN access, including access to all the membership partitions, possibly due to a SAN hardware failure. As a result, this server is ineligible to become the matrix ADM.
Action. The specified server is unable to write to any of the membership partitions. Ensure that the server can access the membership partitions and also has write access to them.
40504 Failure to stop NT service .
Action. Check the matrix log and the Service event log for a possible cause of the failure. If the shared storage is mounted, check the ERRORLOG for the instance on the SAN. If you are unable to resolve this problem, contact HP Support.
40505 NT service stop failure resolved.
Action. None. Alert 40504 is resolved.
40506 Failure to shutdown NT service in order to start monitoring.
40514 Cannot create Metakey .
Action. There is a problem with the MxDB for SQL Server registry replicator. Contact HP Support for assistance.
40515 Metakey create failure resolved.
Action. None. Alert 40514 is resolved.
40516 Cannot remove Metakey .
Action. There is a problem with the MxDB for SQL Server registry replicator. Contact HP Support for assistance.
40517 Metakey remove failure resolved.
Action. None.
50002 configuration error: the share path is not set.
Action. A Matrix File Share failure occurred on the specified server. The file share may not be accessible on that server. Verify that the Matrix File Share is configured correctly.
50003 configuration: an unset share path failure is resolved.
Action. None. Alert 50002 is resolved.
50012 probe volume failed.
Action. A Matrix File Share failure occurred on the specified server. The file share may not be accessible on that server. Verify that the Matrix File Share is configured correctly.
50013 A probe volume failure is resolved.
Action. None. Alert 50012 is resolved.
50014 Cannot add the shared resource: .
Action. A Matrix File Share failure occurred on the specified server.
50022 Share of subdirectory failed: . Failure 1 of .
Action. A Matrix File Share failure occurred on the specified server. The file share may not be accessible on that server. Verify that the Matrix File Share is configured correctly.
50023 Share of subdirectory failure resolved.
Action. None. Alert 50022 is resolved.
50034 configuration error: null comment failure.
Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share. If the Virtual File Server was active on the affected server, it may have failed over to another server.
50041 probe: an out of memory failure is resolved.
Action. None. Alert 50040 is resolved.
50042 probe volume '' failed.
Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share.
50048 Share name collision between the shared and monitored resources attributes.
Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share. If the Virtual File Server was active on the affected server, it may have failed over to another server.
50054 probe synchronizing subdirectory changes ( of n).
Action. A Virtual File Share failure occurred on the specified server. The server may no longer be suitable for the Virtual File Server associated with the Virtual File Share. If the Virtual File Server was active on the affected server, it may have failed over to another server.
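Many entries in the table pair a failure Alert with a later message whose corrective action reads "Alert <ID> is resolved." The following Python sketch shows one way a script might recover those pairings from message text. It is an illustration only: the resolution_map function and the CATALOG dictionary are hypothetical, and the sample entries are copied from the table rather than read from a Matrix Server log.

```python
import re

# Matches the phrase used in the corrective action of resolving messages.
RESOLVED = re.compile(r"Alert (\d+) is resolved")


def resolution_map(entries):
    """Map each resolving message ID to the ID of the Alert it closes."""
    pairs = {}
    for msg_id, text in entries.items():
        match = RESOLVED.search(text)
        if match:
            pairs[msg_id] = int(match.group(1))
    return pairs


# Sample entries taken from the table in this chapter.
CATALOG = {
    4507: "Device monitor script configuration resolved. "
          "Action. None. Alert 4506 is resolved.",
    4508: "Probe failed: monitor process creation failed.",
    40505: "NT service stop failure resolved. "
           "Action. None. Alert 40504 is resolved.",
    50003: "configuration: an unset share path failure is resolved. "
           "Action. None. Alert 50002 is resolved.",
}
```

Entries without the phrase, such as 4508, are treated as ordinary messages and do not appear in the resulting map.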