3.5.1 Matrix Server Administration Guide
Chapter 18: SAN Maintenance 299
Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
Increase the Membership Partition Timeout
Under heavy I/O load, I/O timeouts can occur on membership partition
accesses. The I/O timeouts are reported as "SCSI error : <...> return code =
50000" in the file /var/log/messages. The I/O timeouts can cause problems
such as the following:
• Excessive path switching.
• Filesystems appearing to be hung when a node crashes. Large
numbers of I/O timeouts can extend the time it takes to fence the node,
and filesystem operations cannot resume until the node is fenced.
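To judge whether a node is affected, it can help to count the timeout messages in the system log. The following is a minimal sketch, not a Matrix Server tool: the function name count_scsi_timeouts is hypothetical, and the pattern assumes the message text quoted above, optionally with a "0x" prefix on the return code, which some kernels may print.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matrix Server): count log lines that
# look like the I/O timeout message described above.
count_scsi_timeouts() {
    # Default to the log file named in this guide; a different file
    # can be passed as the first argument.
    logfile="${1:-/var/log/messages}"
    # Assumption: the return code may appear as 50000 or 0x50000.
    # grep -c prints the number of matching lines (0 if there are none).
    grep -E -c 'SCSI error.*return code = (0x)?50000' "$logfile"
}
```

Running count_scsi_timeouts with no argument reads /var/log/messages, which typically requires root privileges. A count that keeps growing under heavy I/O load suggests the timeouts described above.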
If your site is experiencing the above problems due to I/O timeouts, you
may want to increase the I/O timeout parameter for accessing
membership partitions. You will need to set the timeout on each node
currently in the matrix and on any nodes added to the matrix.
Before setting the timeout, be sure to stop Matrix Server.
To increase the timeout, edit the file /etc/opt/polyserve/mxinit.conf. Locate
the following line in the file:
# sanpulse_start_options = { "--mxinit" };
Remove the comment character (#) from the beginning of the line, and
add the parameter "-o sdmp_io_timeout=<millisec>" to the start
options:
sanpulse_start_options = { "--mxinit","-o sdmp_io_timeout=<millisec>" };
<millisec> is the number of milliseconds to be used as the I/O timeout for
accessing membership partitions. The default value is 5,000ms (5
seconds). Be sure to increase the timeout value in small increments, such
as 5,000ms. If the timeout value is too large, more time may be needed for
filesystem recovery after a node failure. The following example sets the
timeout to 10,000ms (10 seconds):
sanpulse_start_options = { "--mxinit","-o sdmp_io_timeout=10000" };
After adding the parameter, you can restart Matrix Server.
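Because the same edit must be made on every node, it may be convenient to script it. The following is a sketch under stated assumptions, not a PolyServe-supplied procedure: the function name set_sdmp_io_timeout is hypothetical, and the script assumes the sanpulse_start_options line holds only the default "--mxinit" option, since it rewrites the whole line. It does not stop or restart Matrix Server; do that separately as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matrix Server): uncomment the
# sanpulse_start_options line and set sdmp_io_timeout in one step.
# Assumption: the line carries only the default "--mxinit" option;
# any other options already on that line would be overwritten.
set_sdmp_io_timeout() {
    conf="$1"   # path to mxinit.conf
    ms="$2"     # timeout in milliseconds

    # Accept digits only, so nothing unexpected reaches the sed command.
    case "$ms" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of milliseconds" >&2
            return 1
            ;;
    esac

    # Keep a backup copy, then rewrite the file from the backup.
    cp "$conf" "$conf.bak" || return 1
    sed "s|^[#[:space:]]*sanpulse_start_options *=.*|sanpulse_start_options = { \"--mxinit\",\"-o sdmp_io_timeout=$ms\" };|" \
        "$conf.bak" > "$conf"
}
```

For example, with Matrix Server stopped, `set_sdmp_io_timeout /etc/opt/polyserve/mxinit.conf 10000` would produce the 10-second setting shown above and leave the previous file in mxinit.conf.bak.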