HP StorageWorks Scalable File Share Release Notes - Version 2.3

New and changed features in HP SFS Version 2.3

1.4.3 NFS clients kernel panic when trying to create files larger than physical memory size

This is a known Red Hat Enterprise Linux 4 bug, #163555. It is more likely to occur with Lustre filesystems, as Lustre is often used to store huge files. See https://bugzilla.redhat.com/show_bug.cgi?id=163555 for details.

Workaround to prevent the bug: Run the following command on the client:

echo 100 > /proc/sys/vm/lower_zone_protection
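The workaround can also be wrapped in a small helper that first checks that the tunable exists on the running kernel and then reads the value back. This is only a sketch: the function name apply_lower_zone_protection and its optional second argument (an alternative file path, useful for dry runs) are hypothetical and not part of HP SFS.

```shell
# Sketch only: apply the workaround if the tunable is present and writable.
# apply_lower_zone_protection is a hypothetical helper name, not an HP SFS tool.
apply_lower_zone_protection() {
    value="${1:-100}"
    proc_file="${2:-/proc/sys/vm/lower_zone_protection}"

    # Kernels without this tunable do not provide the file.
    if [ ! -w "$proc_file" ]; then
        echo "skip: $proc_file is not present or not writable" >&2
        return 1
    fi

    echo "$value" > "$proc_file" || return 1

    # Read the value back so the caller can confirm what is in effect.
    echo "lower_zone_protection is now $(cat "$proc_file")"
}
```

Run as root on the client, for example: apply_lower_zone_protection 100. A value written this way does not survive a reboot, so the command must be repeated after each boot.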

1.4.4 Start filesystem will hang on badly initialized InfiniBand fabrics

This is likely to happen on new systems with large InfiniBand fabrics where the IB switch has been incorrectly initialized. The sfsmgr start filesystem command hangs for a long time and eventually reports errors, even though all system configuration commands apparently succeeded and ibstat succeeds on every node.

To avoid such problems, HP strongly recommends that you run the sfsmgr syscheck command before starting a filesystem for the first time. syscheck will quickly verify that every node can communicate with every other node over all interconnects, and report an error if not.
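The recommended order (syscheck first, then start) could be enforced with a small wrapper such as the sketch below. It rests on two assumptions that these notes do not confirm: that sfsmgr returns a non-zero exit status when syscheck reports an error, and that start filesystem takes the filesystem name as its argument. The function name checked_start and the SFSMGR override variable are hypothetical.

```shell
# Sketch only: start a filesystem only after syscheck passes.
# Assumption: sfsmgr exits non-zero when syscheck reports an error.
# Assumption: "start filesystem" accepts the filesystem name as an argument.
# checked_start and SFSMGR are hypothetical names used for illustration.
checked_start() {
    fs_name="$1"
    sfsmgr_cmd="${SFSMGR:-sfsmgr}"

    if ! "$sfsmgr_cmd" syscheck; then
        echo "syscheck failed; not starting $fs_name" >&2
        return 1
    fi

    "$sfsmgr_cmd" start filesystem "$fs_name"
}
```

If the exit status turns out not to reflect syscheck errors, inspect the syscheck output by hand before starting the filesystem.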

1.4.5 Propagating contents of /etc/modprobe.conf(.lustre*) into the XC systemimage

During the installation and configuration of the SFS client RPMs on an XC headnode, edits may have been made to the /etc/modprobe.conf(.lustre*) files. These edits must be propagated into the systemimage so that the remaining XC nodes receive them during imaging.

On an XC cluster, files such as /etc/modprobe.conf(.lustre*) and /etc/sfstab are populated into the systemimage only during the "inaugural" golden image creation. Subsequent "updates" of the golden image will not re-populate certain special files with new content.

Running diff on the two systemimager exclusion files on the XC headnode shows which files are excluded from the inaugural (base) image and which "extra" files are excluded from subsequent updates to that image:

/opt/hptc/systemimager/etc/base_exclude_file

/opt/hptc/systemimager/etc/updgi_exclude_file
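As an alternative to reading raw diff output, the "extra" entries can be listed directly. The sketch below assumes that each exclusion file holds one entry per line; the function name extra_excludes is hypothetical.

```shell
# Sketch only: print entries that appear in the update exclusion file
# but not in the base exclusion file.
# Assumption: both files list one exclusion entry per line.
# extra_excludes is a hypothetical helper name.
extra_excludes() {
    base="${1:-/opt/hptc/systemimager/etc/base_exclude_file}"
    updgi="${2:-/opt/hptc/systemimager/etc/updgi_exclude_file}"

    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$base" "$updgi"
}
```

Calling extra_excludes with no arguments on the XC headnode uses the two paths above. The entries it prints are the files that a golden image update will not refresh.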

If you have already made manual edits to your systemimage (for example, to elilo settings), you can also edit the special modprobe.conf(.lustre*) files within the unpacked image beneath /var/lib/systemimager/images/base_image/etc, and then re-create the base_image.tgz tarball manually afterwards.

sfsmgr disable server N force=yes # Disable the dead server (N=1=admin, N=2=mds)

sfsmgr show server # Make sure the new server state is recorded

# Now return the machine to its normal state

service mysqld stop # Stop the system database

umount /var/hpls # Unmount the admin LUN

service cluster start # Start the cluster service (which indirectly starts the admin service).

chkconfig cluster on # Make sure the cluster service starts again at boot.

# At this stage, the system should work normally; no reboot is needed.