HP StorageWorks Scalable File Share Release Notes - Version 2.3

New and changed features in HP SFS Version 2.3
1.4.3 NFS clients kernel panic when trying to create files larger than physical memory size
This is a known Red Hat Enterprise Linux 4 bug (#163555). It is more likely to occur with Lustre filesystems,
because Lustre is often used to store very large files.
See https://bugzilla.redhat.com/show_bug.cgi?id=163555 for details.
Workaround: To prevent the bug, run the following command as root on the client:
echo 100 > /proc/sys/vm/lower_zone_protection
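The workaround can be wrapped in a small script that records the previous value before changing it. The following is a minimal sketch, assuming a Bourne-compatible shell run as root. The function name set_lower_zone_protection and its optional path argument (useful for dry runs against an ordinary file) are illustrative and are not part of HP SFS. A value written to /proc does not persist across a reboot, so the setting would need to be reapplied at boot.

```shell
#!/bin/sh
# Illustrative helper (not part of HP SFS): apply the workaround value and
# report the previous setting. The second argument is a hypothetical override
# of the proc path, intended only for dry runs against an ordinary file.
set_lower_zone_protection() {
    value="$1"
    proc_file="${2:-/proc/sys/vm/lower_zone_protection}"

    # Fail early if the tunable is missing or we lack permission to write it.
    if [ ! -w "$proc_file" ]; then
        echo "cannot write $proc_file (not root, or tunable absent in this kernel)" >&2
        return 1
    fi

    old_value=$(cat "$proc_file")
    echo "$value" > "$proc_file" || return 1
    echo "lower_zone_protection: $old_value -> $value"
}

# Usage on the NFS client, as root:
#   set_lower_zone_protection 100
```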
1.4.4 Start filesystem will hang on badly initialized InfiniBand fabrics
This is likely to happen on new systems with large InfiniBand fabrics, where the IB switch has been incorrectly
initialized. The sfsmgr start filesystem command hangs for a long time and eventually reports errors,
even though all system configuration commands apparently succeeded and ibstat succeeds on every
node.
To avoid such problems, HP strongly recommends that you run the sfsmgr syscheck command before
starting a filesystem for the first time. The syscheck command quickly verifies that every node can
communicate with every other node over all interconnects, and reports an error if any node cannot.
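One way to make the check hard to skip is to gate the start command on it. The following is a minimal sketch under two assumptions that these notes do not confirm: that sfsmgr syscheck returns a nonzero exit status when it reports an error, and that the start command takes the filesystem name as its final argument. The function name checked_start is hypothetical.

```shell
#!/bin/sh
# Illustrative wrapper (not part of HP SFS): start a filesystem only if the
# interconnect check passes.
# Assumption: "sfsmgr syscheck" exits nonzero when it reports an error.
# Assumption: the filesystem name is passed as the last argument.
checked_start() {
    fs_name="$1"
    if [ -z "$fs_name" ]; then
        echo "usage: checked_start FILESYSTEM_NAME" >&2
        return 2
    fi

    # Stop here if any node cannot reach any other node.
    if ! sfsmgr syscheck; then
        echo "syscheck reported errors; not starting $fs_name" >&2
        return 1
    fi

    sfsmgr start filesystem "$fs_name"
}
```

If the exit-status assumption does not hold on your system, inspect the syscheck output manually before starting the filesystem.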
1.4.5 Propagating contents of /etc/modprobe.conf(.lustre*) into the XC systemimage
During the installation and configuration of the SFS client RPMs on an XC headnode, edits may have been
made to the /etc/modprobe.conf(.lustre*) files. These edits must be propagated into the
systemimage so that the remaining XC nodes can also benefit during imaging.
On an XC cluster, files such as /etc/modprobe.conf(.lustre*) and /etc/sfstab are populated
into the systemimage only during the "inaugural" golden image creation. Subsequent "updates" of
the golden image do not re-populate certain special files with new content.
Comparing the two systemimager exclusion files on the XC headnode (for example, with diff) shows which files
are excluded from the inaugural (base) image and which "extra" files are excluded from subsequent
updates to that image:
/opt/hptc/systemimager/etc/base_exclude_file
/opt/hptc/systemimager/etc/updgi_exclude_file
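The comparison can be run with standard tools. The following is a minimal sketch, assuming both exclusion files are plain text with one entry per line; the function name extra_excludes is hypothetical. It prints the entries that appear only in the second file, which under that assumption are the files excluded from updates but not from the base image.

```shell
#!/bin/sh
# Illustrative helper: list lines present in the update exclusion file but
# absent from the base exclusion file.
extra_excludes() {
    base_file="$1"
    update_file="$2"
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$base_file" "$update_file"
}

# Usage on the XC headnode:
#   extra_excludes /opt/hptc/systemimager/etc/base_exclude_file \
#                  /opt/hptc/systemimager/etc/updgi_exclude_file
```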
If you have already made manual edits to your systemimage (for example, elilo-related changes), then
you can choose to further edit these special modprobe.conf(.lustre*) files within the unpacked
image beneath /var/lib/systemimager/images/base_image/etc and then manually re-create the
base_image.tgz tarball afterwards.
sfsmgr disable server N force=yes # Disable the dead server (N=1 for admin, N=2 for mds)
sfsmgr show server # Make sure the new server state is recorded
# Now return the machine to its normal state
service mysqld stop # Stop the system database
umount /var/hpls # Unmount the admin LUN
service cluster start # Start the cluster service (which indirectly starts the
                      # admin service).
chkconfig cluster on # Make sure the cluster service starts again at boot.
# At this stage, the system should work normally; no reboot is needed.