Installation guide
Release Notes
Issues with TORQUE version 2.5.3
TORQUE version 2.5.3, which was introduced in Scyld Release 482g0000 and withdrawn in Scyld Release 482g0005, appears
to have introduced various problems running TORQUE jobs. If your cluster has experienced new TORQUE problems
that you believe appeared after an upgrade to version 2.5.3, then we suggest reverting to an earlier version. First remove the 2.5.3 packages:
rpm -qa | grep torque-2.5.3 | xargs rpm -e --nodeps
Then:
yum install torque
will install the preferred TORQUE version.
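Before removing anything, it may be useful to see exactly which installed packages the grep pattern selects. The following is a minimal sketch, not part of the ClusterWare tooling: the function name list_torque_253 is hypothetical, and it reads package names from standard input so that it can be tried on sample data without touching the rpm database.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ClusterWare): print only the
# package names that belong to TORQUE 2.5.3.
# Input: one package name per line on stdin, as produced by `rpm -qa`.
list_torque_253() {
    # The dots are escaped so that they match literal dots only.
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep 'torque-2\.5\.3' || true
}

# Read-only use on the master node (removes nothing):
#   rpm -qa | list_torque_253
```

If the function prints nothing, no package matching torque-2.5.3 is installed and the removal step has nothing to act on.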
OFED 1.2 vs. OFED 1.3 Issues
Updating from RHEL4 Update 6 (or CentOS 4.6) to Update 7 also updates the OpenFabrics Enterprise Distribution (OFED) Infiniband software
stack from version 1.2 to 1.3. Updating to Update 8 updates OFED to version 1.4. Scyld ClusterWare itself is compatible
with OFED 1.2, 1.3, and 1.4, but OFED 1.3 and 1.4 may be incompatible with MPI stacks supplied by certain ISV
applications.
For example, OFED 1.2 includes DAPL 1.0, with configurations found in /etc/ofed/dat64.conf and /etc/ofed/dat32.conf.
OFED 1.3 includes DAPL 2.0, with configurations found in /etc/ofed/dat.conf, and thus some DAPL applications
will fail to find their intended DAPL libraries. One solution might be to reconfigure a DAPL application to use the IBV
(OpenIB) transport instead of the DAPL transport. However, not all MPI stacks support the IBV transport.
No OFED 1.3 problems have been observed for applications based upon MVAPICH or OpenMPI.
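To see which of the DAPL configuration files named above are present on a node, a small check along the following lines may help. This is a sketch under stated assumptions: the function name dapl_configs is hypothetical, the directory defaults to /etc/ofed as given in the text, and the function only reports which files exist without interpreting their contents.

```shell
#!/bin/sh
# Hypothetical helper: list which DAPL configuration files exist.
# $1 is the directory to inspect; it defaults to /etc/ofed, the location
# named in the text above.
dapl_configs() {
    dir=${1:-/etc/ofed}
    for f in dat.conf dat64.conf dat32.conf; do
        if [ -f "$dir/$f" ]; then
            echo "$f"
        fi
    done
}

# Example use:
#   dapl_configs            # inspects /etc/ofed
#   dapl_configs /some/dir  # inspects another directory
```

Per the description above, dat64.conf and dat32.conf correspond to the DAPL 1.0 layout of OFED 1.2, and dat.conf to the DAPL 2.0 layout of OFED 1.3.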
You may want to continue using OFED 1.2 and avoid an upgrade to OFED 1.3 or 1.4. To do so, update
the base distribution while excluding the Infiniband-related rpms, which retains whatever Infiniband-related rpms
(presumably OFED 1.2) are already installed on the master node. For RHEL4:
up2date -u --exclude={*dapl*,ib*,infiniband*} \
--exclude={libib*,libcxgb3*,libehca*,libmlx4*} \
--exclude={libmthca*,libnes*,librdmacm*,libsdp*} \
--exclude={ofed*,openib*,opensm*,qlvnictools*,qperf*,srp*}
or for CentOS:
yum update --exclude={*dapl*,ib*,infiniband*} \
--exclude={libib*,libcxgb3*,libehca*,libmlx4*} \
--exclude={libmthca*,libnes*,librdmacm*,libsdp*} \
--exclude={ofed*,openib*,opensm*,qlvnictools*,qperf*,srp*}
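The commands above rely on shell brace expansion, so each --exclude={a,b} argument becomes a series of separate --exclude options. To preview which installed packages those patterns would hold back, the patterns can be applied to a list of package names. The following is a minimal sketch: the function name match_excluded is hypothetical, and it assumes the patterns are matched against package names as simple shell globs.

```shell
#!/bin/sh
# Hypothetical helper: print the package names read from stdin that match
# any of the exclusion globs used in the update commands above.
# Each case arm mirrors one --exclude line of those commands.
match_excluded() {
    while IFS= read -r pkg; do
        case $pkg in
            *dapl*|ib*|infiniband*) echo "$pkg" ;;
            libib*|libcxgb3*|libehca*|libmlx4*) echo "$pkg" ;;
            libmthca*|libnes*|librdmacm*|libsdp*) echo "$pkg" ;;
            ofed*|openib*|opensm*|qlvnictools*|qperf*|srp*) echo "$pkg" ;;
        esac
    done
}

# Read-only use on the master node:
#   rpm -qa --qf '%{NAME}\n' | match_excluded
```

Any package printed by this check would be left at its currently installed version by the update.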
The full list of Infiniband-related rpms is:
compat-dapl-1.2.5
compat-dapl-devel-1.2.5
compat-dapl-static-1.2.5
dapl
dapl-devel
dapl-static
dapl-utils
ibsim
ibutils
infiniband-diags
libcxgb3
libcxgb3-devel