To verify the change, submit an interactive job similar to the following:
[lsfadmin@n16 ~]$ hostname
n16
[lsfadmin@n16 ~]$ bsub -Is -n8 /bin/bash -i
Job <261> is submitted to the default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on lsf.localdomain>>
[lsfadmin@n4 ~]$ hostname
n4
[lsfadmin@n4 ~]$ srun hostname
n4
n4
n4
n4
n5
n5
n5
n5
[lsfadmin@n4 ~]$ exit
exit
[lsfadmin@n16 ~]$ hostname
n16
[lsfadmin@n16 ~]$
9.1.2 How To Switch the Type of LSF Installed
The HP XC system installation process offers a choice of two types of LSF. The default
choice is LSF-HPC with SLURM, which requires that SLURM be installed and configured
when you run the cluster_config utility. The second type, standard LSF, does not
interact with SLURM.
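If you are unsure which type is currently installed, listing the LSF execution hosts can
help. Under LSF-HPC with SLURM, the cluster typically appears as a single virtual
execution host (such as the lsf host shown in the previous transcript), whereas under
standard LSF each node appears individually. The host name and slot counts shown below
are illustrative only:
[lsfadmin@n16 ~]$ bhosts
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
lsf                ok              -    128      8      8      0      0      0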
If you made the wrong LSF selection while running the cluster_config utility, perform the
following procedure to remove the current type of LSF and install the other type (an
illustrative session follows the steps):
1. As root on the head node, rerun the cluster_config utility. Proceed through the process
until you reach the LSF section.
2. When you are prompted to configure LSF, enter yes.
3. When prompted, select the type of LSF you want to install. Standard LSF is choice 1, and
LSF-HPC with SLURM is choice 2 (choice 2 is the default).
4. When prompted, enter d to delete the existing LSF installation.
5. Answer the remaining questions as appropriate for your system. The cluster_config
utility then updates the golden image.
6. Propagate the new golden image to all nodes.
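A session that switches from LSF-HPC with SLURM to standard LSF might look similar to
the following. The prompts are paraphrased from the steps above for illustration only;
the exact wording of the cluster_config dialog on your system may differ:
[root@n16 ~]# cluster_config
...
Do you want to configure LSF? (yes/no) [yes]: yes
Select the type of LSF to install:
    1) Standard LSF
    2) LSF-HPC with SLURM
Enter your choice [2]: 1
An existing LSF installation was found.
Enter d to delete the existing installation: d
...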
9.1.3 Node Reboot May Result in Inconclusive Job Termination
If a node that is running a job under LSF-HPC with SLURM is rebooted (with the reboot
command), SLURM may recognize the node as unresponsive and attempt to terminate the job.
However, remnants of the job may remain, which can cause LSF to report the job as still
running.
This issue has been observed with large jobs that use more than 100 nodes.
If the node is powered off instead of rebooted, however, LSF-HPC with SLURM reports the
job status as EXIT, and the node is released back to the pool of idle nodes.
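If you suspect that a job is in this state, comparing the LSF and SLURM views of the job
can confirm it. The job ID, queue, node names, and partition shown below are illustrative
only:
[lsfadmin@n16 ~]$ bjobs 261
JOBID   USER     STAT  QUEUE       FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
261     lsfadmin RUN   interactive n16        lsf        /bin/bash  Dec  6 10:15
[lsfadmin@n16 ~]$ sinfo -n n4
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
lsf*         up   infinite      1   down n4
If LSF still reports the job as running after SLURM has marked the node down, you can
remove the stale job record with bkill; if a normal bkill does not clean it up, bkill -r
forces removal of the job from LSF:
[lsfadmin@n16 ~]$ bkill 261
Job <261> is being terminated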