HP XC System Software Release Notes for Version 3.1

15.3.4 Moving SLURM and LSF to Their Backup Nodes

This procedure is not documented in the HP XC System Software Administration Guide, but it will
be included in a future version.

To move the SLURM and LSF daemons from their primary node to their backup node (for example,
because the primary node needs maintenance), follow this procedure:

1. Log in to the backup node as root.

2. Shut down the backup slurmctld daemon:

# pkill slurmctld

3. Use the text editor of your choice to edit the /hptc_cluster/slurm/etc/slurm.conf
file. Change the value of the ControlMachine attribute to the name of the backup node, and
comment out or change the value of the BackupController attribute.

4. Save your changes and exit the text editor.

5. Shut down LSF on the primary node (you do not have to be logged in to the primary node
to do this).

Shutting down LSF on the primary node does not affect batch jobs, but it terminates
interactive LSF jobs (jobs submitted with the bsub -I option). Therefore, take appropriate
precautions before running the following command: either warn users, or close the LSF queues
and wait for all jobs to finish.

# controllsf stop

6. Log in to the primary SLURM node and shut down the primary slurmctld daemon:

# pkill slurmctld

7. On the backup SLURM node, start slurmctld, which now runs as the primary controller:

# slurmctld

8. Start LSF locally on the backup node:

# controllsf start here

9. If you want to make the backup node the primary node for LSF, enter the following command,
where backup_nodename is the name of the backup node:

# controllsf set primary backup_nodename

If you set another node to be the BackupController for SLURM, you can log in to that node
and run the slurmctld command. The new backup node must have the resource_management
role assigned to it for this configuration to persist after future runs of the cluster_config
command.

To move LSF and SLURM back to the original primary node, follow the same procedure with
the roles reversed: treat the original primary node as the backup node, and the original backup
node as the primary node.
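
The slurm.conf edit in step 3 can also be scripted. The following is a minimal sketch and is not
part of the HP XC software: retarget_slurm_conf is a hypothetical helper name, and the sketch
assumes that ControlMachine and BackupController each appear at the start of a line in
NAME=value form. It saves a copy of the file with a .bak suffix before rewriting it.

```shell
# retarget_slurm_conf FILE NEW_CONTROL [NEW_BACKUP]
# Hypothetical helper (an illustration, not an HP XC command).
# Sets ControlMachine to NEW_CONTROL. If NEW_BACKUP is given, sets
# BackupController to it (re-enabling a commented-out line); otherwise
# comments out the existing BackupController line.
retarget_slurm_conf() {
    conf=$1
    new_ctl=$2
    new_backup=${3:-}

    # Keep a copy of the current file so the edit can be undone.
    cp -p "$conf" "$conf.bak" || return 1

    if [ -n "$new_backup" ]; then
        backup_edit="s/^#*BackupController=.*/BackupController=$new_backup/"
    else
        backup_edit='s/^BackupController=/#&/'
    fi

    # Read from the saved copy and rewrite the original file in place.
    sed -e "s/^ControlMachine=.*/ControlMachine=$new_ctl/" \
        -e "$backup_edit" \
        "$conf.bak" > "$conf"
}
```

For example, if n16 is the original primary node and n15 is the original backup node (both
names are placeholders), step 3 could be performed by calling
retarget_slurm_conf /hptc_cluster/slurm/etc/slurm.conf n15, and moving back later by calling
retarget_slurm_conf /hptc_cluster/slurm/etc/slurm.conf n16 n15. Review the resulting file
before starting slurmctld.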

15.4 Manpages

discover.8

The following description of the --nowarn option is missing from the manpage:

This option is used in conjunction with the --replacenode option to instruct
the discover command not to issue the warning that the node must be powered
down and set to network boot.
