System information

7 Automatic Cluster Management
7.1.5 Automatic Operating System Installation
After the CHARM cards have tested the nodes, the operating system has to be installed
on each node. This installation is also automated: the software tool SystemImager is
used to install the nodes [95]. The basic principle of SystemImager is that it clones
a Linux system [96]. A master node, called the golden client, contains the prototype
of the Linux system.
[Figure 7.2 (diagram): NFS server, image server, golden client (master node), CHARM card and cluster node. Arrows: "clone system", "provide autoinstall disc", "upload autoinstall disc" (via USB), "download system image". Steps are numbered 1 to 5.]
Figure 7.2: Functional overview of the system installation of the HLT cluster nodes. The
red circles define the process order of the system installation.
SystemImager takes an image of the golden client and saves it to a dedicated server
node. It also generates a bootable floppy image, which has to run on the unconfigured
client nodes. The floppy image contains a small live system which downloads the master
image from the image server and installs it on the local system. Figure 7.2 illustrates
the process of the cluster OS installation. The CHARM cards provide the floppy image to
the client nodes by means of the emulated USB CD-ROM device. The ISO file of the floppy
image is stored on an NFS server, because the CHARM does not have enough memory to store
the image. After the client node is powered up, it loads the boot disc, which starts
the download of the system image from the image server. Alternatively, the client node
can be installed entirely over the network. The SystemImager framework provides a boot
server for this case. The BIOS of the nodes has to be configured to boot from the
network; the CHARM can adjust the BIOS settings of the cluster nodes accordingly. In
this case, the autoinstall disc is loaded from the network instead of the emulated USB
CD-ROM device.
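The two installation paths described above can be summarized in a short sketch. This is only an illustration of the process order as stated in the text, not part of SystemImager or the CHARM software: the names BootSource, InstallPlan and plan_installation are hypothetical, and the step descriptions are paraphrased from this section rather than taken from the numbering in Figure 7.2.

```python
from dataclasses import dataclass
from enum import Enum


class BootSource(Enum):
    """Where the node loads the autoinstall disc from (hypothetical names)."""
    USB_CDROM = "emulated USB CD-ROM provided by the CHARM card"
    NETWORK = "SystemImager boot server"


@dataclass(frozen=True)
class InstallPlan:
    node: str
    boot_source: BootSource
    steps: tuple


def plan_installation(node: str, network_boot: bool = False) -> InstallPlan:
    """Return the ordered installation steps for one node (illustrative only)."""
    # Both paths start with cloning the golden client to the image server.
    steps = ["clone golden client to image server"]
    if network_boot:
        source = BootSource.NETWORK
        steps += [
            "CHARM card sets BIOS of the node to boot from the network",
            "node loads autoinstall disc from the boot server",
        ]
    else:
        source = BootSource.USB_CDROM
        steps += [
            "store autoinstall ISO on the NFS server",
            "CHARM card presents the ISO as a USB CD-ROM device",
            "node boots the autoinstall disc from the USB CD-ROM",
        ]
    # Both paths end with the live system fetching and installing the image.
    steps.append("node downloads system image from image server and installs it")
    return InstallPlan(node, source, tuple(steps))


if __name__ == "__main__":
    for step in plan_installation("node01").steps:
        print(step)
```

Calling plan_installation("node01", network_boot=True) yields the network variant, which differs only in how the autoinstall disc reaches the node; the final download of the system image is the same in both cases.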
7.1.6 Automatic Repair
Boot failures such as wrong CMOS settings or a faulty boot-loader configuration are
usually not caused by a serious hardware failure. Errors of this kind can therefore be
fixed easily, without a time-consuming search for the error source. As a rule, the cause
of the boot failure is printed to the screen, for example "CMOS checksum error" or
"Missing boot device". The CHARM card can detect and correct such errors automatically. The