Updating Drivers Inside Images
The ICE ClusterWare™ platform uses images to provision compute nodes. Because of this, any drivers, applications, or libraries required to run the compute node hardware or jobs need to be available to the running compute node, not just the head node(s). To ensure availability, software needs to be installed into the image or onto some form of cluster shared storage. Drivers are more commonly installed into the image, while applications and libraries are installed to shared storage and accessed through the module command.
When installing software into an image, there are two approaches available. The most common is to install into the image with the scyld-modimg command, usually via the --chroot option. In rare cases, some software can only be installed on a running node. In these cases, the image can be captured using the scyld-modimg --capture command.
For example, to install a package called prod-install.sh within an image named Prod202404 using the chroot method, run:
scyld-modimg -iProd202404 --copyin prod-install.sh /root --chroot --upload --overwrite
The tool unpacks the image into a local workspace directory within your home directory, bind mounts the necessary system paths, and chroots into it. The prod-install.sh file is copied into /root, so once inside the chroot you can complete the necessary steps to install the software.
Some types of software try to build kernel modules for the currently running kernel. Within a scyld-modimg --chroot session, that may be incorrect because the current kernel is actually the host kernel and may not match the kernel running on the booted compute node. Most installers provide a command line option to specify the target kernel, but for installers that do not, the kernel version can be specified immediately after the --chroot argument:
scyld-modimg -iProd202404 --copyin prod-install.sh /root --chroot <KVER> --upload --overwrite
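The value given for the kernel version should normally be a kernel that is actually installed in the image. On a typical Linux root file system, each installed kernel has a module tree under /lib/modules, so the directory names there are the candidate values. The following is a minimal sketch for listing them, assuming it is run from inside a chroot session or pointed at an unpacked image root. The function name list_image_kernels is hypothetical and is not part of the ClusterWare tools.

```shell
#!/bin/sh
# List kernel versions that have a module tree under <root>/lib/modules.
# list_image_kernels is a hypothetical helper, not a ClusterWare command.
list_image_kernels() {
    root="${1:-/}"
    for dir in "$root"/lib/modules/*/; do
        # Skip the unexpanded glob pattern when no kernels are installed.
        [ -d "$dir" ] || continue
        # basename strips the trailing slash and leading path components.
        basename "$dir"
    done
}
```

Calling list_image_kernels with no argument inspects the current root, which is the image itself when run inside the chroot. If more than one version is listed, the one the compute node actually boots is the one to pass.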
Specifying the kernel version causes the ClusterWare software to replace the uname command inside the chroot with a wrapper that outputs the specified kernel version in place of the one detected by the actual uname command. This is usually adequate to trick even stubborn installers into using the correct kernel. In the rare case of an installer that still fails, ssh into a running node, install the software there, and then capture the file system to a new image via:
scyld-modimg --capture <NODE> --set-name <IMAGE> --chroot --upload
This command uses ssh to connect to the running node and run scripts on the node. These scripts copy the contents of the node's local file systems, after which the tool unpacks them into a local directory and chroots into that directory. Within that chroot, you can make further changes before the captured image is uploaded. Note that capturing a running node risks including node-specific details in the image, so installing software within the chroot is preferable.
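Before a captured image is uploaded, it can be worth checking the chroot for files that commonly carry per-node state. The sketch below is an illustration only. The function name find_node_specific is hypothetical, and the list of paths is an assumption based on common Linux conventions (host name, machine ID, persistent network rules, SSH host keys), not a set defined by ClusterWare.

```shell
#!/bin/sh
# Report files under an image root that often hold per-node state.
# find_node_specific is a hypothetical helper; the path list is an
# illustrative assumption, not something defined by ClusterWare.
find_node_specific() {
    root="${1:-/}"
    for rel in etc/hostname etc/machine-id \
               etc/udev/rules.d/70-persistent-net.rules; do
        # -s is true only when the file exists and is non-empty.
        [ -s "$root/$rel" ] && echo "/$rel"
    done
    # SSH host private keys are generated per machine.
    for key in "$root"/etc/ssh/ssh_host_*_key; do
        [ -e "$key" ] && echo "/etc/ssh/$(basename "$key")"
    done
    return 0
}
```

Run with no argument inside the chroot, it prints the matching paths and nothing else. Whether any reported file needs to be reset depends on how the image will be used, so the output is a list to review rather than a list to delete.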