Updating Drivers Inside Images

The ICE ClusterWare™ platform uses images to provision compute nodes. Because of this, any drivers, applications, or libraries required to run the compute node hardware or jobs must be available to the running compute node, not the head node(s). To ensure availability, software needs to be installed into the image or onto some form of cluster shared storage. Drivers are more commonly installed into the image, while applications and libraries are installed to shared storage and accessed through the module command.

When installing software into an image, there are two approaches available. The most common is to install into the image with the scyld-modimg command, typically using the --chroot option. In rare cases, software can only be installed on a running node. In these cases, the image can be captured from that node using the scyld-modimg --capture command.

For example, to install software using an installer called prod-install.sh within an image named Prod202404 using the chroot method, run:

scyld-modimg -iProd202404 --copyin prod-install.sh /root --chroot --upload --overwrite

The tool unpacks the image into a local workspace directory within your home directory, bind mounts the necessary system paths, and chroots into it. Once inside the chroot, you will find the prod-install.sh file copied into /root and can complete the necessary steps to install the software.

Some types of software try to build kernel modules for the currently running kernel. Within a scyld-modimg --chroot, that is often the wrong target, because the running kernel is the host kernel and may not match the kernel running on the booted compute node. Most installers provide a command line option that lets you specify the target kernel, but for installers that do not, the kernel version can be specified immediately after the --chroot argument:

scyld-modimg -iProd202404 --copyin prod-install.sh /root --chroot <KVER> --upload --overwrite
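When a kernel version is supplied, installers inside the chroot see that version when they call uname -r. The following is a minimal sketch of how such an override can work. It is not the wrapper that the ClusterWare software installs; the function name make_uname_wrapper and the kernel version string are hypothetical, and only the plain -r option is handled.

```shell
#!/bin/sh
# Illustrative sketch only: NOT the wrapper that scyld-modimg installs.
# make_uname_wrapper DIR KVER writes DIR/uname, a script that prints KVER
# for "uname -r" and defers to the real uname for every other invocation.
make_uname_wrapper() {
    dir=$1
    kver=$2
    real=$(command -v uname)    # path of the real uname binary
    mkdir -p "$dir"
    cat > "$dir/uname" <<EOF
#!/bin/sh
# Report the target kernel release instead of the host's.
if [ "\$#" -eq 1 ] && [ "\$1" = "-r" ]; then
    echo "$kver"
else
    exec "$real" "\$@"
fi
EOF
    chmod +x "$dir/uname"
}

# Usage: put the wrapper first on PATH inside a subshell.
tmp=$(mktemp -d)
make_uname_wrapper "$tmp" "5.14.0-example.x86_64"   # hypothetical version
( PATH="$tmp:$PATH"; uname -r )   # prints 5.14.0-example.x86_64
```

The value passed as <KVER> normally needs to match a kernel installed in the image; on most distributions, installed kernels appear as directory names under /lib/modules inside the image. A complete wrapper would also need to handle combined options such as -a, which this sketch passes through to the real uname unchanged.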

Specifying the kernel version causes the ClusterWare software to replace the uname command inside the chroot with a wrapper that outputs the specified kernel version in place of the one detected by the actual uname command. This is usually adequate to trick even stubborn installers into using the correct kernel. In the rare case of an installer that still fails, ssh into a running node, install the software there, and then capture the file system to a new image via:

scyld-modimg --capture <NODE> --set-name <IMAGE> --chroot --upload

This command uses ssh to connect to the running node and run scripts there. The contents of the node's local file systems are copied and unpacked into a local directory, and the tool then chroots into that directory. Within that chroot, you can make further changes before the captured image is uploaded. Note that capturing a running node risks capturing node-specific details, so installing software within the chroot is preferable whenever it works.
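One way to reduce the risk of carrying node-specific details into a captured image is to look for files that commonly hold per-node identity before the image is uploaded. The sketch below is an illustration under stated assumptions: the function name list_node_specific is hypothetical, and the path list (machine ID, hostname, SSH host keys) is a guess at typical candidates rather than anything defined by the ClusterWare software. Whether a given file should be removed, regenerated, or kept depends on your site.

```shell
#!/bin/sh
# Illustrative sketch: list files under an image root that often carry
# per-node identity. The path list is an assumption; adjust it as needed.
list_node_specific() {
    root=${1:-/}
    root=${root%/}    # drop a trailing slash so "/" becomes ""
    for rel in etc/machine-id etc/hostname \
               etc/ssh/ssh_host_rsa_key \
               etc/ssh/ssh_host_ecdsa_key \
               etc/ssh/ssh_host_ed25519_key; do
        if [ -e "$root/$rel" ]; then
            echo "/$rel"
        fi
    done
}

# Demonstration against a throwaway directory tree.
demo=$(mktemp -d)
mkdir -p "$demo/etc"
echo node42 > "$demo/etc/hostname"
list_node_specific "$demo"    # prints /etc/hostname
```

Inside the capture chroot, the same function could be called as list_node_specific / to review the captured file system before exiting.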