Head Nodes Page#

The Head Nodes page is available from the ICE ClusterWare™ left navigation panel under Cluster > Head Nodes, or from the ClusterWare Disk Usage panel on the Cluster Overview page by clicking the Manage Head Nodes link.

The Head Nodes page provides basic details for all head nodes, including the current timestamp, clock, load averages, and ClusterWare version. In a cluster with multiple head nodes, comparing these values can uncover issues. For example, if the head nodes show different ClusterWare versions after an upgrade, the upgrade may not have succeeded on all head nodes. The clock is useful for tracking SSL certificate and Slurm security certificate expiration. If the time differs between head nodes, head-to-head communication may be blocked.
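The version comparison described above can be sketched as a small shell helper. This is only an illustration: `check_versions` and its "HEADNODE VERSION" input format are assumptions made for this example, not part of the ClusterWare software, and the version strings in the usage note are placeholders. The input is expected to be values copied by hand from the Head Nodes page.

```shell
#!/bin/sh
# Hypothetical helper (not a ClusterWare command).
# Reads "HEADNODE VERSION" pairs on stdin, one pair per line, and reports
# whether every head node shows the same version.
check_versions() {
    awk '
        { count[$2]++; names[$2] = names[$2] " " $1 }
        END {
            n = 0
            for (v in count) n++
            # Zero or one distinct version: nothing to flag.
            if (n <= 1) { print "consistent"; exit 0 }
            # Otherwise list each version with the head nodes reporting it.
            for (v in count) print v ":" names[v]
            exit 1
        }
    '
}
```

For example, `printf 'head1 1.0\nhead2 1.0\n' | check_versions` prints `consistent`, while mixed versions produce one line per version and a non-zero exit status.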

The head node values can be copied by clicking the Copy icon next to each string. Copying the values can help when reporting an issue to Penguin Computing support.

You can expand the ClusterWare Services panel to view additional head node status details, such as whether the Telegraf and InfluxDB monitoring services are enabled and active. Use these status values to see if any services are down or misconfigured.

Review Head Node Status and Disk Usage#

Selecting a head node name displays additional details for that head node, such as its current status and the public key that the head node uses to connect to compute nodes. Each head node has a chart showing disk usage for various ClusterWare data: ISOs, images, boot configurations, Git repositories, InfluxDB data, the etcd database, and Other. The chart shows the proportional disk usage of the ClusterWare components only, not of the head node as a whole.

In a cluster with multiple head nodes, the chart for each head node will likely display small differences in the sizes of the components. Hover over specific components to see their absolute sizes. The replicated content (ISOs, images, and boot configurations) shows identical sizes across the head nodes.

Remove Extra Files#

The "Other" category of disk usage consists of files in the ClusterWare storage directory that are not recognized by the system. These are usually files left behind by partial uploads, interrupted image cloning, or other failure cases. If the cluster is working as expected, you can remove these files by clicking the More menu at the top of the head node panel and selecting Clean storage/. This action is equivalent to executing the following command:

cw-clusterctl heads -i <HEADNODE> clean --files
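To apply the same cleanup to several head nodes, the command above could be wrapped in a loop. The following is a minimal sketch, not an official tool: `clean_heads` and the `DRY_RUN` variable are hypothetical names introduced for this example, and the head node names are placeholders. It runs the documented command once per head node and, by default, only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: run the documented clean command once per head node.
# clean_heads and DRY_RUN are hypothetical names for this example only.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

clean_heads() {
    for head in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "cw-clusterctl heads -i $head clean --files"
        else
            cw-clusterctl heads -i "$head" clean --files
        fi
    done
}
```

Calling `clean_heads head1 head2` in the default dry-run mode prints the two commands. Setting `DRY_RUN=0` before the call executes them, which should only be done when the cluster is working as expected.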

Delete Head Node#

You can temporarily delete the head node data from the ClusterWare etcd database. By default, a head node writes status to the database entry every 10 seconds. If there is a problem writing to the database, the interval increases gradually to a maximum of 10 minutes. Deleting the head node database entry does not remove the head node from the ClusterWare software. Instead, a new entry is created the next time the head node sends status information to the database.
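The timing above matters when waiting for a deleted entry to reappear: a healthy head node should rewrite its entry within about 10 seconds, while one that has been failing to write may take up to 10 minutes. The sketch below illustrates a capped, growing interval. The 10-second base and 10-minute ceiling come from the text. The doubling step is purely an assumption, since the actual growth schedule is not stated, and `next_interval` is a hypothetical name.

```shell
#!/bin/sh
# Illustration only: how the interval grows is NOT documented here.
# Doubling is an assumption chosen to show the capped behavior.
BASE_INTERVAL=10     # seconds (default interval from the text)
MAX_INTERVAL=600     # seconds (10-minute maximum from the text)

# Print the next write interval after a failed write, given the current one.
next_interval() {
    current="$1"
    next=$((current * 2))
    if [ "$next" -gt "$MAX_INTERVAL" ]; then
        next="$MAX_INTERVAL"
    fi
    echo "$next"
}
```

Under this assumed schedule, repeated failures starting from `BASE_INTERVAL` move through 20, 40, 80, 160, and 320 seconds before settling at the 600-second cap.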

Deleting the database entry is useful when a head node is down and not rebooting. It can also be useful if you reinstall a head node and the ClusterWare software shows multiple entries for the same head node. For example, a head node in a cluster with three head nodes may be having problems, so you reinstall it, which generates a new unique identifier (UID). When the head node is back online, the ClusterWare GUI shows four head nodes, because the reinstalled head node has one entry for its old UID and one for its new UID. In this case, you can delete the old entry from the database.

To delete the head node etcd database entry:

  1. Click the head node name to open the details for that head node.

  2. Click the More menu and select the Delete action.

    The head node details are temporarily removed from the ClusterWare GUI. The head node reappears the next time it writes to the database.

This action is equivalent to executing the following command:

cw-clusterctl heads -i <HEADNODE> delete