Update the ICE ClusterWare Software#

When updating the ICE ClusterWare™ software, complete the prerequisites that apply to your current cluster version and configuration, perform the update steps on each head node, and then update any optional packages.

Software Update Prerequisites#

Before you begin updating or upgrading the ClusterWare software, review the following prerequisites.

  1. When updating a production cluster, first test the update process on a development cluster, if one is available. Be prepared to roll back to your previous version if issues arise.

  2. Take a snapshot of your head node virtual machines before beginning the upgrade so you can roll back if unforeseen issues arise. When creating the backup, include the date in the file name to distinguish it from other backup files.

    Note

    The contents of a database backup changed between ClusterWare version 11 and later major versions. In ClusterWare version 11, the backup includes the actual ISO, kernel, initramfs, and root file system binaries unless you pass the --without-all argument; including them results in a much larger file. In ClusterWare version 12 and later, these binaries are omitted by default, and you must add the --with-all argument to include them.

    • To make a database backup in ClusterWare version 11, run:

      /opt/scyld/clusterware/bin/managedb save --without-all backup-<date>.zip
      
    • To make a database backup in ClusterWare version 12 or later, run:

      /opt/scyld/clusterware/bin/managedb save backup-<date>.zip
      
  3. Have a copy of your cluster ID available to use when downloading the RPMs or ISO. The cluster ID is a hexadecimal string of approximately 12 characters that is typically embedded in the baseurl= line of the /etc/yum.repos.d/clusterware.repo file on your head nodes. If you cannot find your cluster ID, contact Penguin Computing.

  4. If you have a shared storage space, make a backup of the data.

  5. If your head nodes do not have Internet access, the required repositories must reside on local storage accessible by the head nodes. See Creating Local Repositories without Internet for details.

  6. If you are upgrading from ClusterWare version 12 to version 13 and installing from the public repository, the repository location changed from updates.penguincomputing.com to repo.ice.penguinsolutions.com. Contact Penguin Computing for a new repo file.

  7. If you are upgrading from ClusterWare version 11 to version 12 or later, you must switch your database from couchbase to etcd. Contact Penguin Computing for assistance.
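If you need to locate your cluster ID for the prerequisites above, it can usually be pulled out of the repo file with a small helper. This is an illustrative sketch, not part of the ClusterWare tooling; it assumes the ID appears as a hexadecimal path segment on the baseurl= line, which may differ on your system.

```shell
# Sketch: extract the ~12-character hex cluster ID from a clusterware.repo file.
# Assumes the ID is a hex segment embedded in the baseurl= line.
extract_cluster_id() {
  grep '^baseurl=' "$1" | grep -oE '[0-9a-f]{10,16}' | head -n 1
}

# Usage: extract_cluster_id /etc/yum.repos.d/clusterware.repo
```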

Update ClusterWare Software#

Use the cw-install tool with the --update argument to update the ClusterWare software on the head node(s).

Important

The cw-install tool disables /etc/yum.repos.d/clusterware.repo to prevent yum update from inadvertently updating the ClusterWare software. A simple yum update does not update ClusterWare packages on a head node.

To update the ClusterWare software:

  1. Complete all applicable steps in Software Update Prerequisites.

  2. Confirm that your DNF configuration is set up properly by running:

    dnf check-update
    

    The packages listed by the DNF check should not include any ClusterWare packages. If any ClusterWare packages appear, or if the operating system (OS) repositories are unreachable, fix the repository files and confirm OS repository access before proceeding.
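One way to make this check mechanical is a small filter over the check-update output. This is an illustrative sketch, not part of the ClusterWare tooling:

```shell
# Sketch: succeed only if the check-update output (read from stdin) lists no
# ClusterWare packages; any offending lines are printed for review.
no_clusterware_updates() {
  ! grep -i '^clusterware'
}

# Usage:
#   dnf check-update | no_clusterware_updates || echo "fix repos before proceeding"
```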

  3. If you are installing from an ISO, download the appropriate ISO to your local system. You will need your cluster ID to download the ISO.

  4. (Optional) Create a database backup from a head node. If you were able to create a snapshot of your head node virtual machine during the prerequisite steps, this is not strictly necessary, but is still recommended.

    1. Create a database backup:

      For ClusterWare version 11:

      /opt/scyld/clusterware/bin/managedb save --without-all backup-<date>.zip
      

      For ClusterWare version 12 or later:

      /opt/scyld/clusterware/bin/managedb save backup-<date>.zip
      
    2. Copy the backup to shared storage.
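A small helper can generate the date-stamped backup file name recommended in the prerequisites. This is an illustrative sketch; the managedb invocation in the usage comment is the version 12+ form shown above.

```shell
# Sketch: build a date-stamped backup file name, e.g. backup-20250115.zip.
backup_name() {
  echo "backup-$(date +%Y%m%d).zip"
}

# Usage (ClusterWare 12 or later):
#   /opt/scyld/clusterware/bin/managedb save "$(backup_name)"
```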

  5. Confirm your cluster is operational before beginning the update process.

    1. Run the following command on at least one head node:

      cw-nodectl status
      
    2. Run the following command on all head nodes:

      /opt/scyld/clusterware/bin/managedb --heads
      

      The output from each head node should include the same information, but the order may differ.

      If the head nodes are not using the IP ranges you expect, see Managing Databases to resolve the issue.

  6. If you are updating from ClusterWare 12 to ClusterWare 13 and installing from a downloaded ISO, mount the ISO before updating the head nodes:

    mount -o loop clusterware-13.*.iso /mnt
    
  7. If you are updating from ClusterWare 12 to ClusterWare 13 and installing from the public repository, update your repo and installer before updating the head nodes:

    1. Obtain a new clusterware.repo file from Penguin Computing.

    2. Update the ClusterWare installer:

      yum update clusterware-installer
      
  8. Update the first head node.

    1. Run one of the following commands to update the ClusterWare software.

      To install from the repository:

      cw-install --update
      

      To install from a downloaded ISO file (for upgrade from ClusterWare 12 to 13):

      /mnt/cw-install --update
      

      To install from a downloaded ISO file (for upgrade to ClusterWare 12):

      cw-install --update --iso <path to downloaded iso>
      

      Each RPM should upgrade cleanly to the selected ClusterWare version. Toward the end of the update process, the command automatically uploads the ISO file. Update logs are saved to ~/.scyldcw/logs/install*.log.

    2. Resolve any file conflicts listed at the end of the update command.

    3. Verify that the update succeeded.

      1. Run the following command and confirm that the reported ISO size and sha1sum match the expected values:

        cw-clusterctl repos -i scyldiso ls -l
        
      2. Run the following command to verify the output matches what you saw previously:

        cw-nodectl status
        
      3. Run the following command to confirm that all packages show the new version:

        rpm -qa clusterware\*
        
  9. Complete the same update steps on all remaining head nodes.

  10. If you are updating from ClusterWare version 12.0.1 or earlier to ClusterWare version 12.1.0 or later, reconfigure the InfluxDB/Telegraf monitoring stack. All data will persist through the upgrade. This step is not required for updates from ClusterWare version 12.1.0 and later.

    1. Update the configuration files:

      /opt/scyld/clusterware/bin/influx_grafana_setup --tele-env
      
    2. Restart Telegraf:

      systemctl restart telegraf
      
  11. After all head nodes are updated and validated individually, verify that all head nodes agree on the list of head nodes by running the following on each:

    /opt/scyld/clusterware/bin/managedb --heads
    

    The output from each head node should include the same information, but the order may differ.

  12. Verify the head nodes agree on compute node data by running the following command on all head nodes again:

    cw-nodectl status
    
  13. Determine if any data can be removed from the head nodes by running the following command, reviewing the output, and then adding appropriate arguments:

    cw-clusterctl heads clean
    

    See Remove Unnecessary Objects from ClusterWare Storage for additional information about the clean command.

  14. (Optional) If you are updating from a release prior to ClusterWare version 13.0 to version 13.0 or later, generate a set of Grafana users, one for each Grafana role. See Grafana User Accounts for more information about the users.

    Important

    Adding the grafana-users file overwrites existing Grafana accounts.

    1. Add the grafana-users file:

      /opt/scyld/clusterware/bin/influx_grafana_setup --grafana-users
      
    2. Restart Grafana:

      systemctl restart grafana-server
      
    3. Verify the Grafana user accounts:

      cat /opt/scyld/clusterware-grafana/lib/grafana-users
      
  15. (Optional) Shut down all head node virtual machines, take new snapshots, and boot all head nodes. While technically optional, this step is highly recommended so that you retain a known good working state in case you encounter issues in the future.
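The per-package version check in step 8 can be scripted when validating many head nodes. This is an illustrative sketch of the comparison only; the version list itself still comes from `rpm -qa`:

```shell
# Sketch: given one package version per line on stdin, succeed only if every
# installed clusterware package reports the same version.
uniform_versions() {
  [ "$(sort -u | grep -c .)" -eq 1 ]
}

# Usage:
#   rpm -qa --qf '%{VERSION}\n' 'clusterware*' | uniform_versions \
#     && echo "all packages at a single version"
```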

Update Optional Packages#

The cw-install tool only updates the basic ClusterWare head node software that was previously installed by the tool, plus any other dependency packages.

Note

Slurm version 23.02.7 cannot be directly updated to Slurm version 25.05.4 while preserving the existing Slurm accounting database.

After the ClusterWare software is updated, you can update the optional packages:

  1. View the previously installed optional packages:

    yum check-update --enablerepo=cw* --enablerepo=scyld* | grep scyld
    
  2. Update the packages as appropriate for your head node(s):

    sudo yum update --enablerepo=cw* --enablerepo=scyld* <PACKAGES>
    
  3. View the non-ClusterWare installed packages that have available updates:

    yum check-update
    
  4. Update the packages as appropriate for your head node(s):

    sudo yum update <PACKAGES>
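To feed the package names listed in step 1 into the update in step 2, the first column can be extracted with a small filter. This is an illustrative sketch that assumes the standard three-column check-update output:

```shell
# Sketch: pick the package-name column out of check-update lines that mention
# scyld, for reuse as the <PACKAGES> argument.
scyld_package_names() {
  awk '/scyld/ {print $1}'
}

# Usage:
#   pkgs=$(yum check-update --enablerepo=cw* --enablerepo=scyld* | scyld_package_names)
#   sudo yum update --enablerepo=cw* --enablerepo=scyld* $pkgs
```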
    

Update Compute Nodes#

A compute node can be dynamically updated using a simple yum update, which uses the local /etc/yum.repos.d/*repo file(s). If the compute node is executing a ClusterWare-created image, then these changes (and any other changes) can be made persistent across reboots by using cw-modimg and running yum update inside the chroot. See Modifying Images for details.

Update ClusterWare 11 to Later Major Versions#

Important

A cluster using the ClusterWare Couchbase database must first switch that database to etcd. Contact Penguin Computing for assistance.

ClusterWare version 11 updates cleanly to version 12 or later. Some additional steps are recommended after updating and testing all head nodes as described in Update the ICE ClusterWare Software.

  1. Examine /etc/yum.repos.d/clusterware.repo and edit it if necessary to reference the new ClusterWare version repos. For example, if the baseurl= contains the string clusterware/11/, change 11 to 12 or 13. If the gpgkey contains RPM-GPG-KEY-PenguinComputing, change PenguinComputing to scyld-clusterware.

  2. Boot configurations and images built by version 11 are retained after the upgrade. Compute nodes from ClusterWare version 11 are technically compatible with ClusterWare version 12 and later parent head nodes. However, to take full advantage of the additional functionality in later releases, update existing ClusterWare version 11 images with at least the latest version of clusterware-node. See Update Compute Nodes for details.
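The repo-file edits in step 1 can be scripted. This is an illustrative sketch that assumes the strings appear exactly as described above and targets version 12 (substitute 13 if that is your target version); back up the file before editing:

```shell
# Sketch: point a version 11 clusterware.repo at the version 12 repos and
# switch to the new GPG key name, as described in step 1.
bump_repo_to_12() {
  sed -i \
    -e 's|clusterware/11/|clusterware/12/|g' \
    -e 's|RPM-GPG-KEY-PenguinComputing|RPM-GPG-KEY-scyld-clusterware|g' \
    "$1"
}

# Usage: bump_repo_to_12 /etc/yum.repos.d/clusterware.repo
```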