Changelog

See Release Notes for summary information about the latest ICE ClusterWare™ release. This section contains a more detailed history of all recent releases.

13.0.0-g0000 - December 2, 2025

  • Transition the public installation repository to repo.ice.penguinsolutions.com. Both the ClusterWare 12 and ClusterWare 13 installation files are now available in the new repository. Contact Penguin Computing for details.

  • Update commands to use the cw-* prefix. Scripts that use the older scyld-* commands will continue to work, but should be updated to the new syntax, as support for the legacy scyld-* commands will be removed in a future release.

  • Head nodes now require Python 3.12 or later.

  • New Syslog and Audit Log viewers in Grafana. Syslog forwarding is enabled by default for all head nodes. Audit Log collection is managed through a Telegraf plugin.

  • The ClusterWare graphical user interface (GUI) is now at near-parity with the command line (CLI) tools. All major actions on any database primitive can be done through either the GUI or CLI.

  • Work around new umount behavior found in recent Ubuntu versions.

  • Update Ubuntu initramfs to include the bash dracut module.

  • Include mdadm in Ubuntu images.

  • Fix grub installation when deploying Ubuntu.

  • Remove some untested distros from the supported distros listed in the documentation.

  • Support setting node attributes during node creation with a.name=value.

  • Implement tenancy-aware InfiniBand partitioning.

  • Add InfiniBand networks (ibnets) to configure OpenSM partitioning based on tenancy configuration.

  • Upgrade Telegraf to version 1.35.3.

  • Create a separate clusterware-telegraf-relay package.

  • Initial support for relaying both Telegraf and ClusterWare status out of tenancies.

  • Batch relayed Telegraf messages to improve throughput.

  • Add an example Butane script with mdraid mirroring.

  • Print clearer error messages when the backend is not responding.

  • Use UUID= entries in the fstab files generated during deployment.

  • Fix reading and writing the Cluster ID in the clusterware.repo file.

  • Better document the head auto eject and rejoin process.

  • Properly refresh access tokens when they are about to expire.

  • Update user token duration argument name from timeout or lifetime to lifespan. The legacy arguments will continue to work, but will be removed in a future release.

  • Improvements to token creation and exchange.

  • Remove the --raw option from command line tools.

  • Replace the Pyramid backend with FastAPI.

  • The FastAPI-based clusterware service does not currently support the reload action.

  • Bump our internal Git version to 2.48.1 and resolve some permission issues.

  • Automatically create less privileged Grafana users during installation.

  • Rearrange and rename log files in /var/log/clusterware, moving httpd-specific logging into httpd_error_log and httpd_access_log.

  • The clusterware service is no longer an alias for httpd but instead a separate service proxied behind httpd.

  • Add a reasonable default RequestReadTimeout httpd configuration.

  • Add a _hypervisor reserved attribute for an admin to explicitly record where a virtual machine is running.

  • A new _managed_switch reserved attribute marks which switches are managed in a multi-tenant cluster.

  • A new relay:// power plugin sends power requests to a separate cluster in a multi-tenant cluster configuration.

  • Add a new node script (lldp_neighbors) to set a new reserved attribute _neighbors.

  • The wipe-disks.sh command now zeroes the first gigabyte of each disk.

  • By default, system logs are collected on head nodes and stored in InfluxDB.

  • Build the Alpine Linux-based mt-gateway image on each cluster.

  • Speed up the initial DHCP log parsing with a cache.

  • Improve PREFER_KMOD handling in mkramfs.conf.

  • Fix /etc/hosts handling in cw-modimg --chroot that resulted in an empty hosts file.

  • Increase session reuse in client tools for better performance.

  • Support tenancy creation using carefully formed Terraform files.

  • Include our own Terraform binary to handle incoming files.

  • Require jq on head nodes and in images.

  • Small improvements to the NVIDIA hardware data collection in clusterware-node.

  • Better handling for when the etcd database becomes too large.

  • Implement a KubeVirt provider to allocate virtual machines on Kubernetes.

  • Package our own builds of kubectl and virtctl for the KubeVirt provider.

  • Eliminate an infinite loop due to duplicate clientids.

  • Initial support for importing qcow2 files as images.

  • Collect InfiniBand port GUIDs as hex in node hardware.

  • Deprecate authctl in favor of configctl.

  • Document how to authenticate using SSH keys on the /login endpoint.

  • Report clearer error messages for Keycloak failures.

  • Improve documentation for air-gapped installations.

  • Add an article about node selectors.

  • Fix repo, distro, and gitrepo cloning.

  • Collect more status information from switches with clusterware-node installed.

  • Several fixes to the virsh power plugin.

  • Use Slurm dynamic node configuration by default, but provide a _slurmd reserved attribute to disable this behavior.

  • Update Slurm, OpenMPI, and other middleware packages.

  • Remove el7 documentation references.

  • Retire support for PBS Torque.

  • Quiet some repetitive logging.

  • Technology preview of a new audit system.

  • Set the audit level to debug for the extremely frequent putstatus and daemon process calls.

  • Leverage the etcd serializable consistency mode for faster putstatus calls.

  • Batch putstatus operations for improved response times.

  • Many improvements to the Keycloak interface and plugin.

  • Add jitter to status updates to reduce contention.

  • Improve SELinux dependency handling at load time.

  • Improve handling for freezing and unfreezing objects.

  • Make loading node and head information optional in managedb.

  • Better document some _ansible_pull details.

  • Add a new cw-modimg --deploy option to run Ansible scripts inside an image.

  • Implement a new reserved attribute _ansible_retries to change _ansible_pull behavior.

  • Disable cw-modimg automatic locking pending future improvements.

  • Fix cw-nss handling of longer names.

  • Speed up cw-sysinfo data collection.

  • Introduce a new switch type: SONiC.

  • Introduce a new JSON configuration for SONiC switches.

  • Enable JSON configuration of the following on SONiC:

    • Administrative state (shutdown/no shutdown) of interfaces

    • Routed interfaces with static IPv4/IPv6 addresses

    • Switched VLANs with static IPv4/IPv6 addresses

    • BGP instance and instance configuration

    • BGP peers, peer-groups, dynamic neighbors, and BGP Unnumbered neighbors

    • BGP v4/v6 routing

    • BGP ECMP

    • BGP EVPN address family

    • VXLAN VTEP configuration and VNI/VLAN termination information

  • Additional GUI improvements include:

    • Update look and feel in the GUI, including a redesigned login screen, marking required fields with an asterisk, and adjusting icons for easier navigation.

    • Add drag-and-drop functionality for nodes from the GUI grid or list view to create per-node details tabs.

    • Separate hostname records into two sections, A records and SRV records, to reduce confusion.

    • Keycloak settings can be managed and changed through the GUI.

    • Dynamic Groups have a Test button in the GUI that can be used to view a list of nodes that match the selector criteria.

    • All panels now open in "view" (read-only) mode, to avoid accidental modifications to database objects. Users can switch to "edit" mode to make changes.

    • Boot Configs and Distros now have "full delete" functionality, which deletes both the object and any "sub-objects" related to it (full-delete on a Boot Config deletes the related Image; full-delete on a Distro deletes all related Repos).

    • Improve error messages in all GUI pages.
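Scripts that still call the legacy scyld-* commands can be migrated mechanically. The following is a minimal, hypothetical Python sketch (modernize_script is not a ClusterWare tool) that assumes every legacy command maps to its replacement by swapping the scyld- prefix for cw-, as with scyld-modimg and cw-modimg. Confirm each command name against the current documentation and review the result before replacing any script.

```python
import re

# Assumption: each legacy command is renamed by swapping "scyld-" for "cw-"
# (for example, scyld-modimg -> cw-modimg). Verify every mapping before use.
# The lookbehind skips matches embedded in longer hyphenated or word tokens.
_LEGACY_CMD = re.compile(r"(?<![\w-])scyld-([a-z][a-z0-9]*)")


def modernize_script(text: str) -> str:
    """Return text with legacy scyld-* command names replaced by cw-* names."""
    return _LEGACY_CMD.sub(r"cw-\1", text)


if __name__ == "__main__":
    legacy = "scyld-nodectl -i n0 status\nscyld-modimg -i DefaultImage --chroot\n"
    print(modernize_script(legacy), end="")
```

Paths such as /opt/scyld/clusterware-tools are left alone because they do not contain scyld- as a command prefix.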

12.4.2-g0000 - March 17, 2025

  • Revert a change that caused a bare metal boot issue due to a conflict with upcoming SONiC ZTP functionality.

  • Fix an SELinux issue that interferes with Telegraf on upgrade.

12.4.1-g0000 - March 7, 2025

  • Continuing improvements to the ClusterWare graphical user interface (GUI), including more consistent display of labeled fields and components.

  • Updates to dashboards, including a new GPU dashboard.

  • SELinux/MLS improvements.

  • Rebuild the initramfs when deploying grub to a local drive for a persistent installation.

  • Implement a new _wipe_all attribute that will destroy all local data on a booting system.

  • Move to Git version 2.48.1 to patch known vulnerabilities.

  • Bug fixes and documentation updates.

12.4.0-g0000 - February 3, 2025

  • The product is rebranded from Scyld ClusterWare to ICE ClusterWare. Initial changes are reflected in the product GUI and in the documentation. Future releases will introduce additional branding updates, including updates to the command line tools.

  • Implement the first provider plugin, supporting libvirt-based hypervisors through the virsh and virt-install commands.

  • Include a couple of example deploy scripts in the /opt/scyld/clusterware-tools/examples/deploy directory.

  • Reduce repetitive logging.

  • Implement a new _altmacs reserved attribute that passes alternative MAC addresses for a node to the DHCP server. This attribute may be replaced by a more robust solution in future releases.

  • Significant simplifications and improvements to the scyld-kube tool used for deploying Kubernetes.

  • Mark nodes as "busy" if virsh list shows running virtual machines on the node.

  • A new scyld-modimg --deploy argument allows administrators to execute an Ansible playbook against an image or combine the copy and execute steps for running a shell script inside the image.

  • The scyld-modimg command now accepts a --progress argument to either suppress the remaining-time estimate or print dots instead of detailed progress.

  • Propagate errors from the debootstrap tool out to the user to simplify Ubuntu image creation debugging.

  • Prevent users with the NoAccess role from logging in, and prevent tmpadmins from minting tokens.

  • Improve parsing of scyld-modimg --run scripts and document the functionality.

  • Add --discard-on-error option to scyld-modimg to facilitate scripting and automation.

  • The scyld-clusterctl nets tool allows admins to define additional networks where ClusterWare nodes may be connected.

  • Improved ClusterWare graphical user interface (GUI) information architecture to help new users navigate the product.

  • Each primitive now presents a set of labeled fields and components within the ClusterWare GUI that are customized to that primitive.

  • Updated ClusterWare GUI colors and logos to match the new product branding.

  • Make ipmitool and rasdaemon weak dependencies of clusterware-node.

  • Implement a new _aim_status reserved attribute and add support in scyld-nodectl status to show status based on that attribute. Contact Penguin Computing to learn more.

  • Rearrange the build system to better isolate Pyramid code.

  • Move image exports from the image to the head that does the export.

  • Replace libvirt power plugin with a version that calls virsh.

  • Remove the deprecated socket-based waitfor code.

  • Add stricter versioned dependencies between some packages.

  • Ensure scyld-clusterctl hosts entries are pushed to scyld-nss.

  • Remove more references to el7 and remove development packages required by el7 builds.

  • Keep the dnsmasq service up during clusterware service restarts.

  • Allow the mosquitto service to start even with missing certificates.

  • Add image locking during modification to prevent administrators from accidentally overwriting each other's changes.

  • Improve scyld-modimg to make conflicts between different instances less likely.

  • Implement shared-key encryption for communications between head nodes using stunnel.

  • Improve our parsing of ip command output.

  • Add documentation about communication encryption.

  • Switch Telegraf from UDP to HTTP(S) with a new relay service, significantly reducing telemetry gaps.

  • Improved method for deploying client packages to switches.

  • Document how to change the etcd password and create a script to recover if the etcd password is lost. Contact Penguin Computing for assistance with the script.

  • Improve the Slurm and Kubernetes installation scripts.

  • Include the API Reference as a part of our standard documentation.

  • Add a missing dependency required to build newer Ubuntu images.

  • Update the supported distros table to include el8.10 and el9.5.

  • Update documentation information architecture and HTML site design to improve user experience.

  • Assorted other bug fixes and documentation updates.
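The image locking added in this release keeps administrators from overwriting each other's changes. The changelog does not describe how ClusterWare implements it, so the following is only a generic Python sketch of the underlying idea, an exclusive non-blocking advisory lock taken with fcntl.flock. The image_lock helper and the lock file path are hypothetical, and the code is Unix-only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def image_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path, failing fast if it is taken.

    Hypothetical helper for illustration; it is not ClusterWare's implementation.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB raises OSError (typically BlockingIOError) instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.gettempdir(), "example-image.lock")
    with image_lock(path):
        try:
            # A second, independent open of the same file cannot take the lock.
            with image_lock(path):
                print("unexpected: second lock acquired")
        except OSError:
            print("image is already being modified")
```

Failing immediately rather than blocking lets a second administrator see the conflict at once instead of waiting on a modification that may run for a long time.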

12.3.0-g0000 - October 4, 2024

  • Reduce polling in scyld-nodectl status --refresh by leveraging the waitfor framework and MQTT.

  • Switch to a Unix socket to communicate between the ClusterWare backend and etcd to enable updating gRPC.

  • Add a new _bootnet attribute for customizing the name of the bootnet interface.

  • Support --selector to select nodes in slurm-scyld.setup.

  • Introduce an improved clusterware-node deployment mechanism for SONiC switches.

  • Make compute node scripts less likely to produce a bad parent-head-node line in /etc/hosts.

  • Support creating tmpfs subdirectories in ignition for diskless STIG'd systems.

  • Cleaner handling of the client.sslverify setting.

  • Reduce the head node minimum memory check after removal of Couchbase.

  • Restrict access to the GUI to only accept secure remote connections.

  • Bump the version numbers for most Python dependencies.

  • Correct "frozen" image handling during import and refuse to delete frozen images.

  • Remove deprecated code, including code specific to el7 head nodes.

  • Add functionality for Telegraf to collect ClusterWare node attributes.

  • Change technique for converting node lists into ranges when reporting status.

  • Tighten some directory permissions.

  • Correct the _ipxe_sanboot creation during bootloader installation.

  • Fix a scyld-bootctl export failure that previously required a patch.

  • Provide a mechanism for setting a real-time I/O priority on etcd.

  • Make it more difficult to modify a cached version of an image unintentionally.

  • Improve gitrepo backend handling to avoid common failures.

  • Stop creating .old.XX files when modifying objects in multi-head clusters.

  • Avoid the MOTD interfering with scyld-nodectl scp.

  • Small fixes to boot chaining failure handling.

  • Wider use of the cluster certificate authority to secure communications.

  • Fixes for netplan configurations in Ubuntu images.

  • Restart Telegraf when moving between head nodes.

  • Keycloak integration improvements.

  • Assorted other bug fixes and documentation updates.
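This release changes how node lists are converted into ranges for status output, but does not describe the new technique. One common approach, sketched below in Python as an illustration rather than ClusterWare's code, groups names by their alphabetic prefix and collapses consecutive indices. It assumes names such as n0, n1, n2 with no zero padding, and compress_nodes is a hypothetical helper.

```python
import re
from itertools import groupby

# Assumed naming scheme: alphabetic prefix followed by a decimal index (n0, n12).
_NODE_NAME = re.compile(r"^([A-Za-z][A-Za-z_-]*)(\d+)$")


def compress_nodes(names):
    """Collapse node names such as n0, n1, n2, n5 into 'n[0-2,5]'.

    Hypothetical helper: names that do not match the assumed scheme are
    appended unchanged after the compressed ranges.
    """
    by_prefix = {}
    others = []
    for name in names:
        match = _NODE_NAME.match(name)
        if match:
            by_prefix.setdefault(match.group(1), set()).add(int(match.group(2)))
        else:
            others.append(name)

    parts = []
    for prefix in sorted(by_prefix):
        indices = sorted(by_prefix[prefix])
        if len(indices) == 1:
            parts.append(f"{prefix}{indices[0]}")
            continue
        spans = []
        # Within a run of consecutive integers, value minus position is constant.
        for _, group in groupby(enumerate(indices), key=lambda pair: pair[1] - pair[0]):
            run = [value for _, value in group]
            spans.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
        parts.append(f"{prefix}[{','.join(spans)}]")
    return ",".join(parts + others)


if __name__ == "__main__":
    print(compress_nodes(["n0", "n1", "n2", "n5", "n7", "n8"]))  # n[0-2,5,7-8]
```

Collecting indices in a set first makes the output independent of input order and drops duplicates.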

12.2.0-g0000 - July 26, 2024

  • Improve Grafana column scaling.

  • Quiet a warning about TripleDES by removing it as an option from paramiko.

  • Support _boot_style=iscsi on el8 and el9 systems.

  • Update CentOS 7 and CentOS Stream 8 URLs to use vault.centos.org, since el7 is now also EOL.

  • Improve DNS resolution of head nodes with multiple IPs by using localise-queries in dnsmasq.conf.template, and add a leases.register_heads boolean to disable the entire feature.

  • Write NetworkManager connection files on el9 systems and improve netplan configuration file writing on Ubuntu.

  • Initial Redfish support including an aggregation daemon with more changes and documentation coming later.

  • Provide a mechanism to create a bootable ISO from one or more boot configs.

  • Improve handling of Slurm UID and GID syncing when installing packages.

  • Add arguments to scyld-nodectl kexec to allow for one-time booting using a specific image or boot configuration.

  • Improve the scyld-modimg --capture error handling.

  • Downgrade ansible-core to 2.15.10 to match Python 3.9.

  • Small improvements and cleanups across the GUI.

  • Introduce a new RBAC system for administrators, currently scoped cluster-wide. All existing admins now have the FullAdmin role.

  • Support substitution within the power_uri field.

  • Initial support for deploying Harvester nodes from an ISO.

  • Unhide the existing scyld-clusterctl nets functionality.

  • Include the mosquitto MQTT server to publish system events.

  • Confirm keys added through scyld-adminctl can be loaded with paramiko.

  • Improved Ubuntu image handling in scyld-modimg.

  • Expose the limited but existing scyld-nodectl scp functionality.

  • Improve ZTP handling, which still supports only Cumulus.

  • Improve the unknown nodes tab for unrecognized DHCP clients.

  • Include a mechanism to mask attribute values in normal output. Default to masking _remote_pass, _tpm_owner_pass, and _bmc_pass.

  • Make more of an effort to mask the SOL password in output.

  • Prevent the creation of unrecognized reserved attributes and update reserved attributes documentation.

  • Include a sched_watcher agent for collecting node status from Slurm.

  • Rework compute node client certificate handling.

  • Clean up dhcp6 error messages.

  • Fix kernel version sorting in scyld-mkramfs.

  • Update numerous Python and npm dependencies.

  • Assorted other bug fixes and documentation updates.
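The attribute masking described in this release can be pictured as a filter applied to output. This hypothetical Python sketch uses the three default masked attributes named above; the mask_attributes function and the placeholder string are illustrative assumptions, not ClusterWare's API.

```python
# Attributes masked by default, as listed in the 12.2.0 release notes.
DEFAULT_MASKED = frozenset({"_remote_pass", "_tpm_owner_pass", "_bmc_pass"})

# Hypothetical placeholder text; ClusterWare's actual masked output may differ.
MASK = "********"


def mask_attributes(attributes, masked=DEFAULT_MASKED):
    """Return a copy of attributes with sensitive values replaced by MASK.

    The input dictionary is not modified, so the real values remain
    available to code that legitimately needs them.
    """
    return {key: (MASK if key in masked else value) for key, value in attributes.items()}


if __name__ == "__main__":
    node_attrs = {"_boot_config": "DefaultBoot", "_bmc_pass": "s3cret"}
    print(mask_attributes(node_attrs))
    # {'_boot_config': 'DefaultBoot', '_bmc_pass': '********'}
```

Masking at display time, rather than in storage, keeps the stored values intact while preventing them from appearing in normal output.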

12.1.1-g0000 - January 23, 2024

  • Assorted fixes for initramfs ignition use when booting el9 nodes.

  • Rework how scyld-nodectl ssh gets node keys allowing for ssh into el9 nodes with FIPS enabled.

  • Print names in place of some UIDs returned by scyld-*ctl tools.

  • Document and correctly handle that ram_total and ram_free are stored in KiB.

  • Check all uses of urlparse().netloc and replace several with urlparse().hostname.

  • Assorted test script and other bug fixes.
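The urlparse() change matters because netloc and hostname return different things. This Python standard library example (the URLs are made up) shows that netloc keeps credentials and the port, while hostname returns only the lowercased host name, with IPv6 brackets removed.

```python
from urllib.parse import urlparse

parts = urlparse("https://admin:secret@Head1.Example.com:8443/api/v1")

# netloc is the raw authority: credentials, host, and port together.
print(parts.netloc)    # admin:secret@Head1.Example.com:8443
# hostname drops credentials and port and lowercases the host.
print(parts.hostname)  # head1.example.com
print(parts.port)      # 8443

v6 = urlparse("http://[fd00::10]:80/")
print(v6.netloc)       # [fd00::10]:80
print(v6.hostname)     # fd00::10
```

Code that needs only the host to resolve or compare should use hostname; netloc is appropriate only when the full authority string is wanted.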