Changelog#
See Release Notes for summary information about the latest ICE ClusterWare™ release. This section contains a more detailed history of all recent releases.
13.0.0-g0000 - December 2, 2025#
Transition public installation repository to repo.ice.penguinsolutions.com. Both the ClusterWare 12 and ClusterWare 13 installation files are now available in the new repository. Contact Penguin Computing for details.
Update commands to use cw-* prefix. Scripts that use the older scyld-* commands will continue to work, but should be updated with the new syntax as support for legacy scyld-* commands will be removed in a future release.
Head nodes now require Python 3.12 or later.
New Syslog and Audit Log viewers in Grafana. Syslog forwarding is enabled by default for all head nodes. Audit Log collection is managed through a Telegraf plugin.
The ClusterWare graphical user interface (GUI) is now at near-parity with the command line (CLI) tools. All major actions on any database primitive can be done through either the GUI or CLI.
Work around new umount behavior found in recent Ubuntu versions.
Update Ubuntu initramfs to include the bash dracut module.
Include mdadm in Ubuntu images.
Fix grub installation when deploying Ubuntu.
Remove some untested supported distros from the documentation.
Support setting node attributes during node creation with a.name=value.
Implement tenancy-aware InfiniBand partitioning.
Add InfiniBand networks (ibnets) to configure OpenSM partitioning based on tenancy configuration.
Upgrade Telegraf to version 1.35.3.
Create a separate clusterware-telegraf-relay package.
Initial pass at relaying both Telegraf and ClusterWare status out of tenancies.
Batch relayed Telegraf messages to improve throughput.
Add an example Butane script with mdraid mirroring.
Print clearer error messages when the backend is not responding.
Use UUID= lines in the fstab lines generated during deployment.
Fix reading and writing the Cluster ID in the clusterware.repo file.
Better document the head auto eject and rejoin process.
Properly refresh access tokens when they are expiring.
Update user token duration argument name from timeout or lifetime to lifespan. The legacy arguments will continue to work, but will be removed in a future release.
Improvements to token creation and exchange.
Remove the --raw option from command line tools.
Replace the Pyramid backend with FastAPI.
FastAPI clusterware service does not support the reload action for now.
Bump our internal Git version to 2.48.1 and resolve some permission issues.
Automatically create less privileged Grafana users during installation.
Rearrange and rename log files in /var/log/clusterware moving httpd-specific logging into httpd_error_log and httpd_access_log.
The clusterware service is no longer an alias for httpd but instead a separate service proxied behind httpd.
Add a reasonable default RequestReadTimeout httpd configuration.
Add a _hypervisor reserved attribute for an admin to explicitly record where a virtual machine is running.
New _managed_switch reserved attribute is used to mark what switches are managed in a multi-tenant cluster.
New relay:// power plugin to send power requests to a separate cluster in multi-tenant cluster configuration.
Add a new node script (lldp_neighbors) to set a new reserved attribute _neighbors.
The wipe-disks.sh command now zeroes the first gigabyte of each disk.
By default system logs are collected on head nodes and stored in Influx.
Build the mt-gateway image on each cluster based on Alpine Linux.
Speed up the initial dhcp log parsing with a cache.
Improve PREFER_KMOD handling in mkramfs.conf.
Fix /etc/hosts handling in cw-modimg --chroot that resulted in an empty hosts file.
Increase session reuse in client tools for better performance.
Support tenancy creation using carefully formed Terraform files.
Include our own Terraform binary to handle incoming files.
Require jq on head nodes and in images.
Small improvements to the Nvidia hardware data collection in clusterware-node.
Better handling for when the etcd database becomes too large.
Implement a Kubevirt provider to allocate virtual machines on Kubernetes.
Package our own builds of kubectl and virtctl for the Kubevirt provider.
Eliminate an infinite loop due to duplicate clientids.
Initial support for importing qcow2 files as images.
Collect InfiniBand port GUIDs as hex in node hardware.
Deprecate authctl for configctl.
Document how to authenticate using ssh keys on the /login endpoint.
Report clearer error messages for Keycloak failures.
Improve documentation for air-gapped installations.
Add an article about node selectors.
Fix repo, distro, and gitrepo cloning.
Collect more status information from switches with clusterware-node installed.
Several fixes to the virsh power plugin.
Use Slurm dynamic node configuration by default but provide a _slurmd reserved attribute to disable this behavior.
Update Slurm, OpenMPI, and other middleware packages.
Remove el7 documentation references.
Retire support for PBS Torque.
Quiet some repetitive logging.
Technology preview of a new audit system.
Set the audit level to debug for the extremely frequent putstatus and daemon process calls.
Leverage the etcd serializable consistency mode for faster putstatus calls.
Batch putstatus operations for improved response times.
Many improvements to the Keycloak interface and plugin.
Add jitter to status updates to reduce contention.
Improve SELinux dependency handling at load time.
Improve handling for freezing and unfreezing objects.
Make loading node and head information optional in managedb.
Better document some _ansible_pull details.
Add a new cw-modimg --deploy option to run Ansible scripts inside an image.
Implement a new reserved attribute _ansible_retries to change _ansible_pull behavior.
Disable cw-modimg automatic locking for later improvements.
Fix cw-nss handling of longer names.
Speed up cw-sysinfo data collection.
Introduced new type of switch: SONIC.
Introduced new JSON configuration for SONIC switches.
Enabled JSON configuration of the following on SONIC:
Administrative state (shutdown/no shutdown) of interfaces
Routed Interfaces with static IPv4/IPv6 addresses
Switched VLANs with static IPv4/IPv6 addresses
BGP instance and instance configuration
BGP peers, peer-groups, dynamic neighbors, and BGP Unnumbered neighbors
BGP v4/v6 routing
BGP ECMP
BGP EVPN address family
VXLAN VTEP configuration and VNI/VLAN termination information
Additional GUI improvements include:
Update look and feel in the GUI, including a redesigned login screen, marking required fields with an asterisk, and adjusting icons for easier navigation.
Add drag-and-drop functionality for nodes from the GUI grid or list view to create per-node details tabs.
Separate hostname records into 2 sections, A-records and SRV-records, to reduce confusion.
Keycloak settings can be managed and changed through the GUI.
Dynamic Groups have a Test button in the GUI that can be used to view a list of nodes that match the selector criteria.
All panels now open in "view" (read-only) mode, to avoid accidental modifications to database objects. Users can switch to "edit" mode to make changes.
Boot Configs and Distros now have "full delete" functionality which deletes both the object and any "sub-objects" related to it (full-delete on a Boot Config deletes the related Image; full-delete on a Distro deletes all related Repos).
Improve error messages in all GUI pages.
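The "Add jitter to status updates to reduce contention" item in this release refers to a general technique: each sender shifts its reporting interval by a small random amount so that many nodes do not all report at the same instant. The following is a minimal sketch of that idea, not the ClusterWare implementation; the function name, the 10% default, and the uniform distribution are assumptions chosen for illustration.

```python
import random


def jittered_interval(base_seconds, jitter_fraction=0.1, rng=random):
    """Return base_seconds shifted by a random amount within +/- jitter_fraction.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    spread = base_seconds * jitter_fraction
    # uniform(-spread, spread) keeps the long-run average interval at base_seconds.
    return base_seconds + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # Three nodes with the same nominal 10 second interval drift apart slightly.
    for node in ("n0", "n1", "n2"):
        print(node, round(jittered_interval(10.0), 2))
```

A symmetric spread keeps the average update rate unchanged while still spreading individual updates over time.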
12.4.2-g0000 - March 17, 2025#
Revert a bare metal boot issue due to a conflict with upcoming Sonic ZTP functionality.
Fix an SELinux issue that interferes with Telegraf on upgrade.
12.4.1-g0000 - March 7, 2025#
Continuing improvements to the ClusterWare graphical user interface (GUI), including more consistent display of labeled fields and components.
Updates to dashboards, including a new GPU dashboard.
SELinux/MLS improvements.
Rebuild the initramfs when deploying grub to a local drive for a persistent installation.
Implement a new _wipe_all attribute that will destroy all local data on a booting system.
Move to Git version 2.48.1 to patch known vulnerabilities.
Bug fixes and documentation updates.
12.4.0-g0000 - February 3, 2025#
The product is rebranded from Scyld ClusterWare to ICE ClusterWare. Initial changes are reflected in the product GUI and in the documentation. Future releases will introduce additional branding updates, including updates to the command line tools.
Implement the first providers plugin, specifically supporting hypervisors running libvirt using virsh and virt-install commands.
Include a couple of example deploy scripts in the /opt/scyld/clusterware-tools/examples/deploy directory.
Reduce repetitive logging.
Implement a new _altmacs reserved attribute that passes alternative MAC addresses for a node to the DHCP server. This attribute may be replaced by a more robust solution in future releases.
Significant simplifications and improvements to the scyld-kube tool used for deploying Kubernetes.
Mark nodes as "busy" if virsh list shows running virtual machines on the node.
A new scyld-modimg --deploy argument allows administrators to execute an Ansible playbook against an image or combine the copy and execute steps for running a shell script inside the image.
The scyld-modimg command now accepts a --progress argument to either not print remaining time or to print dots instead of detailed progress.
Propagate errors from the debootstrap tool out to the user to simplify Ubuntu image creation debugging.
Prevent users with the NoAccess role from even logging in and prevent tmpadmins from minting tokens.
Improve parsing of scyld-modimg --run scripts and document the functionality.
Add --discard-on-error option to scyld-modimg to facilitate scripting and automation.
The scyld-clusterctl nets tool allows admins to define additional networks where ClusterWare nodes may be connected.
Improved ClusterWare graphical user interface (GUI) information architecture to help new users navigate the product.
Each primitive now presents a set of labeled fields and components within the ClusterWare GUI that are customized to that primitive.
Updated ClusterWare GUI colors and logos to match the new product branding.
Make ipmitool and rasdaemon weak dependencies of clusterware-node.
Implement a new _aim_status reserved attribute and add support in scyld-nodectl status to show status based on that attribute. Contact Penguin Computing to learn more.
Rearrange the build system to better isolate Pyramid code.
Move image exports from the image to the head that does the export.
Replace libvirt power plugin with a version that calls virsh.
Remove the deprecated socket-based waitfor code.
Add stricter versioned dependencies between some packages.
Ensure scyld-clusterctl hosts entries are pushed to scyld-nss.
Remove more references to el7 and remove development packages required by el7 builds.
Keep the dnsmasq service up during clusterware service restarts.
Allow the mosquitto service to start even with missing certificates.
Add image locking during modification to prevent administrators from accidentally overwriting each other's changes.
Improve scyld-modimg to make conflicts between different instances less likely.
Implement shared-key encryption for communications between head nodes using stunnel.
Improve our parsing of ip output.
Add documentation about communication encryption.
Switch telegraf from UDP to HTTP(S) with a new relay service, significantly reducing telemetry gaps.
Improved method for deploying client packages to switches.
Document how to change the etcd password and create a script to recover if the etcd password is lost. Contact Penguin Computing for assistance with the script.
Improve the slurm and kubernetes installation scripts.
Include the API Reference as a part of our standard documentation.
Add a missing dependency required to build newer Ubuntu images.
Update the supported distros table to include el8.10 and el9.5.
Update documentation information architecture and HTML site design to improve user experience.
Assorted other bug fixes and documentation updates.
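The image locking added in this release prevents two administrators from modifying the same image at once. One common way to get that guarantee is an exclusive lock file, sketched below. This is an illustration of the general technique only and makes no claim about how scyld-modimg implements its locking; the image_lock name and the lock file path are hypothetical.

```python
import contextlib
import os


@contextlib.contextmanager
def image_lock(lock_path):
    """Hold an exclusive lock file for the duration of a with-block.

    Hypothetical helper. O_CREAT | O_EXCL makes creation atomic, so a second
    caller gets FileExistsError while the first still holds the lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the owning process ID to help diagnose stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "example-image.lock")
    with image_lock(path):
        try:
            with image_lock(path):
                pass
        except FileExistsError:
            print("image is already being modified")
```

A lock file of this kind is left behind if the process is killed, so a real tool also needs a way to detect and clear stale locks.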
12.3.0-g0000 - October 4, 2024#
Reduce polling in scyld-nodectl status --refresh, but leverage the waitfor framework and MQTT.
Switch to a Unix socket to communicate between the ClusterWare backend and etcd to enable updating gRPC.
Add a new _bootnet attribute for customizing the name of the bootnet interface.
Support --selector to select nodes in slurm-scyld.setup.
Introduce an improved clusterware-node deployment mechanism for SONiC switches.
Make compute node code scripting less likely to produce a bad parent-head-node line in /etc/hosts.
Support creating tmpfs subdirectories in ignition for diskless STIG'd systems.
Cleaner handling of the client.sslverify setting.
Reduce the head node minimum memory check after removal of Couchbase.
Restrict access to the GUI to only accept secure remote connections.
Bump the version numbers for most Python dependencies.
Correct "frozen" image handling during import and refuse to delete frozen images.
Remove deprecated code, including code specific to el7 head nodes.
Add functionality for Telegraf to collect ClusterWare node attributes.
Change technique for converting node lists into ranges when reporting status.
Tighten some directory permissions.
Correct the _ipxe_sanboot creation during bootload installation.
Fix a scyld-bootctl export failure that previously required a patch.
Provide a mechanism for setting a realtime IO priority on etcd.
Make it more difficult to modify a cached version of an image unintentionally.
Improve gitrepo backend handling to avoid common failures.
Stop creating .old.XX files when modifying objects in multi-head clusters.
Avoid the MOTD interfering with scyld-nodectl scp.
Small fixes to boot chaining failure handling.
Wider use of the cluster certificate authority to secure communications.
Fixes for netplan configurations in Ubuntu images.
Restart Telegraf when moving between head nodes.
Keycloak integration improvements.
Assorted other bug fixes and documentation updates.
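The item above about converting node lists into ranges concerns collapsing runs of consecutive node indices into a compact form. The changelog does not describe the technique ClusterWare now uses, so the following is only a sketch of one common approach; the function name and the "0-2,5" output format are assumptions.

```python
from itertools import groupby


def compress_indices(indices):
    """Collapse integers into comma-separated ranges, e.g. [0, 1, 2, 5] -> '0-2,5'.

    Hypothetical helper: the output format is an assumption for illustration.
    """
    ordered = sorted(set(indices))
    parts = []
    # Within a run of consecutive integers, value minus position is constant,
    # so grouping on that difference splits the list into runs.
    for _, group in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in group]
        parts.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return ",".join(parts)


if __name__ == "__main__":
    print(compress_indices([0, 1, 2, 5, 7, 8]))  # 0-2,5,7-8
```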
12.2.0-g0000 - July 26, 2024#
Improve Grafana column scaling.
Quiet a warning about TripleDES by removing it as an option from paramiko.
Support _boot_style=iscsi on el8 and el9 systems.
Update CentOS 7 and CentOS Stream 8 URLs to use vault.centos.org since el7 is now also EOL.
Improve DNS resolution of head nodes with multiple IPs using localise-queries in the dnsmasq.conf.template but also include a leases.register_heads boolean to disable the entire feature.
Write NetworkManager connection files on el9 systems and improve netplan configuration file writing on Ubuntu.
Initial Redfish support including an aggregation daemon with more changes and documentation coming later.
Provide a mechanism to create a bootable ISO from one or more boot configs.
Improve handling of slurm uid and gid syncing when installing packages.
Add arguments to scyld-nodectl kexec to allow for one-time-booting using a specific image or boot configuration.
Improve the scyld-modimg --capture error handling.
Downgrade ansible-core to 2.15.10 to match Python 3.9.
Small improvements and cleanups across the GUI.
Introduce a new RBAC system for administrators, currently scoped cluster-wide. All existing admins will now have the FullAdmin role.
Support substitution within the power_uri field.
Initial support for deploying Harvester nodes from an ISO.
Unhide the existing scyld-clusterctl nets functionality.
Include the mosquitto MQTT server to publish system events.
Confirm keys added through scyld-adminctl can be loaded with paramiko.
Improved Ubuntu image handling in scyld-modimg.
Expose the limited but existing scyld-nodectl scp functionality.
Improve ZTP handling but still only supporting Cumulus.
Improve the unknown nodes tab for unrecognized dhcp clients.
Include a mechanism to mask attribute values in normal output. Default to masking _remote_pass, _tpm_owner_pass, and _bmc_pass.
Make more of an effort to mask the SOL password in output.
Prevent the creation of unrecognized reserved attributes and update reserved attributes documentation.
Include a sched_watcher agent for collecting node status from slurm.
Rework compute node client certificate handling.
Clean up dhcp6 error messages.
Fix kernel version sorting in scyld-mkramfs.
Update numerous python and npm dependencies.
Assorted other bug fixes and documentation updates.
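The attribute masking introduced in this release hides sensitive values in normal output, defaulting to _remote_pass, _tpm_owner_pass, and _bmc_pass. The sketch below shows the general idea only. The function name, the placeholder string, and the dictionary representation of node attributes are assumptions, not the ClusterWare implementation.

```python
# Attribute names taken from the changelog entry above.
MASKED_BY_DEFAULT = frozenset({"_remote_pass", "_tpm_owner_pass", "_bmc_pass"})


def mask_attributes(attributes, masked=MASKED_BY_DEFAULT, placeholder="********"):
    """Return a copy of attributes with sensitive values replaced.

    Hypothetical helper: the placeholder text is an assumption.
    """
    return {
        name: (placeholder if name in masked else value)
        for name, value in attributes.items()
    }


if __name__ == "__main__":
    node_attributes = {"_boot_style": "iscsi", "_bmc_pass": "hunter2"}
    print(mask_attributes(node_attributes))
```

Returning a copy leaves the stored values untouched, so only the displayed output is affected.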
12.1.1-g0000 - January 23, 2024#
Assorted fixes for initramfs ignition use when booting el9 nodes.
Rework how scyld-nodectl ssh gets node keys allowing for ssh into el9 nodes with FIPS enabled.
Print names in place of some UIDs returned by scyld-*ctl tools.
Note and handle that ram_total / ram_free are stored in KiB.
Check all uses of urlparse().netloc and replace several with urlparse().hostname.
Assorted test script and other bug fixes.
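The urlparse() item above matters because netloc and hostname return different things in Python's standard library: netloc is the raw authority component, including any user information and port, while hostname is the lowercased host with those parts and any IPv6 brackets removed. The following sketch shows the difference. The URLs and the host_of helper are made up for illustration.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return only the host part of a URL (hypothetical helper)."""
    return urlparse(url).hostname


if __name__ == "__main__":
    url = "https://admin:secret@Head1.example.com:8443/api/v1"
    parsed = urlparse(url)
    print(parsed.netloc)   # admin:secret@Head1.example.com:8443
    print(host_of(url))    # head1.example.com
    print(parsed.port)     # 8443
    # IPv6 literals keep their brackets in netloc but not in hostname.
    print(urlparse("http://[::1]:80/").netloc)  # [::1]:80
    print(host_of("http://[::1]:80/"))          # ::1
```

Code that compares or resolves host names generally wants hostname, since netloc may carry credentials or a port.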