Required and Recommended Components

The Overview describes basic, high availability, advanced networking, and multi-tenant ICE ClusterWare™ cluster architectures. The following sections list the minimum and recommended components for your cluster.

Head Nodes

Use virtual machines hosted by bare metal hypervisors for ClusterWare head nodes. Virtual machines are easy to resize and to migrate between hypervisors. The bare metal hypervisor host must provide the aggregated resources required by all hosted virtual machines, plus additional CPU cores and RAM for the hypervisor itself. ClusterWare head nodes should use x86_64 processors running Red Hat Enterprise Linux (RHEL), Rocky Linux, or a similar distribution. See Supported Distributions and Features for specifics.

ClusterWare head nodes should remain lightweight and contain only the software needed for the local cluster configuration. Non-root users typically do not have direct access to head nodes and do not execute applications on them.

Head node components for a production cluster include:

  • x86_64 processor(s) with a minimum of 4 cores. Including more than 4 cores in the virtual machine will speed up common activities, such as unpacking and packing images. Large clusters may need additional cores.

    Note

    Contact Penguin Computing if you are interested in using ClusterWare with AArch64 or RISC-V architectures.

  • 4GB RAM (minimum), 8GB RAM (recommended). Large clusters may need additional RAM.

  • 100GB local NVMe storage (minimum).

    The largest consumer of storage is the directory that holds packed images, uploaded ISOs, and similar artifacts. Its location is set in the file /opt/scyld/clusterware/conf/base.ini and defaults to /opt/scyld/clusterware/storage/. If your cluster contains multiple head nodes, this location should be unique to each head node and should not be placed on storage shared between head nodes.

    The directory /opt/scyld/clusterware/git/cache/ consumes storage roughly equal to the size of the Git repositories hosted by the system.

    Other than the storage/ and git/cache/ subdirectories discussed above, the /opt/scyld/ directory consumes roughly 1GB.

    Each administrator's ~/.scyldcw/workspace/ directory contains unpacked images downloaded by that administrator for modification or viewing.

    Large clusters with more images, more nodes feeding log aggregation, or long retention periods for telemetry data may require additional local storage on the head node. A quick storage check appears after this list.

  • One Ethernet controller (required) that connects to the private cluster network to interconnect the head node(s) with all compute nodes.

  • A second Ethernet controller (recommended) that connects the head node to the internet.
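
To gauge how much of this local storage is in use, you can run standard tools against the documented default paths. This is a minimal sketch: it assumes the default locations described above, and the grep simply surfaces whichever base.ini setting mentions storage; if your base.ini points elsewhere, substitute that path.

  # Show how image storage is configured on this head node
  grep -i storage /opt/scyld/clusterware/conf/base.ini

  # Report space consumed by the main ClusterWare storage consumers
  du -sh /opt/scyld/clusterware/storage/ /opt/scyld/clusterware/git/cache/ /opt/scyld/
  du -sh ~/.scyldcw/workspace/    # per-administrator unpacked images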

A High Availability ("HA") cluster requires a minimum of three production head nodes, each a virtual machine hosted on a different bare metal hypervisor. You can have up to seven head nodes.

Compute Nodes

You can have from tens to thousands of compute nodes in a ClusterWare cluster. Compute nodes are generally bare metal servers for optimal performance. See Supported Distributions and Features for a list of supported distributions.

Production compute nodes are bare metal machines sized to meet the application needs of your end users. The requirements of the ClusterWare software itself are modest, but typical production compute nodes include:

  • x86_64 processors with multiple cores

  • 192GB RAM or more

You can create a virtual compute node for testing or demonstrations. For example, cluster administrators may use a small number of virtual compute nodes to test system upgrades or image updates prior to production deployment. Virtual compute nodes require:

  • x86_64 processor with 1 core (minimum), 2 cores (recommended)

  • 6GB RAM (minimum), 8GB RAM (recommended)

Login Nodes

Login node resources should be similar to those of head nodes, though you may want additional compute power and storage if users compile and test applications on the login node. Include enough storage for the local operating system (OS). If your cluster has shared storage accessible to the compute nodes, that shared storage should also be accessible from the login node.

Networking

Multiple Ethernet or other high-performance network controllers (InfiniBand, Omni-Path) are common on the compute nodes, but these networks do not need to be accessible from the head node(s).

Use the nmcli connection add tool to create network bridges and to add physical interfaces to those bridges. Once the bridges exist, use the virt-install command to attach the virtual machines' interfaces to the bridges so that the virtual machines reside on the same networks as the hypervisor's physical interfaces.
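
A minimal sketch of that workflow follows. It assumes the private cluster network is cabled to the hypervisor's eno1 interface and names the bridge cluster-br; the interface name, VM sizing, and installation media are illustrative assumptions, not requirements.

  # On the hypervisor: create a bridge and attach the physical interface to it
  nmcli connection add type bridge con-name cluster-br ifname cluster-br
  nmcli connection add type ethernet con-name cluster-br-port ifname eno1 master cluster-br
  nmcli connection up cluster-br

  # Create a head node virtual machine whose virtual NIC attaches to that bridge
  virt-install --name cw-head1 --vcpus 4 --memory 8192 \
      --disk size=100 --network bridge=cluster-br \
      --cdrom /path/to/distro-install.iso

Repeat the bridge setup for each physical network the head node needs, for example a second bridge for the externally facing network, and add a corresponding --network bridge=... option to virt-install for each.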

Important

By design, ClusterWare compute nodes handle DHCP responses on the private cluster network (bootnet) by employing the base distribution's facilities, including NetworkManager. If your cluster installs a network file system or other software that disables this base distribution functionality, then you must configure dhclient or custom static IP addresses and potentially additional workarounds.
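
As a hedged illustration of such a workaround, the commands below either request a lease with dhclient or fall back to a static address; the interface name eno1 and the address 10.54.0.101/24 are assumptions chosen for this example only.

  # Request a DHCP lease directly on the bootnet interface
  dhclient eno1

  # Or assign a static address with nmcli, if NetworkManager remains available
  nmcli connection add type ethernet con-name bootnet-static ifname eno1 \
      ipv4.method manual ipv4.addresses 10.54.0.101/24
  nmcli connection up bootnet-static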

Slurm and Kubernetes

If your cluster includes Slurm or Kubernetes nodes, consult the respective documentation for their system requirements.

Multi-Tenant Clusters

In addition to the head node, compute node, and networking requirements detailed above, multi-tenant clusters must include:

  • Enterprise Sonic switches version 4.4 or later that support EVPN technology, such as Enterprise Standard Sonic or Enterprise Premium Sonic.

  • InfiniBand network that supports PKey partitioning.

Multi-tenant clusters can include up to 64 Enterprise Sonic switches, up to 1200 compute nodes, approximately 10,000 GPUs, plus supporting infrastructure. Contact Penguin Computing to design a supported multi-tenant environment that meets your needs.
