Execute the ICE ClusterWare Install Script
If /etc/yum.repos.d/clusterware.repo exists, then scyld-install's subsequent invocations of yum will employ that configuration file. If /etc/yum.repos.d/clusterware.repo does not exist, then scyld-install prompts the user for an appropriate authentication token and uses that token to build a /etc/yum.repos.d/clusterware.repo that is customized to your cluster.
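For reference, the generated file follows the standard yum .repo format. The following is a minimal sketch with placeholder values; the installer writes the actual repo id, name, and authenticated baseurl for your cluster:
[clusterware]
name=ICE ClusterWare
# placeholder URL; the installer embeds the real authenticated location
baseurl=https://<repo-server>/<repo-path>/
enabled=1
gpgcheck=1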
scyld-install accepts an optional argument specifying a cluster configuration file that contains information necessary to set up the DHCP server. For example:
cat <<-EOF >/tmp/cluster-conf
interface enp0s9 # names the private cluster interface
nodes 4 # max number of compute nodes
iprange 10.10.32.45 # starting IP address of node 0
node 08:00:27:f0:44:35 # node 0 MAC address
node 08:00:27:f0:44:45 # node 1 MAC address
node 08:00:27:f0:44:55 # node 2 MAC address
node 08:00:27:f0:44:65 # node 3 MAC address
EOF
where the syntax of this cluster configuration file is:
domain <DOMAIN_NAME>
Optional. Defaults to "cluster.local".
interface <INTERFACE_NAME>
Optional. Specifies the name of the head node's interface to the private cluster network, although that can be determined from the specification of the <FIRST_IP> in the iprange line.
nodes <MAX_COUNT>
Optional. Specifies the maximum number of compute nodes, although that can be determined from the iprange if both the <FIRST_IP> and <LAST_IP> are present. The max also adjusts as needed if and when additional nodes are defined. For example, see Node Creation with Known MAC address(es).
iprange <FIRST_IP> [<LAST_IP>]
Specifies the IP address of the first node (which defaults to n0) and optionally the IP address of the last node. The <LAST_IP> can be deduced from the <FIRST_IP> and the nodes <MAX_COUNT>. The <FIRST_IP> can include an optional netmask via a suffix of /<BIT_COUNT> (e.g., /24) or a mask (e.g., /255.255.255.0).
<FIRST_INDEX> <FIRST_IP> [<LAST_IP>] [via <FROM_IP>] [gw <GATEWAY_IP>]
This is a more elaborate specification of a range of IP addresses, commonly used with DHCP relays or multiple subnets. <FIRST_INDEX> specifies that the first node in this range is node n<FIRST_INDEX> and is assigned IP address <FIRST_IP>. The optional via <FROM_IP> clause associates the range with DHCP client requests arriving on the interface that contains <FROM_IP>, and the optional gw <GATEWAY_IP> clause tells each DHCP'ing node to use <GATEWAY_IP> as its gateway, which otherwise defaults to the head node's IP address on the private cluster network.
For example:
128 10.10.24.30/24 10.10.24.100 via 192.168.65.2 gw 10.10.24.254
defines a DHCP range of 71 addresses, beginning with 10.10.24.30, and assigns the first node in the range as n128; watches for DHCP requests arriving on the interface containing 192.168.65.2; and tells these nodes to use 10.10.24.254 as their gateway.
node [<INDEX>] <MAC> [<MAC>]
Specifies one compute node per line, and a configuration commonly contains multiple node lines. Each DHCP'ing node is recognized by its unique MAC address and is assigned an IP address using the configuration file specifications described above. Currently only the first <MAC> is used. The optional <INDEX> overrides the default of sequentially increasing node number indices and thereby can create a gap of unassigned indices. For example, a series of eight node lines without an <INDEX> that is followed by
node 32 52:54:00:c4:f7:1e
creates a gap of unassigned indices n8 to n31 and assigns this node as n32.
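Putting these pieces together, the following is an illustrative cluster configuration file; the interface name, IP addresses, and MAC addresses are placeholders:
domain cluster.local
interface enp0s9
nodes 16
iprange 10.10.32.45/24 # n0 starts here, with a /24 netmask
node 08:00:27:f0:44:35 # becomes n0 at 10.10.32.45
node 08:00:27:f0:44:45 # becomes n1 at the next address in the range
node 8 08:00:27:f0:44:55 # explicit index: becomes n8, leaving n2 through n7 unassigned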
Note
ICE ClusterWare™ yum repositories contain RPMs that duplicate various Red Hat EPEL RPMs, and these ClusterWare RPMs get installed or updated in preference to their EPEL equivalents, even if /etc/yum.repos.d/ contains an EPEL .repo file.
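On dnf-based distributions, one way to confirm which repository supplied an installed package (the package name here is just an example):
dnf info --installed telegraf | grep -i 'from repo'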
Note
The ClusterWare platform employs userid/groupid 539 to simplify communication between the head node(s) and the backend shared storage where it stores node image files, kernels, and initramfs files. If the scyld-install script detects that this uid/gid is already in use by other software, then the script issues a warning and chooses an alternative random uid/gid. The cluster administrator needs to set the appropriate permissions on that shared storage to allow all head nodes to read and write all files.
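To check in advance whether uid/gid 539 is already claimed on a head node, query the passwd and group databases; no output means the ids are free:
getent passwd 539
getent group 539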
The ClusterWare database is stored as JSON content within a replicated document store distributed among the ClusterWare head nodes. This structure protects against the failure of any single head node.
For example, using the cluster configuration file created above, install the ClusterWare platform from a yum repo:
scyld-install --config /tmp/cluster-conf
By default scyld-install creates the DefaultImage containing a kernel and rootfs software from the same base distribution installed on the head node, although if the head node runs RHEL8, then no DefaultImage or DefaultBoot is created.
Alternatively, for more flexibility (especially with a RHEL8 head node), execute the installer with an additional option that identifies the base distribution to be used for the DefaultImage:
scyld-install --config /tmp/cluster-conf --os-iso <ISO-file>
where <ISO-file> is either a pathname to an ISO file or a URL of an ISO file. That ISO can match the head node's distribution or can be any supported distribution.
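For instance, with a local ISO file or a downloadable one (both the pathname and the URL below are placeholders; substitute your actual media):
scyld-install --config /tmp/cluster-conf --os-iso /tmp/Rocky-8-x86_64-minimal.iso
scyld-install --config /tmp/cluster-conf --os-iso https://example.com/isos/Rocky-8-x86_64-minimal.iso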
scyld-install unpacks an embedded compressed payload and performs the following steps:
Checks for a possible newer version of the clusterware-installer RPM. If one is found, then the script updates the local RPM installation and executes the newer scyld-install script with the same arguments. An optional argument --skip-version-check bypasses this check.
An optional argument --yum-repo /tmp/clusterware.repo re-installs a yum repo file to /etc/yum.repos.d/clusterware.repo. This is unnecessary if /etc/yum.repos.d/clusterware.repo already exists and is adequate.
Checks whether the clusterware RPM is installed.
Confirms the system meets various minimum requirements.
Installs the clusterware RPM and its supporting RPMs.
Copies a customized Telegraf configuration file to /etc/telegraf/telegraf.conf.
Enables the tftpd service in xinetd for PXE booting.
Randomizes assorted security-related values in /opt/scyld/clusterware/conf/base.ini.
Sets the current user account as a ClusterWare administrator in /opt/scyld/clusterware/conf/base.ini. If this is intended to be a production cluster, then the system administrator should create additional ClusterWare administrator accounts and clear this variable. For details on this and other security-related settings, including adding ssh keys to compute nodes, please see Securing the Cluster.
Modifies /etc/yum.repos.d/clusterware.repo to change enabled=1 to enabled=0. Subsequent executions of scyld-install to update the ClusterWare platform will temporarily (and silently) re-enable the ClusterWare repo for the duration of that command. This is done to avoid inadvertent updates of ClusterWare packages if and when the cluster administrator executes a more general yum install or yum update intending to add or update base distribution packages; see the sketch after this list for a one-off manual override.
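If a general yum transaction should deliberately consider the disabled ClusterWare repo, it can be enabled for a single command. A sketch, assuming the repo id (the [section] name inside clusterware.repo) is clusterware; check that file for the actual id:
yum --enablerepo=clusterware update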
Then scyld-install uses systemd to enable and start firewalld, and opens ports for communication between head nodes as required by etcd. See Services, Ports, Protocols for details.
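The resulting firewall state can be inspected with standard firewalld queries:
firewall-cmd --list-services
firewall-cmd --list-ports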
Once the ports are open, scyld-install initializes the ClusterWare database and enables and starts the following services:
httpd: The Apache HTTP daemon that runs the ClusterWare service and proxies Grafana.
xinetd: Provides network access to tftp for PXE booting.
telegraf: Collects head node performance data and transmits it to the telegraf-relay service.
telegraf-relay: Forwards telegraf data to InfluxDB and to telegraf-relay services running on head nodes.
influxdb: Stores node performance and status data for visualization in Grafana.
grafana-server: Displays the head node and compute node status data through a web interface.
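After the installation completes, you can verify that these services are running:
systemctl is-active httpd xinetd telegraf telegraf-relay influxdb grafana-server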
The script then:
Opens ports in firewalld for public access to HTTP, HTTPS, TFTP, iSCSI, and incoming Telegraf UDP messages.
Note
UDP message sending is deprecated as of the ClusterWare 12.4.0 release.
Installs and configures the cluster administrator's clusterware-tools package (unless it was executed with the --no_tools option).
Configures the cluster administrator's ~/.scyldcw/settings.ini to access the newly installed ClusterWare service using the scyld-tool-config tool.
Creates an initial simple boot image DefaultImage, boot config DefaultBoot, and attributes DefaultAttribs using the scyld-add-boot-config tool.
Loads the cluster configuration specified on the command line using the scyld-cluster-conf load command.
Restarts the httpd service to apply the loaded cluster configuration.
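As a quick sanity check, and assuming the clusterware-tools package was installed, the listing commands should show the newly created defaults:
scyld-imgctl ls # should list DefaultImage
scyld-bootctl ls # should list DefaultBoot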
Important
See Boot Configurations for details about how to modify existing boot images, create new boot images, and associate specific boot images and attributes with specific compute nodes. We strongly recommend not modifying or removing the initial DefaultImage, but rather cloning that basic image into a new image that is then modified further, or simply creating new images from scratch.
Important
If you wish to ensure that the latest packages are installed in the image after the scyld-install completes, then execute:
scyld-modimg -i DefaultImage --update --overwrite --upload
Important
See Common Additional Configuration for additional optional cluster configuration procedures, e.g., installing and configuring a job scheduler or one of the MPI family software stacks.
Important
If this initial scyld-install does not complete successfully, or if you want to begin the installation anew, then when you re-run the script you should cleanse the partial, potentially flawed installation by adding the --clear argument, e.g.:
scyld-install --clear --config /tmp/cluster-conf
If that still isn't sufficient, then:
scyld-install --clear-all --config /tmp/cluster-conf
does a more complete clearing, then reinstalls all the ClusterWare packages.
Due to licensing restrictions, when running on a Red Hat RHEL system the installer will still initially create a Rocky compute node image as the DefaultImage. If after this initial installation a cluster administrator wishes to instead create compute node images based on RHEL, then use the scyld-clusterctl repos tool as described in Creating Arbitrary RHEL Images, and create a new image (e.g., DefaultRHELimage) to use as a new default.