Common Additional Configuration#
Following a successful initial install or update of ICE ClusterWare™, or as local requirements of your cluster dictate, you may need to make one or more configuration changes.
Configure Hostname#
Verify that the head node hostname has been set as desired for permanent,
unique identification across the network.
In particular, ensure that the hostname is not
localhost or localhost.localdomain.
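For example, on a systemd-based head node the hostname can be checked and set with hostnamectl; the fully qualified name below is only a placeholder for your own naming scheme:
hostnamectl status
sudo hostnamectl set-hostname head1.cluster.example.com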
Managing Databases#
The ClusterWare platform currently only supports the etcd database.
On head nodes with multiple IP addresses the current ClusterWare etcd
implementation has no way to identify the correct network for
communicating with other head nodes. By default the system will
attempt to use the first non-local IP. Although this is adequate for
single head clusters and simple multihead configurations, a cluster
administrator setting up a multihead cluster should specify the
correct IP. This is done by setting the etcd.peer_url option in
the /opt/scyld/clusterware/conf/base.ini file. A correct peer URL
on a head node with the IP address of 10.24.1.1, where the 10.24.1.0/24
network should be used for inter-head communications, might look like:
etcd.peer_url = http://10.24.1.1:52380
If this value needs to be set or changed on an existing cluster, it
should be updated on a single head node, then managedb recover run
on that head node, and then other heads (re-)joined to the now
correctly configured one. The etcd.peer_url setting should only be
necessary on the first head as the proper network will be communicated
to new heads during the join process.
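As a sketch of that sequence on the 10.24.1.1 head node from the example above, run as root (the comment lines describe the manual steps):
# 1. Edit /opt/scyld/clusterware/conf/base.ini and set:
#      etcd.peer_url = http://10.24.1.1:52380
# 2. Run the recovery on this head node:
managedb recover
# 3. (Re-)join the other head nodes to this now correctly configured head,
#    as described in Managing Multiple Head Nodes.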
The ClusterWare etcd implementation does not allow the second-to-last head node in a multihead cluster to leave or be ejected. See Manually Removing a Joined Head Node for details, and Managing Multiple Head Nodes for broader information about managing multiple head nodes.
Important
Prior to any manipulation of the distributed database,
whether through managedb recover, joining head nodes to a cluster,
removing head nodes from a cluster, or switching from Couchbase to etcd,
the administrator is strongly encouraged to make
a backup of the ClusterWare database using the managedb
tool. See managedb.
The etcdctl command provides scriptable direct document querying and
manipulation. The ClusterWare platform provides a wrapped version of etcdctl
located in the /opt/scyld/clusterware-etcd/bin/ directory. The
wrapper should be run as root and automatically applies the correct
credentials and connects to the local etcd endpoint. Note that direct
manipulation of database JSON documents should only be done when
directed by Penguin Computing support.
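For example, a read-only listing of the stored keys (this assumes the standard etcdctl v3 get syntax; the wrapper supplies the credentials and local endpoint):
sudo /opt/scyld/clusterware-etcd/bin/etcdctl get "" --prefix --keys-only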
Configure Administrator Authentication#
ClusterWare administrator authentication is designed to easily
integrate with already deployed authentication systems via PAM. By
default cluster administrators are authenticated through the
pam_authenticator tool that in turn uses the PAM configuration
found in /etc/pam.d/cw_check_user. In this configuration,
administrators can authenticate using their operating system password
as long as they have been added to the ClusterWare system using the
cw-adminctl command.
For example, to add username "admin1":
cw-adminctl create name=admin1
If a ClusterWare administrator is running commands from a system
account on the head node by the same name (i.e. ClusterWare
administrator fred is also head node user fred), the system
will confirm their identity via a Unix socket-based protocol. Enabled
by default, this mechanism allows the cw tools to connect to a
local socket to securely set a dynamically generated one-time password
that is then accepted during their next authentication attempt. This
takes place transparently, allowing the administrator to run commands
without providing their password. The client code also caches an
authentication cookie in the user's .scyldcw/auth_tkt.cookie for
subsequent authentication requests.
Managing cluster user accounts is generally outside the scope of
the ClusterWare platform and should be handled by configuring the compute node
images appropriately for your environment. In large organizations this
usually means connecting to Active Directory, LDAP, or any other
mechanism supported by your chosen compute node operating system. In
simpler environments where no external source of user identification
is available or accessible, the ClusterWare platform provides a
sync-uids tool. This program can be found in the
/opt/scyld/clusterware-tools/bin directory and can be used to push
local user accounts and groups either to compute nodes or into a
specified image. For example:
# push uids and their primary uid-specific groups:
sync-uids --users admin1,tester --image SlurmImage
# push uid with an additional group:
sync-uids --users admin1 --groups admins --image SlurmImage
The above pushes the users and groups into the compute node image for persistence across reboots. Then either reboot the node(s) to see these changes, or push the IDs into running nodes with:
sync-uids --users admin1,tester --nodes n[1-10]
The tool generates a shell script that is then executed on the compute
nodes or within the image chroot to replicate the user and group
identifiers on the target system. This tool can also be used to push
ssh keys into the authorized_keys files for a user onto booted compute
nodes or into a specified image. Please see the tool's --help
output for more details and additional functionality, such as removing
users or groups, and controlling whether home directories are created
for injected user accounts.
Disable/Enable Chain Booting#
By default, ClusterWare performs chain booting, which improves concurrency when servicing a flood of PXE-booting nodes that are requesting their large rootfs file. Without chain booting, the head node(s) serve the rootfs file to all PXE-booting nodes and thus become a likely bottleneck when hundreds of nodes request the file concurrently. With chain booting, the head node(s) serve the rootfs files to the first compute node requesters, and those provisioned compute nodes then offer to serve as temporary rootfs file servers for other requesters.
To disable chain booting, the cluster administrator, executing as user root,
should edit the file /opt/scyld/clusterware/conf/base.ini to add the line:
chaining.enable = False
To reenable chain booting, either change that False to True,
or simply comment out that chaining.enable line to revert to the
default enabled state.
cw-nss Name Service Switch (NSS) Tool#
The cw-nss package provides a Name Service Switch (NSS) tool that
translates a hostname to its IP address or an IP address to its hostname(s),
as specified in the /etc/cw-nss-cluster.conf configuration file.
These hostnames and their IP addresses (e.g., for compute nodes and switches)
are those managed by the ClusterWare database,
which automatically provides that configuration file at startup
and updates it whenever the cluster configuration changes.
Note
cw-nss is currently only supported on head nodes.
Installing cw-nss inserts the cw function in the
/etc/nsswitch.conf hosts line
and installs the ClusterWare /lib64/libnss_cw* libraries,
which integrate with the other NSS /lib64/libnss_* libraries.
Benefits include expanded ClusterWare hostname resolution functionality and faster NSS queries for those hostnames. Install the nscd package for an additional performance improvement of hostname queries, especially on clusters with very high node counts.
The cw-nss package includes a cw-nssctl tool allowing a cluster
administrator to manually stop or start the service by removing or
reinserting the cw function in /etc/nsswitch.conf.
Any user can employ cw-nssctl to query the current status of the
service.
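For example, assuming the subcommand names match the actions described above (see cw-nssctl for the authoritative syntax):
cw-nssctl status       # any user may query whether the cw function is active
sudo cw-nssctl stop    # remove the cw function from /etc/nsswitch.conf
sudo cw-nssctl start   # reinsert the cw function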
See cw-nssctl for details.
Compute Node Network Connectivity#
The slurm-cw package transparently configures the firewall between the head node(s), job scheduler servers, and compute nodes. If you are not using the slurm-cw package, you may need to manually configure the firewall on the head node(s) and all compute nodes to allow traffic that ClusterWare does not open in firewalld.
By default, the head node does not allow IP forwarding from compute nodes on the private cluster network to external IP addresses on the wider network. If you need IP forwarding, you must enable and allow it through each head node's firewalld configuration.
Note
The firewall commands refer to the default firewalld "public" and "external" zones. If you have modified your network configuration to use different zones, substitute those names instead.
Run the following on a head node to forward internal compute node traffic through the
<EXTERNAL_IF> interface to the external network:
firewall-cmd --permanent --zone=external --change-interface=<EXTERNAL_IF>
firewall-cmd --reload
On el9 systems, also add a policy to allow traffic to flow from the public zone to the external zone:
firewall-cmd --permanent --new-policy public-external
firewall-cmd --reload
firewall-cmd --permanent --policy public-external --add-ingress-zone=public
firewall-cmd --permanent --policy public-external --add-egress-zone=external
firewall-cmd --permanent --policy public-external --set-target=ACCEPT
firewall-cmd --reload
Repeat on all head nodes.
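To verify the result on each head node, the standard firewalld query commands can be used, for example:
firewall-cmd --get-active-zones
firewall-cmd --zone=external --query-masquerade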
Appropriate routing for compute nodes can be modified in the compute
node image(s) (see cw-modimg tool).
Limited changes may also require modifying the DHCP configuration template
/opt/scyld/clusterware-iscdhcp/dhcpd.conf.template.
See Open Network Ports for additional networking information.
Status and Health Monitoring#
The ClusterWare platform provides a set of status and monitoring tools out-of-the-box, but admins can also use the plugin system to add or modify the list of status, hardware, health-check, and monitoring (Telegraf) plugins. Some of these plugins will be built into the disk image and cannot be removed without modifying that image manually; others may be added or removed on-the-fly through several node attributes:
[admin1@head]$ cw-nodectl ls -L
Nodes
  n0
    attributes
      _boot_config: DefaultBoot
      _status_plugins: chrony,ipmi
      _hardware_plugins: infiniband,nvidia
      _health_plugins: rasmem,timesync
      _telegraf_plugins: lm-sensors,nvidia-smi
      domain: cluster.local
    . . .
The cw-nodectl tool can be used to apply these attributes to
individual nodes or to groups of nodes, or admins can create attribute
groups with cw-attribctl and then join nodes to those groups.
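For example, a hypothetical invocation that assumes cw-nodectl accepts attribute assignments through a set subcommand (consult cw-nodectl --help for the exact syntax):
cw-nodectl -i n[1-10] set _health_plugins=rasmem,timesync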
See ICE ClusterWare Plugin System for more details.
Install Name Service Cache Daemon (nscd)#
The Name Service Cache Daemon (nscd) provides a cache for the most common name service requests. For very large clusters, the performance benefit is significant.
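For example, on a head node whose distribution repositories still provide the nscd package (an assumption; some newer distributions have dropped it):
sudo dnf install nscd
sudo systemctl enable --now nscd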
Install jq Tool#
The jq tool (/usr/bin/jq) is installable from the standard Linux
distribution repositories and provides a command-line parser for JSON output.
For example, for the --long status of node n0:
[sysadmin@head1 /]$ cw-nodectl -i n0 ls --long
Nodes
  n0
    attributes
      _boot_config: DefaultBoot
      _no_boot: 0
      last_modified: 2019-06-05 23:44:48 UTC (8 days, 17:09:55 ago)
    groups: []
    hardware
      cpu_arch: x86_64
      cpu_count: 2
      cpu_model: Intel Core Processor (Broadwell)
      last_modified: 2019-06-06 17:15:59 UTC (7 days, 23:38:45 ago)
      mac: 52:54:00:a6:f3:3c
      ram_total: 8174152
    index: 0
    ip: 10.54.60.0
    last_modified: 2019-06-14 16:54:39 UTC (0:00:04 ago)
    mac: 52:54:00:a6:f3:3c
    name: n0
    power_uri: none
    type: compute
    uid: f7c2129860ec40c7a397d78bba51179a
You can use jq to parse the JSON output to extract specific fields:
[sysadmin@head1 /]$ cw-nodectl --json -i n0 ls -l | jq '.n0.mac'
"52:54:00:a6:f3:3c"
[sysadmin@head1 /]$ cw-nodectl --json -i n0 ls -l | jq '.n0.attributes'
{
"_boot_config": "DefaultBoot",
"_no_boot": "0",
"last_modified": 1559778288.879129
}
[sysadmin@head1 /]$ cw-nodectl --json -i n0 ls -l | jq '.n0.attributes._boot_config'
"DefaultBoot"
All of the cw-* tools can produce JSON data, so similar techniques can
be applied to images, boot configurations, and so on. For example, use
the following to see the sizes of different images:
[sysadmin@head1 /]$ cw-imgctl --json ls -L | jq '.[].content.cwsquash.size'
1277071360
1467298823
[sysadmin@head1 /]$ cw-imgctl --json ls -L | jq '.[] | "\(.name) \(.content.cwsquash.size)"'
"DefaultImage 1277071360"
"NewImage 1277071360"
In this example, jq takes all of the top-level items (.[]) and then
creates a text string for each item (the double quotes), using the .name
and .content.cwsquash.size fields separated by a space. The \(...)
notation interpolates each field's value into the string.
Further information, including tutorials and user manuals, can be found at https://jqlang.github.io/jq/.