Common Additional Configuration#
Following a successful initial install or update of ICE ClusterWare™, or as local requirements of your cluster dictate, you may need to make one or more configuration changes.
Configure Hostname#
Verify that the head node hostname has been set as desired for permanent,
unique identification across the network.
In particular, ensure that the hostname is not
localhost or localhost.localdomain.
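For example, on a systemd-based head node the hostname can be checked and set with hostnamectl; the fully qualified name below is only a placeholder for your own naming scheme:
hostnamectl status
sudo hostnamectl set-hostname head1.cluster.example.com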
Managing Databases#
The ClusterWare platform currently only supports the etcd database.
On head nodes with multiple IP addresses the current ClusterWare etcd
implementation has no way to identify the correct network for
communicating with other head nodes. By default the system will
attempt to use the first non-local IP. Although this is adequate for
single head clusters and simple multihead configurations, a cluster
administrator setting up a multihead cluster should specify the
correct IP. This is done by setting the etcd.peer_url option in
the /opt/scyld/clusterware/conf/base.ini file. A correct peer URL
on a head node with the IP address of 10.24.1.1, where the 10.24.1.0/24
network should be used for inter-head communications, might look like:
etcd.peer_url = http://10.24.1.1:52380
If this value needs to be set or changed on an existing cluster, it
should be updated on a single head node, then managedb recover run
on that head node, and then other heads (re-)joined to the now
correctly configured one. The etcd.peer_url setting should only be
necessary on the first head as the proper network will be communicated
to new heads during the join process.
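As a sketch of that sequence on the 10.24.1.1 head node from the example above, run as root (the comment lines describe the manual steps):
# 1. Edit /opt/scyld/clusterware/conf/base.ini and set:
#      etcd.peer_url = http://10.24.1.1:52380
# 2. Run the recovery on this head node:
managedb recover
# 3. (Re-)join the other head nodes to this now correctly configured head,
#    as described in Managing Multiple Head Nodes.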
The ClusterWare etcd implementation does not allow the second-to-last head node in a multihead cluster to leave or be ejected. See Manually Removing a Joined Head Node for details, and Managing Multiple Head Nodes for broader information about managing multiple head nodes.
Important
Prior to any manipulation of the distributed database,
whether through managedb recover, joining head nodes to a cluster,
removing head nodes from a cluster, or switching from Couchbase to etcd,
the administrator is strongly encouraged to make
a backup of the ClusterWare database using the managedb
tool. See managedb.
The etcdctl command provides scriptable direct document querying and
manipulation. The ClusterWare platform provides a wrapped version of etcdctl
located in the /opt/scyld/clusterware-etcd/bin/ directory. The
wrapper should be run as root and automatically applies the correct
credentials and connects to the local etcd endpoint. Note that direct
manipulation of database JSON documents should only be done when
directed by Penguin Computing support.
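For example, a read-only listing of the stored keys (this assumes the standard etcdctl v3 get syntax; the wrapper supplies the credentials and local endpoint):
sudo /opt/scyld/clusterware-etcd/bin/etcdctl get "" --prefix --keys-only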
Configure Administrator Authentication#
ClusterWare administrator authentication is designed to easily
integrate with already deployed authentication systems via PAM. By
default cluster administrators are authenticated through the
pam_authenticator tool that in turn uses the PAM configuration
found in /etc/pam.d/cw_check_user. In this configuration,
administrators can authenticate using their operating system password
as long as they have been added to the ClusterWare system using the
cw-adminctl command.
For example, to add username "admin1":
cw-adminctl create name=admin1
If a ClusterWare administrator is running commands from a system
account on the head node by the same name (i.e. ClusterWare
administrator fred is also head node user fred), the system
will confirm their identity via a Unix socket-based protocol. Enabled
by default, this mechanism allows the cw tools to connect to a
local socket to securely set a dynamically generated one-time password
that is then accepted during their next authentication attempt. This
takes place transparently, allowing the administrator to run commands
without providing their password. The client code also caches an
authentication cookie in the user's .scyldcw/auth_tkt.cookie for
subsequent authentication requests.
Managing cluster user accounts is generally outside the scope of
the ClusterWare platform and should be handled by configuring the compute node
images appropriately for your environment. In large organizations this
usually means connecting to Active Directory, LDAP, or any other
mechanism supported by your chosen compute node operating system. In
simpler environments where no external source of user identification
is available or accessible, the ClusterWare platform provides a
sync-uids tool. This program can be found in the
/opt/scyld/clusterware-tools/bin directory and can be used to push
local user accounts and groups either to compute nodes or into a
specified image. For example:
# push uids and their primary uid-specific groups:
sync-uids --users admin1,tester --image SlurmImage
# push uid with an additional group:
sync-uids --users admin1 --groups admins --image SlurmImage
The above pushes the users and groups into the compute node image for persistence across reboots. Then either reboot the node(s) to see these changes, or push the IDs into running nodes with:
sync-uids --users admin1,tester --nodes n[1-10]
The tool generates a shell script that is then executed on the compute
nodes or within the image chroot to replicate the user and group
identifiers on the target system. This tool can also be used to push
ssh keys into the authorized_keys files for a user onto booted compute
nodes or into a specified image. Please see the tool's --help
output for more details and additional functionality, such as removing
users or groups, and controlling whether home directories are created
for injected user accounts.
Disable/Enable Chain Booting#
By default, ClusterWare performs chain booting, which improves concurrency when servicing a flood of PXE-booting nodes that are requesting their large rootfs file. Without chain booting, the head node(s) serve the rootfs file to all PXE-booting nodes and thus become a likely bottleneck when hundreds of nodes request the file concurrently. With chain booting, the head node(s) serve the rootfs files to the first compute node requesters, and those provisioned compute nodes then offer to serve as temporary rootfs file servers for other requesters.
To disable chain booting, the cluster administrator, executing as user root,
should edit the file /opt/scyld/clusterware/conf/base.ini to add the line:
chaining.enable = False
To reenable chain booting, either change that False to True,
or simply comment out that chaining.enable line to revert to the
default enabled state.
cw-nss Name Service Switch (NSS) Tool#
The cw-nss package provides a Name Service Switch (NSS) tool that
translates a hostname to its IP address or an IP address to its hostname(s),
as specified in the /etc/cw-nss-cluster.conf configuration file.
These hostnames and their IP addresses (e.g., for compute nodes and switches)
are those managed by the ClusterWare database,
which automatically provides that configuration file at startup
and updates it whenever the cluster configuration changes.
Note
cw-nss is currently only supported on head nodes.
Installing cw-nss inserts the cw function in the
/etc/nsswitch.conf hosts line
and installs the ClusterWare /lib64/libnss_cw* libraries,
which integrate with the other NSS /lib64/libnss_* libraries.
Benefits include expanded ClusterWare hostname resolution functionality and faster NSS queries for those hostnames. Install the nscd package for an additional performance improvement of hostname queries, especially on clusters with very high node counts.
The cw-nss package includes a cw-nssctl tool allowing a cluster
administrator to manually stop or start the service by removing or
reinserting the cw function in /etc/nsswitch.conf.
Any user can employ cw-nssctl to query the current status of the
service.
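For example, assuming the subcommand names match the actions described above (see cw-nssctl for the authoritative syntax):
cw-nssctl status       # any user may query whether the cw function is active
sudo cw-nssctl stop    # remove the cw function from /etc/nsswitch.conf
sudo cw-nssctl start   # reinsert the cw function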
See cw-nssctl for details.
Compute Node Network Connectivity#
The slurm-cw package transparently configures the firewall between the head node(s), job scheduler servers, and compute nodes. If you are not using the slurm-cw package, you may need to manually configure the firewall on the head node(s) and all compute nodes to allow traffic that ClusterWare does not open in firewalld.
By default, the head node does not allow IP forwarding from compute nodes on the private cluster network to external IP addresses on the wider network. If you need IP forwarding, you must enable and allow it through each head node's firewalld configuration.
Note
The firewall commands refer to the default firewalld "public" and "external" zones. If you have modified your network configuration to use different zones, substitute those names instead.
Run the following on a head node to forward internal compute node traffic through the
<EXTERNAL_IF> interface to the external network:
firewall-cmd --permanent --zone=external --change-interface=<EXTERNAL_IF>
firewall-cmd --reload
On el9 systems, also add a policy to allow traffic to flow from the public zone to the external zone:
firewall-cmd --permanent --new-policy public-external
firewall-cmd --reload
firewall-cmd --permanent --policy public-external --add-ingress-zone=public
firewall-cmd --permanent --policy public-external --add-egress-zone=external
firewall-cmd --permanent --policy public-external --set-target=ACCEPT
firewall-cmd --reload
Repeat on all head nodes.
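To verify the result on each head node, the standard firewalld query commands can be used, for example:
firewall-cmd --get-active-zones
firewall-cmd --zone=external --query-masquerade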
Appropriate routing for compute nodes can be modified in the compute
node image(s) (see cw-modimg tool).
Limited changes may also require modifying the DHCP configuration template
/opt/scyld/clusterware-iscdhcp/dhcpd.conf.template.
See Open Network Ports for additional networking information.
Status and Health Monitoring#
The ClusterWare platform provides a set of status and monitoring tools out-of-the-box, but admins can also use the plugin system to add or modify the list of status, hardware, health-check, and monitoring (Telegraf) plugins. Some of these plugins will be built into the disk image and cannot be removed without modifying that image manually; others may be added or removed on-the-fly through several node attributes:
[admin1@head]$ cw-nodectl ls -L
Nodes
  n0
    attributes
      _boot_config: DefaultBoot
      _status_plugins: chrony,ipmi
      _hardware_plugins: infiniband,nvidia
      _health_plugins: rasmem,timesync
      _telegraf_plugins: lm-sensors,nvidia-smi
      domain: cluster.local
    . . .
The cw-nodectl tool can be used to apply these attributes to
individual nodes or to groups of nodes, or admins can create attribute
groups with cw-attribctl and then join nodes to those groups.
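For example, a hypothetical invocation that assumes cw-nodectl accepts attribute assignments through a set subcommand (consult cw-nodectl --help for the exact syntax):
cw-nodectl -i n[1-10] set _health_plugins=rasmem,timesync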
See ICE ClusterWare Plugin System for more details.
Install Name Service Cache Daemon (nscd)#
The Name Service Cache Daemon (nscd) provides a cache for the most common name service requests. For very large clusters, the performance benefit is significant.
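For example, on a head node whose distribution repositories still provide the nscd package (an assumption; some newer distributions have dropped it):
sudo dnf install nscd
sudo systemctl enable --now nscd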
Install jq Tool#
The jq tool (/usr/bin/jq) is installable from the standard Linux
distribution repositories and provides a command-line parser for JSON output.
For example, for the --long status of node n0:
[sysadmin@head1 /]$ cw-nodectl -i n0 ls --long
Nodes
  n0
    attributes
      _boot_config: DefaultBoot
      _no_boot: 0
      last_modified: 2019-06-05 23:44:48 UTC (8 days, 17:09:55 ago)
    groups: []
    hardware
      cpu_arch: x86_64
      cpu_count: 2
      cpu_model: Intel Core Processor (Broadwell)
      last_modified: 2019-06-06 17:15:59 UTC (7 days, 23:38:45 ago)
      mac: 52:54:00:a6:f3:3c
      ram_total: 8174152
    index: 0
    ip: 10.54.60.0
    last_modified: 2019-06-14 16:54:39 UTC (0:00:04 ago)
    mac: 52:54:00:a6:f3:3c
    name: n0
    power_uri: none
    type: compute
    uid: f7c2129860ec40c7a397d78bba51179a
You can use jq to parse the JSON output to extract specific fields:
[sysadmin@head1 /]$ cw-nodectl --json -i n0 ls -l | jq '.n0.mac'
"52:54:00:a6:f3:3c"
[sysadmin@head1 /]$ cw-nodectl --json -i n0 ls -l | jq '.n0.attributes'
{
"_boot_config": "DefaultBoot",
"_no_boot": "0",
"last_modified": 1559778288.879129
}
[sysadmin@head1 /]$ cw-nodectl --json -i n0 ls -l | jq '.n0.attributes._boot_config'
"DefaultBoot"
All of the cw-* tools can produce JSON data, so similar techniques can
be applied to images, boot configurations, and so on. For example, use
the following to see the sizes of different images:
[sysadmin@head1 /]$ cw-imgctl --json ls -L | jq '.[].content.cwsquash.size'
1277071360
1467298823
[sysadmin@head1 /]$ cw-imgctl --json ls -L | jq '.[] | "\(.name) \(.content.cwsquash.size)"'
"DefaultImage 1277071360"
"NewImage 1277071360"
In this example, jq takes all of the top-level items (.[]) and then
creates a text string for each item (the double quotes), using the .name
and .content.cwsquash.size fields separated by a space. The \(...)
notation interpolates each field's value into the string.
Further information, including tutorials and user manuals, can be found at https://jqlang.github.io/jq/.