Job Schedulers#

The default ICE ClusterWare™ installation for RHEL/CentOS includes support for the optional Slurm and OpenPBS (RHEL/CentOS 8 only) packages. These optional packages can coexist on a scheduler server, which may or may not be a ClusterWare head node. However, if multiple job schedulers are installed on the same server, only one should be enabled and running on that server at a time.

Note

The ClusterWare software no longer ships with PBS TORQUE. However, PBS TORQUE is available in older ClusterWare EL7 packages, which are no longer updated. While PBS TORQUE is no longer tested, you should be able to run it with your ClusterWare cluster.

Prerequisites#

Complete the following steps before installing and configuring a job scheduler to work with the ClusterWare platform.

Resolve Job Scheduler Hostname#

All nodes in the job scheduler cluster must be able to resolve hostnames of all other nodes as well as the scheduler server hostname. The ClusterWare platform provides a DNS server in the clusterware-dnsmasq package, as discussed in Node Name Resolution. This dnsmasq resolves all compute node hostnames.

  1. Add the job scheduler's hostname to /etc/hosts on the head node(s) so that dnsmasq can resolve it (see the example entry after these steps).

  2. Restart the clusterware-dnsmasq service after editing /etc/hosts by running:

    sudo systemctl restart clusterware-dnsmasq
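
For example, a hypothetical /etc/hosts entry for a scheduler server named jobsched (the hostname and IP address are placeholders; substitute your own values):

10.10.0.100   jobsched.cluster.local jobsched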
    

Create Job Scheduler Image#

Installing and configuring a job scheduler requires making changes to the compute node software. When using image-based compute nodes, clone the DefaultImage to create a new image, leaving the DefaultImage untouched as a known-functional pristine baseline.

For example, to set up nodes n0 through n3:

  1. Clone the default image:

    cw-imgctl -i DefaultImage clone name=jobschedImage
    
  2. Clone the default boot configuration and add the new image to the new boot configuration:

    cw-bootctl -i DefaultBoot clone name=jobschedBoot image=jobschedImage
    
  3. Set the boot configuration with the new image on nodes n0-n3:

    cw-nodectl -i n[0-3] set _boot_config=jobschedBoot
    

When these nodes reboot after all the setup steps are complete, they will use the jobschedBoot boot configuration and the jobschedImage image.

Additional Scheduler Resources#

See https://slurm.schedmd.com/rosetta.pdf for a discussion of the differences between PBS TORQUE and Slurm. See https://slurm.schedmd.com/faq.html#torque for useful information about how to transition from OpenPBS or PBS TORQUE to Slurm.

The following sections describe the installation and configuration of each job scheduler type.

Slurm#

The default ClusterWare Slurm configuration is configless and uses dynamic Slurm nodes. This reduces the admin effort needed when updating the list of compute nodes. See https://slurm.schedmd.com/configless_slurm.html and https://slurm.schedmd.com/dynamic_nodes.html for more information.

Alternatively, you can configure ClusterWare and Slurm to use static nodes or a combination of static and dynamic nodes.

  • Dynamic Slurm Nodes (default): When new nodes are added to a ClusterWare cluster and booted with a Slurm image, they are automatically added to Slurm. However, dynamic nodes are not automatically removed from Slurm (that is, from scontrol), even if the node is removed or changed within the ClusterWare platform.

  • Static Slurm Nodes: Static nodes need to be manually configured to be added to Slurm. Static Slurm nodes were the default prior to the ClusterWare 13.0 release.

  • Mix of Dynamic and Static Slurm Nodes: You can use a mix of dynamic and static nodes. Dynamic and static nodes can use the same Slurm image.

Configless Slurm is enabled by setting "SlurmctldParameters=enable_configless" in /etc/slurm/slurm.conf, and a DNS SRV record named slurmctld_primary is created. To see the details of the SRV record, run:

cw-clusterctl hosts -i slurmctld_primary ls -l

For clusters with a backup Slurm controller, create a slurmctld_backup DNS SRV record:

cw-clusterctl --hidden hosts create name=slurmctld_backup port=6817 \
    service=slurmctld domain=cluster.local target=backuphostname \
    type=srvrec priority=20
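
To confirm that compute nodes can resolve these records, you can query DNS directly. This is a sketch that assumes the dig utility (from bind-utils) is available and that the records follow the standard configless Slurm naming of _slurmctld._tcp within the cluster.local domain:

dig +short SRV _slurmctld._tcp.cluster.local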

Install Slurm#

  1. Complete the job scheduler configuration prerequisites.

  2. Install Slurm software on the job scheduler controller.

    • For RHEL/CentOS 8:

      sudo yum install slurm-cw --enablerepo=cw* --enablerepo=powertools
      
    • For RHEL/CentOS 9:

      sudo yum install slurm-cw --enablerepo=cw* --enablerepo=crb
      

    Note

    An additional RPM package, slurm-cw-slurmrestd, is available. See https://slurm.schedmd.com/slurmrestd.html for details. The slurm-cw-slurmrestd package is not installed by default. To install the package, run yum --enablerepo=cw* install slurm-cw-slurmrestd.

  3. Configure either dynamic (default) or static Slurm nodes.

Configure with Dynamic Slurm Nodes#

ClusterWare with Slurm uses dynamic Slurm nodes by default.

  1. Use the slurm-cw.setup helper script to complete the initialization and install the Slurm RPMs on the controller. You must have ClusterWare administrator permissions to run this command.

    slurm-cw.setup init
    

    init generates /etc/slurm/slurm.conf, /etc/slurm/cgroup.conf, and /etc/slurm/slurmdbd.conf, starts munge, slurmctld, mariadb, and slurmdbd, and restarts slurmctld.

  2. For diskless nodes only: Set up the boot configuration and Slurm image and apply them to the compute nodes.

    1. Update the image you created during the prerequisite steps to include Slurm installation and configuration details:

      slurm-cw.setup update-image <slurm image>
      

      Where <slurm image> is replaced by the name of the image file you created during the prerequisite steps.

    2. Reboot the compute nodes for the image changes to take effect:

      cw-nodectl -i <node list> reboot
      

      Where <node list> is replaced by a list of nodes. For example, n[0-3,14,17-22].

      After reboot, the nodes with the Slurm Image applied automatically join as dynamic Slurm nodes.

  3. For diskful nodes only: Install Slurm on the nodes and reboot; the nodes will then automatically join as dynamic Slurm nodes. One way to install Slurm on diskful nodes is to add them as static nodes (which installs the Slurm packages on the live nodes) and then remove them from the slurm.conf file after initialization.

  4. Check the Slurm status to ensure all expected nodes are listed:

    slurm-cw.setup status
    

To prevent a node from being added as a dynamic Slurm node, set the _slurmd reserved attribute to NoDynamic. Setting the _slurmd reserved attribute does not impact static Slurm nodes. For example, to set it on node n1:

cw-nodectl -i n1 set _slurmd=NoDynamic

Configure with Static Slurm Nodes#

Unlike with dynamic Slurm nodes, static Slurm nodes need to be added to Slurm explicitly.

  1. Use the slurm-cw.setup helper script to complete the initialization, install the Slurm RPMs on the controller, and run slurmd on specified nodes. You must have ClusterWare administrator permissions to run this command.

    slurm-cw.setup init <nodes>
    

    Where <nodes> is replaced by:

    • All “up” nodes: --up

    • A list of nodes: -i n[0-2]

    • An attribute expression: -s 'attributes[_boot_config]=="DefaultBoot"'

    init generates /etc/slurm/slurm.conf, /etc/slurm/cgroup.conf, and /etc/slurm/slurmdbd.conf, starts munge, slurmctld, mariadb, and slurmdbd, and restarts slurmctld. Next, init tries to install slurm-cw-node on the selected live nodes. After that installation succeeds, the slurm-cw.setup script starts slurmd on the selected live nodes and those nodes are added to /etc/slurm/slurm.conf as static Slurm nodes.

  2. For diskless nodes only: Reboot the nodes with the Slurm image.

    Note

    These steps are not required for diskful nodes as Slurm is installed directly on the disk via slurm-cw.setup init.

    1. Update the image you created during the prerequisite steps to include Slurm installation and configuration details:

      slurm-cw.setup update-image <slurm image>
      

      Where <slurm image> is replaced by the name of the image file you created during the prerequisite steps.

    2. Reboot the compute nodes for the image changes to take effect:

      cw-nodectl -i <node list> reboot
      

      Where <node list> is replaced by a list of nodes. For example, n[0-3,14,17-22].

  3. Check the Slurm status to ensure all expected nodes are listed:

    slurm-cw.setup status
    

Working with Slurm#

When a node boots, the ClusterWare script registers it with Slurm as a static node if the node is configured in slurm.conf, or as a dynamic node if it is not. If, however, the _slurmd reserved attribute is set to NoDynamic, ClusterWare will not attempt to register the node as a dynamic Slurm node. Setting the _slurmd reserved attribute to NoDynamic has no impact on static Slurm nodes.

You can view the Slurm status on the server and compute nodes by running:

slurm-cw.setup status
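
In addition to the helper script, the standard Slurm commands can be used to inspect node state (the node name n0.cluster.local below is only an example):

sinfo -N -l
scontrol show node n0.cluster.local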

Start and stop the Slurm service cluster-wide by running:

slurm-cw.setup cluster-stop
slurm-cw.setup cluster-start

Slurm User Access#

Slurm executable commands and libraries are installed in /opt/scyld/slurm/. The Slurm controller configuration can be found in /etc/slurm/slurm.conf and each configless node caches a copy of that slurm.conf file in /var/spool/slurmd/conf-cache/.

You can inject users into the compute node image using the sync-uids script. You can inject all users, a selected list of users, or a single user. For example, to inject the single user janedoe:

/opt/scyld/clusterware-tools/bin/sync-uids \
              -i slurmImage --create-homes \
              --users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub

See Configure Administrator Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.

Each Slurm user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the Slurm commands. The /etc/profile.d/cw.slurm.sh script does this automatically for users who log in while Slurm is running. Alternatively, each Slurm user can manually execute module load Slurm or add that command to (for example) the user's ~/.bash_profile or ~/.bashrc.
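
For example, a user who does not pick up the profile script can load the module manually and verify access, or export the variables directly; the bin/ and lib64/ subdirectories under /opt/scyld/slurm/ shown here are assumptions about the install layout:

module load Slurm
sinfo --version

# Or set the environment variables directly (subdirectory paths assumed):
export PATH=/opt/scyld/slurm/bin:$PATH
export LD_LIBRARY_PATH=/opt/scyld/slurm/lib64:$LD_LIBRARY_PATH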

Add Slurm Nodes#

Dynamic Slurm nodes are automatically added after rebooting with the appropriate boot configuration and Slurm image (diskless nodes) or after Slurm is installed (diskful nodes).

For static Slurm nodes, you can add nodes by:

  • Running the following command:

    slurm-cw.setup update-nodes <nodes>
    

    Where <nodes> is replaced by:

    • All “up” nodes: --up

    • A list of nodes: -i n[0-2]

    • An attribute expression: -s 'attributes[_boot_config]=="DefaultBoot"'

  • Directly editing the /etc/slurm/slurm.conf config file.

Note

With configless Slurm, the Slurm image does NOT need to be reconfigured after new static nodes are added. Slurm automatically forwards the new information to the slurmd daemons on the nodes.

Remove Dynamic Slurm Nodes#

After a dynamic node is added to Slurm (that is, scontrol remembers the node name), the node is not automatically removed from scontrol even if the node is removed or changed within the ClusterWare software. If you want to remove a dynamic node from scontrol, use the delete argument within scontrol. For example, to remove node n0:

scontrol delete node n0.cluster.local
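
To confirm that the node no longer appears, list the nodes with a standard Slurm command:

sinfo -N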

Work with Slurm Configuration File#

After initialization, you can manually edit the Slurm configuration file /etc/slurm/slurm.conf to add or remove static Slurm nodes.

You can also use the slurm.conf file to add or remove partitions to set up alternative queues for nodes.
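
The following is a minimal sketch of what such hand-edited entries might look like; the node names, CPU count, and partition name are placeholders and should match your hardware and the settings generated by slurm-cw.setup:

# Static node entries (placeholder values)
NodeName=n[4-5] CPUs=8 State=UNKNOWN

# An additional partition for those nodes
PartitionName=batch Nodes=n[4-5] Default=NO MaxTime=INFINITE State=UP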

You can generate a new Slurm configuration file for specified nodes without reconfiguring the database or controller. Generating a new configuration file is not common as it resets the Slurm configuration.

slurm-cw.setup reconfigure <nodes>

Where <nodes> is replaced by:

  • All “up” nodes: --up

  • A list of nodes: -i n[0-2]

  • An attribute expression: -s 'attributes[_boot_config]=="DefaultBoot"'

Troubleshooting Slurm#

If any services on the controller (slurmctld, slurmdbd, and munge) or on the compute nodes (slurmd and munge) are not running, you can use systemctl to start the individual service. Alternatively, use the following commands:

To restart Slurm cluster-wide: slurm-cw.setup cluster-restart

To restart Slurm on the controller: slurm-cw.setup restart

To restart Slurm on nodes: slurm-cw.setup restart-nodes
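
For example, to check and restart an individual service with systemctl (a sketch; the unit names are assumed to match the daemon names listed above):

# On the controller
sudo systemctl status slurmctld
sudo systemctl restart slurmctld

# On a compute node
sudo systemctl restart slurmd munge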

Note

Starting or restarting does not affect the Slurm image.

See Workload Management for information about monitoring and troubleshooting schedulers.

OpenPBS#

OpenPBS is only available for RHEL/CentOS 8 clusters.

See Job Schedulers for general job scheduler information and configuration guidelines. See https://www.openpbs.org for OpenPBS documentation.

First install OpenPBS software on the job scheduler server:

sudo yum install openpbs-scyld --enablerepo=cw* --enablerepo=scyld*

Use the openpbs-scyld.setup helper script to complete the initialization and set up the job scheduler and config file in the compute node image(s).

Note

The openpbs-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>]. See Attribute Groups for details.

openpbs-scyld.setup init                      # default to all 'up' nodes
openpbs-scyld.setup update-image openpbsImage # for permanence in the image
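
For example, to limit initialization to a specific set of nodes using the node-specific syntax described in the note above (the node list n[0-3] is a placeholder):

openpbs-scyld.setup init -i n[0-3]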

Reboot the compute nodes to bring them into active management by OpenPBS. Check the OpenPBS status:

openpbs-scyld.setup status

# If the OpenPBS daemon is not executing, then:
openpbs-scyld.setup cluster-restart

# And check the status again

This cluster-restart is a manual one-time step that doesn't affect the openpbsImage. The update-image step is necessary for the changes to persist across compute node reboots.

Generate new OpenPBS-specific config files with:

openpbs-scyld.setup reconfigure      # default to all 'up' nodes

Add nodes by executing:

openpbs-scyld.setup update-nodes     # default to all 'up' nodes

or add or remove nodes by executing qmgr.
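
For example, qmgr can create or delete a node entry directly (standard OpenPBS qmgr syntax; the node name n4 is a placeholder):

qmgr -c "create node n4"
qmgr -c "delete node n4"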

Any such changes must be added to openpbsImage by re-executing:

openpbs-scyld.setup update-image openpbsImage

and then either reboot all the compute nodes with that updated image, or additionally execute:

openpbs-scyld.setup cluster-restart

to manually push the changes to the up nodes without requiring a reboot.

Inject users into the compute node image using the sync-uids script. The administrator can inject all users, a selected list of users, or a single user. For example, to inject the single user janedoe:

/opt/scyld/clusterware-tools/bin/sync-uids \
              -i openpbsImage --create-homes \
              --users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub

See Configure Administrator Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.

To view the OpenPBS status on the server and compute nodes:

openpbs-scyld.setup status

The OpenPBS service can also be started and stopped cluster-wide with:

openpbs-scyld.setup cluster-stop
openpbs-scyld.setup cluster-start

OpenPBS executable commands and libraries are installed in /opt/scyld/openpbs/. Each OpenPBS user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the OpenPBS commands. The /etc/profile.d/scyld.openpbs.sh script does this automatically for users who log in while OpenPBS is running. Alternatively, each OpenPBS user can manually execute module load openpbs or add that command to (for example) the user's ~/.bash_profile or ~/.bashrc.
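
For example, a user can append the module load command to a shell startup file so the OpenPBS commands are available at every login (this assumes the user's shell reads ~/.bash_profile):

echo 'module load openpbs' >> ~/.bash_profile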