Job Schedulers#

The default ICE ClusterWare™ installation for RHEL/CentOS includes support for the optional Slurm and OpenPBS (RHEL/CentOS 8 only) packages. These optional packages can coexist on a scheduler server, which may or may not be a ClusterWare head node. However, if multiple job schedulers are installed on the same server, only one should be enabled and running on that server at a time.

Note

The ClusterWare software no longer ships with PBS TORQUE. However, PBS TORQUE is available in older ClusterWare EL7 packages, which are no longer updated. While PBS TORQUE is no longer tested, you should be able to run it with your ClusterWare cluster.

Prerequisites#

Complete the following steps before installing and configuring a job scheduler to work with the ClusterWare platform.

Resolve Job Scheduler Hostname#

All nodes in the job scheduler cluster must be able to resolve hostnames of all other nodes as well as the scheduler server hostname. The ClusterWare platform provides a DNS server in the clusterware-dnsmasq package, as discussed in Node Name Resolution. This dnsmasq resolves all compute node hostnames.

  1. Add the job scheduler's hostname to /etc/hosts on the head node(s) so that dnsmasq can resolve it (see the example entry after these steps).

  2. Restart the clusterware-dnsmasq service after editing /etc/hosts by running:

    sudo systemctl restart clusterware-dnsmasq
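
For example, a hypothetical /etc/hosts entry for a scheduler server named jobsched (the hostname and IP address are placeholders; substitute your own values):

10.10.0.100   jobsched.cluster.local jobsched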
    

Create Job Scheduler Image#

Installing and configuring a job scheduler requires making changes to the compute node software. When using image-based compute nodes, clone the DefaultImage to create a new image, leaving the DefaultImage untouched as a known-functional pristine baseline.

For example, to set up nodes n0 through n3:

  1. Clone the default image:

    cw-imgctl -i DefaultImage clone name=jobschedImage
    
  2. Clone the default boot configuration and add the new image to the new boot configuration:

    cw-bootctl -i DefaultBoot clone name=jobschedBoot image=jobschedImage
    
  3. Set the boot configuration with the new image on nodes n0-n3:

    cw-nodectl -i n[0-3] set _boot_config=jobschedBoot
    

When these nodes reboot after all the setup steps are complete, they will use the jobschedBoot boot configuration and the jobschedImage image.

Additional Scheduler Resources#

See https://slurm.schedmd.com/rosetta.pdf for a discussion of the differences between PBS TORQUE and Slurm. See https://slurm.schedmd.com/faq.html#torque for useful information about how to transition from OpenPBS or PBS TORQUE to Slurm.

The following sections describe the installation and configuration of each job scheduler type.

Slurm#

The default ClusterWare Slurm configuration is configless and uses dynamic Slurm nodes. This reduces the admin effort needed when updating the list of compute nodes. See https://slurm.schedmd.com/configless_slurm.html and https://slurm.schedmd.com/dynamic_nodes.html for more information.

Alternatively, you can configure ClusterWare and Slurm to use static nodes or a combination of static and dynamic nodes.

  • Dynamic Slurm Nodes (default): When new nodes are added to a ClusterWare cluster and booted with a Slurm image, they are automatically added to Slurm. However, dynamic nodes are not automatically removed from Slurm (that is, from scontrol), even if the node is removed or changed within the ClusterWare platform.

  • Static Slurm Nodes: Static nodes need to be manually configured to be added to Slurm. Static Slurm nodes were the default prior to the ClusterWare 13.0 release.

  • Mix of Dynamic and Static Slurm Nodes: You can use a mix of dynamic and static nodes. Dynamic and static nodes can use the same Slurm image.

Configless Slurm is enabled by setting "SlurmctldParameters=enable_configless" in /etc/slurm/slurm.conf, and a DNS SRV record named slurmctld_primary is created. To see the details of the SRV record, run:

cw-clusterctl hosts -i slurmctld_primary ls -l

For clusters with a backup Slurm controller, create a slurmctld_backup DNS SRV record:

cw-clusterctl --hidden hosts create name=slurmctld_backup port=6817 \
    service=slurmctld domain=cluster.local target=backuphostname \
    type=srvrec priority=20
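
To confirm that compute nodes can resolve these records, you can query DNS directly. This is a sketch that assumes the dig utility (from bind-utils) is available and that the records follow the standard configless Slurm naming of _slurmctld._tcp within the cluster.local domain:

dig +short SRV _slurmctld._tcp.cluster.local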

Install Slurm#

  1. Complete the job scheduler configuration prerequisites.

  2. Install Slurm software on the job scheduler controller.

    • For RHEL/CentOS 8:

      sudo yum install slurm-cw --enablerepo=cw* --enablerepo=powertools
      
    • For RHEL/CentOS 9:

      sudo yum install slurm-cw --enablerepo=cw* --enablerepo=crb
      

    Note

    An additional RPM package, slurm-cw-slurmrestd, is available. See https://slurm.schedmd.com/slurmrestd.html for details. The slurm-cw-slurmrestd package is not installed by default. To install the package, run yum --enablerepo=cw* install slurm-cw-slurmrestd.

  3. Configure either dynamic (default) or static Slurm nodes.

Configure with Dynamic Slurm Nodes#

ClusterWare with Slurm uses dynamic Slurm nodes by default.

  1. Use the slurm-cw.setup helper script to complete the initialization and install the Slurm RPMs on the controller. You must have ClusterWare administrator permissions to run this command.

    slurm-cw.setup init
    

    init generates /etc/slurm/slurm.conf, /etc/slurm/cgroup.conf, and /etc/slurm/slurmdbd.conf, starts munge, slurmctld, mariadb, and slurmdbd, and restarts slurmctld.

  2. For diskless nodes only: Set up the boot configuration and Slurm image and apply them to the compute nodes.

    1. Update the image you created during the prerequisite steps to include Slurm installation and configuration details:

      slurm-cw.setup update-image <slurm image>
      

      Where <slurm image> is replaced by the name of the image file you created during the prerequisite steps.

    2. Reboot the compute nodes for the image changes to take effect:

      cw-nodectl -i <node list> reboot
      

      Where <node list> is replaced by a list of nodes. For example, n[0-3,14,17-22].

      After reboot, the nodes with the Slurm Image applied automatically join as dynamic Slurm nodes.

  3. For diskful nodes only: Install Slurm on the nodes and reboot; the nodes will then automatically join as dynamic Slurm nodes. One way to install Slurm on diskful nodes is to add them as static nodes (which installs the Slurm packages on the live nodes) and then remove them from the slurm.conf file after initialization.

  4. Check the Slurm status to ensure all expected nodes are listed:

    slurm-cw.setup status
    

To prevent a node from being added as a dynamic Slurm node, set the _slurmd reserved attribute to NoDynamic. Setting the _slurmd reserved attribute does not impact static Slurm nodes. For example, to set it on node n1:

cw-nodectl -i n1 set _slurmd=NoDynamic

Configure with Static Slurm Nodes#

Unlike with dynamic Slurm nodes, static Slurm nodes need to be added to Slurm explicitly.

  1. Use the slurm-cw.setup helper script to complete the initialization, install the Slurm RPMs on the controller, and run slurmd on specified nodes. You must have ClusterWare administrator permissions to run this command.

    slurm-cw.setup init <nodes>
    

    Where <nodes> is replaced by:

    • All “up” nodes: --up

    • A list of nodes: -i n[0-2]

    • An attribute expression: -s 'attributes[_boot_config]=="DefaultBoot"'

    init generates /etc/slurm/slurm.conf, /etc/slurm/cgroup.conf, and /etc/slurm/slurmdbd.conf, starts munge, slurmctld, mariadb, and slurmdbd, and restarts slurmctld. Next, init tries to install slurm-cw-node on the selected live nodes. After that installation succeeds, the slurm-cw.setup script starts slurmd on the selected live nodes and those nodes are added to /etc/slurm/slurm.conf as static Slurm nodes.

  2. For diskless nodes only: Reboot the nodes with the Slurm image.

    Note

    These steps are not required for diskful nodes as Slurm is installed directly on the disk via slurm-cw.setup init.

    1. Update the image you created during the prerequisite steps to include Slurm installation and configuration details:

      slurm-cw.setup update-image <slurm image>
      

      Where <slurm image> is replaced by the name of the image file you created during the prerequisite steps.

    2. Reboot the compute nodes for the image changes to take effect:

      cw-nodectl -i <node list> reboot
      

      Where <node list> is replaced by a list of nodes. For example, n[0-3,14,17-22].

  3. Check the Slurm status to ensure all expected nodes are listed:

    slurm-cw.setup status
    

Working with Slurm#

When a node boots, the ClusterWare script registers it with Slurm as a static node if the node is configured in slurm.conf, or as a dynamic node if it is not. If, however, the _slurmd reserved attribute is set to NoDynamic, ClusterWare will not attempt to register the node as a dynamic Slurm node. Setting the _slurmd reserved attribute to NoDynamic has no impact on static Slurm nodes.

You can view the Slurm status on the server and compute nodes by running:

slurm-cw.setup status
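
In addition to the helper script, the standard Slurm commands can be used to inspect node state (the node name n0.cluster.local below is only an example):

sinfo -N -l
scontrol show node n0.cluster.local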

Start and stop the Slurm service cluster-wide by running:

slurm-cw.setup cluster-stop
slurm-cw.setup cluster-start

Slurm User Access#

Slurm executable commands and libraries are installed in /opt/scyld/slurm/. The Slurm controller configuration can be found in /etc/slurm/slurm.conf and each configless node caches a copy of that slurm.conf file in /var/spool/slurmd/conf-cache/.

You can inject users into the compute node image using the sync-uids script. You can inject all users, a selected list of users, or a single user. For example, to inject the single user janedoe:

/opt/scyld/clusterware-tools/bin/sync-uids \
              -i slurmImage --create-homes \
              --users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub

See Configure Administrator Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.

Each Slurm user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the Slurm commands. The /etc/profile.d/cw.slurm.sh script does this automatically for users who log in while Slurm is running. Alternatively, each Slurm user can manually execute module load Slurm or add that command to (for example) the user's ~/.bash_profile or ~/.bashrc.
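
For example, a user who does not pick up the profile script can load the module manually and verify access, or export the variables directly; the bin/ and lib64/ subdirectories under /opt/scyld/slurm/ shown here are assumptions about the install layout:

module load Slurm
sinfo --version

# Or set the environment variables directly (subdirectory paths assumed):
export PATH=/opt/scyld/slurm/bin:$PATH
export LD_LIBRARY_PATH=/opt/scyld/slurm/lib64:$LD_LIBRARY_PATH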

Add Slurm Nodes#

Dynamic Slurm nodes are automatically added after rebooting with the appropriate boot configuration and Slurm image (diskless nodes) or after Slurm is installed (diskful nodes).

For static Slurm nodes, you can add nodes by:

  • Running the following command:

    slurm-cw.setup update-nodes <nodes>
    

    Where <nodes> is replaced by:

    • All “up” nodes: --up

    • A list of nodes: -i n[0-2]

    • An attribute expression: -s 'attributes[_boot_config]=="DefaultBoot"'

  • Directly editing the /etc/slurm/slurm.conf config file.

Note

With configless Slurm, the Slurm image does NOT need to be reconfigured after new static nodes are added. Slurm automatically forwards the new information to the slurmd daemons on the nodes.

Remove Dynamic Slurm Nodes#

After a dynamic node is added to Slurm (that is, scontrol remembers the node name), the node is not automatically removed from scontrol even if the node is removed or changed within the ClusterWare software. If you want to remove a dynamic node from scontrol, use the delete argument within scontrol. For example, to remove node n0:

scontrol delete node n0.cluster.local
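
To confirm that the node no longer appears, list the nodes with a standard Slurm command:

sinfo -N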

Work with Slurm Configuration File#

After initialization, you can manually edit the Slurm configuration file /etc/slurm/slurm.conf to add or remove static Slurm nodes.

You can also use the slurm.conf file to add or remove partitions to set up alternative queues for nodes.
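
The following is a minimal sketch of what such hand-edited entries might look like; the node names, CPU count, and partition name are placeholders and should match your hardware and the settings generated by slurm-cw.setup:

# Static node entries (placeholder values)
NodeName=n[4-5] CPUs=8 State=UNKNOWN

# An additional partition for those nodes
PartitionName=batch Nodes=n[4-5] Default=NO MaxTime=INFINITE State=UP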

You can generate a new Slurm configuration file for specified nodes without reconfiguring the database or controller. Generating a new configuration file is not common as it resets the Slurm configuration.

slurm-cw.setup reconfigure <nodes>

Where <nodes> is replaced by:

  • All “up” nodes: --up

  • A list of nodes: -i n[0-2]

  • An attribute expression: -s 'attributes[_boot_config]=="DefaultBoot"'

Troubleshooting Slurm#

If any services on the controller (slurmctld, slurmdbd, and munge) or on the compute nodes (slurmd and munge) are not running, you can use systemctl to start the individual service. Alternatively, use the following commands:

To restart Slurm cluster-wide: slurm-cw.setup cluster-restart

To restart Slurm on the controller: slurm-cw.setup restart

To restart Slurm on nodes: slurm-cw.setup restart-nodes
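
For example, to check and restart an individual service with systemctl (a sketch; the unit names are assumed to match the daemon names listed above):

# On the controller
sudo systemctl status slurmctld
sudo systemctl restart slurmctld

# On a compute node
sudo systemctl restart slurmd munge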

Note

Starting or restarting does not affect the Slurm image.

See Workload Management for information about monitoring and troubleshooting schedulers.

OpenPBS#

OpenPBS is only available for RHEL/CentOS 8 clusters.

See Job Schedulers for general job scheduler information and configuration guidelines. See https://www.openpbs.org for OpenPBS documentation.

First install OpenPBS software on the job scheduler server:

sudo yum install openpbs-scyld --enablerepo=cw* --enablerepo=scyld*

Use the openpbs-scyld.setup helper script to complete the initialization and set up the job scheduler and config file in the compute node image(s).

Note

The openpbs-scyld.setup script performs the init, reconfigure, and update-nodes actions (described below) by default against all up nodes. Those actions optionally accept a node-specific argument using the syntax [--ids|-i <NODES>] or a group-specific argument using [--ids|-i %<GROUP>]. See Attribute Groups for details.

openpbs-scyld.setup init                      # default to all 'up' nodes
openpbs-scyld.setup update-image openpbsImage # for permanence in the image
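
For example, to limit initialization to a specific set of nodes using the node-specific syntax described in the note above (the node list n[0-3] is a placeholder):

openpbs-scyld.setup init -i n[0-3]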

Reboot the compute nodes to bring them into active management by OpenPBS. Check the OpenPBS status:

openpbs-scyld.setup status

# If the OpenPBS daemon is not executing, then:
openpbs-scyld.setup cluster-restart

# And check the status again

This cluster-restart is a manual one-time step that doesn't affect the openpbsImage. The update-image step is necessary for the changes to persist across compute node reboots.

Generate new OpenPBS-specific config files with:

openpbs-scyld.setup reconfigure      # default to all 'up' nodes

Add nodes by executing:

openpbs-scyld.setup update-nodes     # default to all 'up' nodes

or add or remove nodes by executing qmgr.
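
For example, qmgr can create or delete a node entry directly (standard OpenPBS qmgr syntax; the node name n4 is a placeholder):

qmgr -c "create node n4"
qmgr -c "delete node n4"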

Any such changes must be added to openpbsImage by re-executing:

openpbs-scyld.setup update-image openpbsImage

and then either reboot all the compute nodes with that updated image, or additionally execute:

openpbs-scyld.setup cluster-restart

to manually push the changes to the up nodes without requiring a reboot.

Inject users into the compute node image using the sync-uids script. The administrator can inject all users, a selected list of users, or a single user. For example, to inject the single user janedoe:

/opt/scyld/clusterware-tools/bin/sync-uids \
              -i openpbsImage --create-homes \
              --users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub

See Configure Administrator Authentication and /opt/scyld/clusterware-tools/bin/sync-uids -h for details.

To view the OpenPBS status on the server and compute nodes:

openpbs-scyld.setup status

The OpenPBS service can also be started and stopped cluster-wide with:

openpbs-scyld.setup cluster-stop
openpbs-scyld.setup cluster-start

OpenPBS executable commands and libraries are installed in /opt/scyld/openpbs/. Each OpenPBS user must set up the PATH and LD_LIBRARY_PATH environment variables to properly access the OpenPBS commands. The /etc/profile.d/scyld.openpbs.sh script does this automatically for users who log in while OpenPBS is running. Alternatively, each OpenPBS user can manually execute module load openpbs or add that command to (for example) the user's ~/.bash_profile or ~/.bashrc.
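
For example, a user can append the module load command to a shell startup file so the OpenPBS commands are available at every login (this assumes the user's shell reads ~/.bash_profile):

echo 'module load openpbs' >> ~/.bash_profile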