Configure Job Schedulers#
The ClusterWareAI™ software supports integration with the Slurm and OpenPBS job schedulers. Job schedulers are used to manage and run workloads requested by multiple users. The job scheduler grants each user a portion of the cluster resources for a given time to run their job.
Slurm is designed for batch workloads, such as AI training, scientific simulations, or data analysis tasks. Slurm is fast and easy to use when a workload has a defined ending and when you know the scope of resources required for the job. Slurm can also plan for future resource needs and schedule jobs based on availability in the future, which optimizes cluster usage over time.
The ClusterWareAI software also supports integration with Kubernetes. Kubernetes excels at scaling applications to handle high traffic or large datasets for workloads that run continuously. Kubernetes could be a good choice for AI inference, for example, whereas Slurm could be a good choice for AI training. See Configure Kubernetes for details.
The default ClusterWareAI installation for RHEL/CentOS includes the Slurm and OpenPBS (RHEL/CentOS 8 only) packages. It is recommended to install these optional packages on a separate administrative node or scheduler server from the ClusterWareAI head node. While the two schedulers can coexist on the same server, only one can be enabled and executing on the server at any time.
Note
The ClusterWareAI software no longer ships with PBS TORQUE. However, PBS TORQUE is available in older ClusterWareAI EL7 packages, which are no longer updated. While PBS TORQUE is no longer tested, you should be able to run it with your ClusterWareAI cluster.
Job Scheduler Configuration Prerequisites#
Complete the following steps before installing and configuring Slurm or OpenPBS to work with the ClusterWareAI platform.
Resolve Job Scheduler Hostname#
All nodes in the job scheduler cluster must be able to resolve hostnames of all other nodes as well as the scheduler server hostname. The ClusterWareAI platform provides a DNS server in the clusterware-dnsmasq package, as discussed in Node Name Resolution. This dnsmasq resolves all compute node hostnames.
Add the job scheduler's hostname to /etc/hosts on the head node(s) to be resolved by dnsmasq.
Restart the clusterware-dnsmasq service after editing /etc/hosts by running:
sudo systemctl restart clusterware-dnsmasq
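Before restarting the service, it can help to confirm that the scheduler entry is actually present. The following is a minimal sketch and not part of the ClusterWareAI tooling: the function name hosts_has_name and the hostname sched1 are hypothetical, and the check only inspects a hosts-format file rather than querying dnsmasq.

```shell
# Return success if NAME appears as a hostname or alias in a hosts-format file.
# hosts_has_name is a hypothetical helper, not a ClusterWareAI command.
hosts_has_name() {
    # $1 = hostname to look for, $2 = hosts file (defaults to /etc/hosts)
    awk -v name="$1" '
        { sub(/#.*/, "") }                                  # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# "sched1" is a placeholder for the job scheduler hostname.
if hosts_has_name sched1; then
    echo "sched1 is listed in /etc/hosts"
else
    echo "sched1 is missing from /etc/hosts"
fi
```

Field 1 (the IP address) is skipped on purpose so that only names and aliases are matched.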
Create Job Scheduler Image#
Installing and configuring a job scheduler requires making changes to the compute node software. When using image-based compute nodes, clone the DefaultImage to create a new image and leave the DefaultImage untouched as a pristine, known-functional baseline.
For example, to set up nodes n0 through n3:
Clone the default image:
cw-imgctl -i DefaultImage clone name=jobschedImage
Clone the default boot configuration and add the new image to the new boot configuration:
cw-bootctl -i DefaultBoot clone name=jobschedBoot image=jobschedImage
Set the boot configuration with the new image on nodes n0-n3:
cw-nodectl -i n[0-3] set _boot_config=jobschedBoot
When these nodes reboot after all the Slurm- or OpenPBS-specific setup steps are complete, they will use the jobschedBoot and jobschedImage.
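The three cloning steps above could be wrapped so that the image name and node list are parameters. This is only a sketch: make_jobsched_config and the DRY_RUN variable are hypothetical, and by default the function prints the commands instead of running them so the result can be reviewed first.

```shell
# Print (or, with DRY_RUN=0, execute) the three cloning steps for a given
# base name and node list. make_jobsched_config is a hypothetical wrapper.
make_jobsched_config() {
    name=$1      # base name, e.g. "jobsched" gives jobschedImage / jobschedBoot
    nodes=$2     # node list, e.g. "n[0-3]"
    run=echo     # dry run by default: print the commands only
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=
    fi
    $run cw-imgctl -i DefaultImage clone "name=${name}Image"
    $run cw-bootctl -i DefaultBoot clone "name=${name}Boot" "image=${name}Image"
    $run cw-nodectl -i "$nodes" set "_boot_config=${name}Boot"
}

# Prints the same three commands shown in the example above.
make_jobsched_config jobsched 'n[0-3]'
```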
The following sections describe the installation and configuration of each job scheduler type.
Configure Slurm#
Tip
See https://slurm.schedmd.com/faq.html#torque for useful information about how to transition from OpenPBS or PBS TORQUE to Slurm.
The default ClusterWareAI Slurm configuration is configless and uses dynamic Slurm nodes. This reduces the admin effort needed when updating the list of compute nodes. See https://slurm.schedmd.com/configless_slurm.html and https://slurm.schedmd.com/dynamic_nodes.html for more information.
Alternatively, you can also configure ClusterWareAI and Slurm to use static nodes or a combination of static and dynamic nodes.
Dynamic Slurm Nodes (default): When new nodes are added to a ClusterWareAI cluster and booted with a Slurm image, they are automatically added to Slurm. Dynamic nodes are not automatically removed from Slurm scontrol, even if the node is removed or changed within the ClusterWareAI platform.
Static Slurm Nodes: Static nodes need to be manually configured to be added to Slurm. Static Slurm nodes were the default prior to the ClusterWareAI 13.0 release.
Mix of Dynamic and Static Slurm Nodes: You can use a mix of dynamic and static nodes. Dynamic and static nodes can use the same Slurm image.
Configless Slurm is enabled with "SlurmctldParameters=enable_configless" in
/etc/slurm/slurm.conf and a DNS SRV record called slurmctld_primary is
created. To see the details about the SRV record, run:
cw-clusterctl hosts -i slurmctld_primary ls -l
For clusters with a backup Slurm controller, create a slurmctld_backup DNS
SRV record:
cw-clusterctl --hidden hosts create name=slurmctld_backup port=6817 \
service=slurmctld domain=cluster.local target=backuphostname \
type=srvrec priority=20
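By the usual SRV convention, records with a lower priority value are tried first, so a backup record at priority 20 sorts after a primary record with a lower value. As a rough sketch, the resulting order could be inspected by sorting the output of a DNS query. The record name _slurmctld._tcp.cluster.local, the sample host names, and the primary priority of 10 are assumptions for illustration, and srv_order is a hypothetical helper.

```shell
# Order SRV answers ("priority weight port target") by priority, lowest first,
# and print "target:port". srv_order is a hypothetical helper.
srv_order() {
    sort -n -k1,1 | awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Sample answers standing in for the output of:
#   dig +short SRV _slurmctld._tcp.cluster.local
printf '%s\n' \
    '20 0 6817 backuphostname.cluster.local.' \
    '10 0 6817 primaryhostname.cluster.local.' | srv_order
```

On a live cluster, the dig command in the comment could be piped into srv_order instead of the sample lines.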
Install Slurm#
Complete the job scheduler configuration prerequisites.
Install Slurm software on the job scheduler controller.
For RHEL/CentOS 8:
sudo dnf install slurm-cw --enablerepo=cw* --enablerepo=powertools
For RHEL/CentOS 9 and 10:
sudo dnf install slurm-cw --enablerepo=cw* --enablerepo=crb
Note
An additional RPM package, slurm-cw-slurmrestd, is available. See https://slurm.schedmd.com/slurmrestd.html for details. The slurm-cw-slurmrestd package is not installed by default. To install the package, run dnf --enablerepo=cw* install slurm-cw-slurmrestd.
Configure with Dynamic Slurm Nodes#
ClusterWareAI with Slurm uses dynamic Slurm nodes by default.
Use the helper script slurm-cw.setup to complete the initialization and install the Slurm RPMs on the controller. You must have ClusterWareAI administrator permissions to run this command.
slurm-cw.setup init
init generates /etc/slurm/slurm.conf, /etc/slurm/cgroup.conf, and /etc/slurm/slurmdbd.conf, starts munge, slurmctld, mariadb, and slurmdbd, and restarts slurmctld.
For diskless nodes only: Set up the boot configuration and Slurm image and apply them to the compute nodes.
Update the image you created during the prerequisite steps to include Slurm installation and configuration details:
slurm-cw.setup update-image <slurm image>
Where <slurm image> is replaced by the name of the image file you created during the prerequisite steps.
Reboot the compute nodes for the image changes to take effect:
cw-nodectl -i <node list> reboot
Where <node list> is replaced by a list of nodes. For example, n[0-3,14,17-22].
After reboot, the nodes with the Slurm image applied automatically join as dynamic Slurm nodes.
For diskful nodes only: Install Slurm on the nodes and reboot. The nodes will automatically join as dynamic Slurm nodes. For example, one option is to add the nodes as static nodes and then remove them from the slurm.conf file after initialization.
Check the Slurm status to ensure all expected nodes are listed:
slurm-cw.setup status
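To make the "all expected nodes are listed" check less manual, the expected names can be compared against what Slurm reports. The sketch below assumes the node names are available as whitespace-separated lists. missing_nodes is a hypothetical helper, and the sinfo invocation in the comment is one possible way to obtain the actual list.

```shell
# Print every expected node name that is absent from the actual list.
# missing_nodes is a hypothetical helper; both arguments are
# whitespace-separated lists of node names.
missing_nodes() {
    for want in $1; do
        found=0
        for have in $2; do
            if [ "$want" = "$have" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "$want"
        fi
    done
}

# On a live controller the second argument could come from Slurm, e.g.:
#   missing_nodes "n0 n1 n2 n3" "$(sinfo -N -h -o '%N' | sort -u)"
# Sample data is used here so the sketch runs anywhere.
missing_nodes "n0 n1 n2 n3" "n0 n2 n3"   # prints n1
```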
To prevent a node from being added as a dynamic Slurm node, set the
_slurmd=NoDynamic reserved attribute. Setting the _slurmd reserved attribute
does not impact static Slurm nodes. For example, to set it on node n1:
cw-nodectl -i n1 set _slurmd=NoDynamic
Configure with Static Slurm Nodes#
Unlike with dynamic Slurm nodes, static Slurm nodes need to be added to Slurm explicitly.
Use the helper script slurm-cw.setup to complete the initialization, install the Slurm RPMs on the controller, and run slurmd on specified nodes. You must have ClusterWareAI administrator permissions to run this command.
slurm-cw.setup init <nodes>
Where <nodes> is replaced by:
All “up” nodes: --up
A list of nodes: -i n[0-2]
An expression attribute: -s 'attributes[_boot_config]=="DefaultBoot"'
init generates /etc/slurm/slurm.conf, /etc/slurm/cgroup.conf, and /etc/slurm/slurmdbd.conf, starts munge, slurmctld, mariadb, and slurmdbd, and restarts slurmctld. Next, init tries to install slurm-cw-node on the selected live nodes. After that installation succeeds, the slurm-cw.setup script starts slurmd on the selected live nodes and those nodes are added to /etc/slurm/slurm.conf as static Slurm nodes.
For diskless nodes only: Reboot the nodes with the Slurm image.
Note
These steps are not required for diskful nodes as Slurm is installed directly on the disk via slurm-cw.setup init.
Update the image you created during the prerequisite steps to include Slurm installation and configuration details:
slurm-cw.setup update-image <slurm image>
Where <slurm image> is replaced by the name of the image file you created during the prerequisite steps.
Reboot the compute nodes for the image changes to take effect:
cw-nodectl -i <node list> reboot
Where <node list> is replaced by a list of nodes. For example, n[0-3,14,17-22].
Check the Slurm status to ensure all expected nodes are listed:
slurm-cw.setup status
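Node lists such as n[0-3,14,17-22] combine single indexes and ranges. To illustrate how such a list reads, the sketch below expands one into individual names. It assumes a single bracketed group at the end of the name and indexes without zero padding. The real cw-nodectl parser may accept more forms, and expand_nodes is a hypothetical helper.

```shell
# Expand a node list like "n[0-3,14,17-22]" into one name per line.
# expand_nodes is a hypothetical helper; it assumes one trailing bracket
# group and indexes without leading zeros.
expand_nodes() {
    spec=$1
    case $spec in
        *\[*\])
            prefix=${spec%%\[*}          # text before the bracket, e.g. "n"
            body=${spec#*\[}
            body=${body%\]}              # e.g. "0-3,14,17-22"
            old_ifs=$IFS
            IFS=,
            for part in $body; do
                lo=${part%-*}            # "14" stays "14"; "17-22" gives 17
                hi=${part#*-}            # "14" stays "14"; "17-22" gives 22
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    echo "$prefix$i"
                    i=$((i + 1))
                done
            done
            IFS=$old_ifs
            ;;
        *)
            echo "$spec"                 # plain name, nothing to expand
            ;;
    esac
}

expand_nodes 'n[0-3,14,17-22]'
```

The example call prints eleven names, n0 through n3, n14, and n17 through n22, which can be compared with the nodes reported by the status command.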
Configure Auto Remediation Service (ARS) with Slurm#
If you are enabling ARS, complete the following steps:
Copy valid slurm.conf and munge.key files from the Slurm scheduler controller to all head nodes as /etc/slurm/slurm.conf and /etc/munge/munge.key.
Open the firewall for each head node IP address on the Slurm scheduler controllers (primary and secondary).
Find all slurmctld hostnames defined in the SlurmctldHost= lines of /etc/slurm/slurm.conf.
Add the slurmctld hosts' IP addresses and hostnames to /etc/hosts on all head nodes using the following format:
<IP Address> <hostname>
For example:
10.110.1.1 slurmcontrol-primary
10.110.1.2 slurmcontrol-secondary
Modify the ARS state machine container quadlet on all head nodes.
On a RHEL or Rocky 9 head node, create the drop-in slurm.conf file for the ARS state machine container quadlet on all head nodes:
sudo cp /etc/containers/systemd/ars-state-machine.container.d/slurm.conf.example \
/etc/containers/systemd/ars-state-machine.container.d/slurm.conf
On a RHEL or Rocky 8 head node:
Copy the following lines from the /etc/containers/systemd/ars-state-machine.container.d/slurm.conf.example file:
Volume=/etc/munge/munge.key:/run/secrets/munge.key:ro,z
Volume=/etc/slurm/slurm.conf:/run/secrets/slurm.conf:ro,z
Add the copied lines to the /etc/containers/systemd/ars-state-machine.container file. For example:
[Unit]
Description=ARS State Machine
[Container]
Image=cw-embedded-registry.internal/ars-state-machine
Volume=/opt/scyld/clusterware/workspace/sys-ars-settings.ini:/root/.scyldcw/settings.ini:ro,z
Volume=/etc/munge/munge.key:/run/secrets/munge.key:ro,z
Volume=/etc/slurm/slurm.conf:/run/secrets/slurm.conf:ro,z
Pull=missing
[Service]
Restart=always
RestartSec=30s
[Install]
WantedBy=multi-user.target
Reload the daemon and restart the ARS state machine service:
sudo systemctl daemon-reload
sudo systemctl restart ars-state-machine
Follow the steps in Add Workload Scheduler to ARS-monitored Compute Nodes to complete Slurm configuration with ARS.
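The step above that finds the slurmctld hostnames in the SlurmctldHost= lines could be sketched as a small filter. Slurm allows an optional address in parentheses after the hostname, which the sketch drops. slurmctld_hosts is a hypothetical helper, and the sample configuration lines are illustrative only.

```shell
# Print the hostnames from SlurmctldHost= lines, ignoring comments and any
# "(address)" suffix. Reads the named file, or stdin when no file is given.
# slurmctld_hosts is a hypothetical helper.
slurmctld_hosts() {
    awk -F= '
        { sub(/#.*/, "") }                                  # drop comments
        tolower($1) ~ /^[ \t]*slurmctldhost[ \t]*$/ {
            host = $2
            sub(/\(.*/, "", host)                           # drop "(address)"
            gsub(/[ \t]/, "", host)
            if (host != "") print host
        }
    ' "$@"
}

# Sample input standing in for /etc/slurm/slurm.conf.
slurmctld_hosts <<'EOF'
SlurmctldHost=slurmcontrol-primary(10.110.1.1)
SlurmctldHost=slurmcontrol-secondary
# SlurmctldHost=old-controller
EOF
```

On a head node that already has a copy of the file, running slurmctld_hosts /etc/slurm/slurm.conf would list the names to add to /etc/hosts.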
Note
If you stop using Slurm, remove the drop-in slurm.conf file or the
inserted lines from the ARS state machine quadlet container on all head nodes,
then reload and restart the ARS state machine service by running:
sudo rm -f /etc/containers/systemd/ars-state-machine.container.d/slurm.conf
sudo systemctl daemon-reload
sudo systemctl restart ars-state-machine
Work with Slurm#
When a node boots, the ClusterWareAI script brings up nodes that are configured
in slurm.conf as static Slurm nodes and nodes that are not configured in
slurm.conf as dynamic Slurm nodes. If, however, the _slurmd reserved attribute
is set to NoDynamic, ClusterWareAI will not attempt to bring up the node as a
dynamic Slurm node. Setting the _slurmd reserved attribute to NoDynamic has no
impact on static Slurm nodes.
You can view the Slurm status on the server and compute nodes by running:
slurm-cw.setup status
Start and stop the Slurm service cluster-wide by running:
slurm-cw.setup cluster-stop
slurm-cw.setup cluster-start
Slurm User Access#
Slurm executable commands and libraries are installed in /opt/scyld/slurm/.
The Slurm controller configuration can be found in /etc/slurm/slurm.conf
and each configless node caches a copy of that slurm.conf file in
/var/spool/slurmd/conf-cache/.
You can inject users into the compute node image using the sync-uids script.
You can inject all users, a selected list of users, or a single user.
For example, inject the single user janedoe:
/opt/scyld/clusterware-tools/bin/sync-uids \
-i slurmImage --create-homes \
--users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub
See Configure Administrator Authentication and
/opt/scyld/clusterware-tools/bin/sync-uids -h for details.
Each Slurm user must set up the PATH and LD_LIBRARY_PATH environment variables
to properly access the Slurm commands. This is done automatically for users who
log in when Slurm is running via the /etc/profile.d/cw.slurm.sh script.
Alternatively, each Slurm user can manually execute module load Slurm or can add
that command line to (for example) the user's ~/.bash_profile or
~/.bashrc.
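For users who set the variables by hand, a guard against duplicate entries keeps PATH and LD_LIBRARY_PATH tidy across repeated logins. The sketch below is not the content of the cw.slurm.sh script. prepend_once is a hypothetical helper, and the bin and lib subdirectories under /opt/scyld/slurm/ are assumptions that should be checked against the actual installation.

```shell
# Prepend DIR to the colon-separated LIST unless it is already present,
# then print the result. prepend_once is a hypothetical helper.
prepend_once() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;                 # already present
        *)          printf '%s\n' "${dir}${list:+:$list}" ;; # add at the front
    esac
}

# The bin/ and lib/ subdirectories are assumptions; verify the layout under
# /opt/scyld/slurm/ before relying on them.
PATH=$(prepend_once /opt/scyld/slurm/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_once /opt/scyld/slurm/lib "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```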
Work with Slurm Configuration File#
After initialization, you can manually edit the Slurm configuration file
/etc/slurm/slurm.conf to add or remove static Slurm nodes.
You can also use the slurm.conf file to add or remove partitions to set up alternative queues for nodes.
You can generate a new Slurm configuration file for specified nodes without reconfiguring the database or controller. This is rarely needed because it resets the Slurm configuration, discarding manual edits to slurm.conf:
slurm-cw.setup reconfigure <nodes>
Where <nodes> is replaced by:
All “up” nodes: --up
A list of nodes: -i n[0-2]
An expression attribute: -s 'attributes[_boot_config]=="DefaultBoot"'
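Because regenerating the file resets the Slurm configuration, keeping a timestamped copy of the current slurm.conf first is a cheap safeguard. This is a minimal sketch. backup_conf is a hypothetical helper, and writing next to /etc/slurm/slurm.conf would normally require root privileges.

```shell
# Copy a file to "<file>.<UTC timestamp>.bak" and print the backup path.
# backup_conf is a hypothetical helper.
backup_conf() {
    src=$1
    dest="$src.$(date -u +%Y%m%dT%H%M%SZ).bak"
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Intended use on the Slurm controller (not executed here):
#   backup_conf /etc/slurm/slurm.conf && slurm-cw.setup reconfigure --up
```

Chaining with && means the reconfigure step only runs if the copy succeeded.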
Manage Slurm#
See Manage Slurm for details about adding and removing nodes, troubleshooting Slurm, and common Slurm commands.
Configure OpenPBS#
OpenPBS is only available for RHEL/CentOS 8 clusters.
See Job Schedulers for general job scheduler information and configuration guidelines. See https://www.openpbs.org for OpenPBS documentation.
First install OpenPBS software on the job scheduler server:
sudo dnf install openpbs-scyld --enablerepo=cw* --enablerepo=scyld*
Use a helper script to complete the initialization and set up the job scheduler and config file in the compute node image(s).
Note
The openpbs-scyld.setup script performs the init,
reconfigure, and update-nodes actions (described below) by default
against all up nodes. Those actions optionally accept a node-specific
argument using the syntax [--ids|-i <NODES>] or a group-specific
argument using [--ids|-i %<GROUP>].
See Attribute Groups for details.
openpbs-scyld.setup init # default to all 'up' nodes
openpbs-scyld.setup update-image openpbsImage # for permanence in the image
Reboot the compute nodes to bring them into active management by OpenPBS. Check the OpenPBS status:
openpbs-scyld.setup status
# If the OpenPBS daemon is not executing, then:
openpbs-scyld.setup cluster-restart
# And check the status again
This cluster-restart is a manual one-time setup that doesn't affect the
openpbsImage.
The update-image is necessary for persistence
across compute node reboots.
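As a complement to the status check, the node states reported by OpenPBS can be tallied. The sketch below assumes the usual pbsnodes -a layout, in which each node stanza contains an indented "state = ..." line. pbs_state_summary is a hypothetical helper, and the sample stanzas are illustrative only.

```shell
# Count nodes per state from `pbsnodes -a` style output read on stdin.
# pbs_state_summary is a hypothetical helper.
pbs_state_summary() {
    awk '$1 == "state" && $2 == "=" { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Sample stanzas standing in for the output of: pbsnodes -a
pbs_state_summary <<'EOF'
n0
     Mom = n0
     state = free

n1
     Mom = n1
     state = down

n2
     Mom = n2
     state = free
EOF
```

On the scheduler server, pbsnodes -a could be piped into pbs_state_summary instead of the sample input.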
Generate new OpenPBS-specific config files with:
openpbs-scyld.setup reconfigure # default to all 'up' nodes
Add nodes by executing:
openpbs-scyld.setup update-nodes # default to all 'up' nodes
or add or remove nodes by executing qmgr.
Any such changes must be added to openpbsImage by reexecuting:
openpbs-scyld.setup update-image openpbsImage
and then either reboot all the compute nodes with that updated image, or additionally execute:
openpbs-scyld.setup cluster-restart
to manually push the changes to the up nodes without requiring a reboot.
Inject users into the compute node image using the sync-uids script.
The administrator can inject all users, or a selected list of users,
or a single user.
For example, inject the single user janedoe:
/opt/scyld/clusterware-tools/bin/sync-uids \
-i openpbsImage --create-homes \
--users janedoe --sync-key janedoe=/home/janedoe/.ssh/id_rsa.pub
See Configure Administrator Authentication and
/opt/scyld/clusterware-tools/bin/sync-uids -h for details.
To view the OpenPBS status on the server and compute nodes:
openpbs-scyld.setup status
The OpenPBS service can also be started and stopped cluster-wide with:
openpbs-scyld.setup cluster-stop
openpbs-scyld.setup cluster-start
OpenPBS executable commands and libraries are installed in
/opt/scyld/openpbs/.
Each OpenPBS user must set up the PATH and LD_LIBRARY_PATH
environment variables to properly access the OpenPBS commands.
This is done automatically for users who log in when OpenPBS is running
via the /etc/profile.d/scyld.openpbs.sh script.
Alternatively, each OpenPBS user can manually execute module load openpbs
or can add that command line to (for example) the user's
~/.bash_profile or ~/.bashrc.