Using Node Selectors#
The ICE ClusterWare™ node selector language provides a quick, powerful way to target actions at ad hoc groups of compute nodes. Because the language can operate on administrator-defined attributes, selectors can be highly customized to the administrator's own cluster and needs.
This section gives additional detail and examples using complex node selectors. For a basic explanation, see Node Selectors.
Example Cluster#
For the examples shown here, assume a cluster set up as follows:
Rack 0
  Head node - head1
  Storage
  Cluster core network
Rack 1: shared by all users
  Compute nodes n0 - n39
    32 CPUs (11th Gen)
    64 GB RAM
Rack 2: owned by Engineering Department
  Compute nodes n40 - n79
    32 CPUs (11th Gen)
    256 GB RAM
Rack 3: owned by Data Science Department
  Compute nodes n80 - n119
    64 CPUs (11th Gen)
    64 GB RAM
Rack 4: owned by Data Science Department
  Compute nodes n120 - n159
    64 CPUs (11th Gen)
    64 GB RAM
Rack 5
  Owned by Engineering Department
    Compute nodes n160 - n179
      32 CPUs (12th Gen)
      256 GB RAM
  Owned by Data Science Department
    GPU compute nodes gpu0 - gpu19
      16 CPUs
      64 GB RAM
      GPU accelerator
Configure Example Cluster#
The following steps complete the example cluster configuration, including adding custom attributes:
Set rack locations:
cw-nodectl -in[0-39] set rack_location=1
cw-nodectl -in[40-79] set rack_location=2
...
cw-nodectl -igpu[0-19] set rack_location=5
Since the rack_location attribute does not exist on a default cluster (it's not one of the Reserved Attributes), the first set command creates the new attribute.
Create attribute groups:
cw-attribctl create name=AllDepts
cw-attribctl create name=EngDept
cw-attribctl create name=DataSciDept
cw-attribctl create name=GpuGroup
Join nodes to the attribute groups:
cw-nodectl -in[0-179],gpu[0-19] join AllDepts
cw-nodectl -in[40-79,160-179] join EngDept
cw-nodectl -in[80-159],gpu[0-19] join DataSciDept
cw-nodectl -igpu[0-19] join GpuGroup
Customize the boot configurations if needed. Most compute nodes have _boot_config=DefaultBoot.
cw-imgctl -iDefaultImage clone name=GpuImage
cw-bootctl -iDefaultBoot clone name=GpuBoot image=GpuImage
Customize the GPU environment:
cw-modimg -iGpuBoot --chroot
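Before joining nodes to attribute groups, it can help to check that each bracketed range covers the nodes you expect. The following is a minimal sketch in plain POSIX shell, assuming ranges have the simple prefix[a-b,c-d] form used in the commands above. The helpers expand_nodes and count_nodes are hypothetical names introduced for illustration. They are not ClusterWare commands and do not reproduce ClusterWare's own range parser.

```shell
#!/bin/sh
# Hypothetical helper: expand one spec such as "n[40-79,160-179]" into
# one node name per line. Handles only the simple "prefix[a-b,c-d]" form.
expand_nodes() {
    spec=$1
    prefix=${spec%%\[*}      # text before the "["
    ranges=${spec#*\[}       # text after the "["
    ranges=${ranges%\]}      # drop the trailing "]"
    old_ifs=$IFS
    IFS=,                    # split the range list on commas
    for r in $ranges; do
        first=${r%-*}
        last=${r#*-}
        i=$first
        while [ "$i" -le "$last" ]; do
            printf '%s%s\n' "$prefix" "$i"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
}

# Hypothetical helper: count the nodes a spec expands to.
count_nodes() {
    echo $(( $(expand_nodes "$1" | wc -l) ))
}

# Compare the counts with the example cluster layout.
printf 'EngDept n nodes:     %s\n' "$(count_nodes 'n[40-79,160-179]')"
printf 'DataSciDept n nodes: %s\n' "$(count_nodes 'n[80-159]')"
printf 'GpuGroup nodes:      %s\n' "$(count_nodes 'gpu[0-19]')"
```

For the example cluster, the Engineering ranges should expand to 60 nodes (racks 2 and 5), and the Data Science ranges to 80 compute nodes plus 20 GPU nodes.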
Use Node Selectors to Take Actions#
After configuring the example cluster, the following commands take action on sets of nodes using the node selector.
Power off all nodes with the “rack_location” attribute set to 5 (that is, power off everything in rack 5):
cw-nodectl --selector "attributes[rack_location]==5" power off
Tip
Although this example implicitly targets compute nodes, you can also tag top-of-rack switches, storage devices, and so on with the “rack_location” attribute, and then power off the entire rack with a single command.
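Because a selector can match many nodes at once, you may want to review a destructive command before running it. The sketch below uses a hypothetical shell wrapper named run (not part of ClusterWare) that prints the command unless DRY_RUN=0 is set. It also shows one way to build the selector from a shell variable. The only ClusterWare syntax it relies on is the power off command shown above.

```shell
#!/bin/sh
# Hypothetical helper (not a ClusterWare feature): print the command
# instead of running it unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf 'would run:'
        printf ' %s' "$@"
        printf '\n'
    fi
}

# Build the selector from a shell variable. The double quotes keep the
# square brackets away from the shell's filename pattern matching.
rack=5
run cw-nodectl --selector "attributes[rack_location]==${rack}" power off
```

The printed preview omits the quotes around the selector because the shell has already removed them. The selector still reaches the command as a single argument.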
Reboot all “11th Gen CPU” servers (perhaps after applying a patch):
cw-nodectl --selector "'11th Gen CPU' in hw[cpu_model]" reboot
Note
In this case, we surround the string with single quotes to account for the spaces inside the string (backslash-escaped double quotes would also work). Also, note the use of the ‘in’ operator, which tests for a string within a string.
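To see what the two quoting styles hand to the command, the following sketch prints each argument as the shell delivers it. The show_args function is a hypothetical stand-in for cw-nodectl and only echoes its arguments. It does not evaluate the selector.

```shell
#!/bin/sh
# Hypothetical stand-in for cw-nodectl: print each received argument on
# its own line, wrapped in brackets, to make argument boundaries visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Single quotes nested inside the double-quoted selector.
show_args --selector "'11th Gen CPU' in hw[cpu_model]" reboot

# Backslash-escaped double quotes inside the double-quoted selector.
show_args --selector "\"11th Gen CPU\" in hw[cpu_model]" reboot
```

In both cases the selector arrives as one argument with the inner quotes intact, so the spaces in the CPU model string do not split it. The outer double quotes also stop the shell from treating hw[cpu_model] as a filename pattern.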