Using Node Selectors

The ICE ClusterWare™ node selector language provides a quick, powerful way to target actions at ad hoc groups of compute nodes. Because the language can operate on administrator-defined attributes, selectors can be tailored to the administrator's own cluster and needs.

This section gives additional detail and examples using complex node selectors. For a basic explanation, see Node Selectors.

Example Cluster

For the examples shown here, assume a cluster set up as follows:

  • Rack 0

    • Head node - head1

    • Storage

    • Cluster core network

  • Rack 1: shared by all users

    • Compute nodes n0 - n39

    • 32 CPUs (11th Gen)

    • 64 GB RAM

  • Rack 2: owned by Engineering Department

    • Compute nodes n40 - n79

    • 32 CPUs (11th Gen)

    • 256 GB RAM

  • Rack 3: owned by Data Science Department

    • Compute nodes n80 - n119

    • 64 CPUs (11th Gen)

    • 64 GB RAM

  • Rack 4: owned by Data Science Department

    • Compute nodes n120 - n159

    • 64 CPUs (11th Gen)

    • 64 GB RAM

  • Rack 5

    • Owned by Engineering Department

      • Compute nodes n160 - n179

      • 32 CPUs (12th Gen)

      • 256 GB RAM

    • Owned by Data Science Department

      • GPU compute nodes gpu0 - gpu19

      • 16 CPUs

      • 64 GB RAM

      • GPU accelerator

Configure Example Cluster

The following steps complete the example cluster configuration, including adding custom attributes:

  1. Set rack locations:

    cw-nodectl -in[0-39] set rack_location=1
    cw-nodectl -in[40-79] set rack_location=2
    ...
    cw-nodectl -igpu[0-19] set rack_location=5
    

    Since the rack_location attribute does not exist on a default cluster (it's not one of the Reserved Attributes), the first set command creates the new attribute.

  2. Create attribute groups:

    cw-attribctl create name=AllDepts
    cw-attribctl create name=EngDept
    cw-attribctl create name=DataSciDept
    cw-attribctl create name=GpuGroup
    
  3. Join nodes to the attribute groups:

    cw-nodectl -in[0-179],gpu[0-19] join AllDepts
    cw-nodectl -in[40-79,160-179] join EngDept
    cw-nodectl -in[80-159],gpu[0-19] join DataSciDept
    cw-nodectl -igpu[0-19] join GpuGroup
    
  4. Customize the boot configurations if needed. Most compute nodes use _boot_config=DefaultBoot. For the GPU nodes, clone the default image and boot configuration:

    cw-imgctl -iDefaultImage clone name=GpuImage
    cw-bootctl -iDefaultBoot clone name=GpuBoot image=GpuImage
    
  5. Customize the GPU environment:

    cw-modimg -iGpuImage --chroot
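
The per-rack set commands in step 1 follow directly from the rack layout above, so they can be generated instead of typed. The bash sketch below uses a hypothetical helper, print_rack_commands, that only prints the commands (a dry run) and runs nothing. It assumes that both the Engineering nodes n160 - n179 and the GPU nodes in rack 5 get rack_location=5, and it reuses the comma-separated -i node list syntax from step 3. Node lists are single-quoted in the output so that the shell that eventually runs them does not treat the brackets as filename patterns.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of ClusterWare): prints, but does not run,
# one "set rack_location" command per rack in the example cluster.
print_rack_commands() {
  # Array index = rack number. Rack 0 (head node, storage, core network)
  # is left empty and skipped by the loop below.
  local -a rack_nodes=(
    ""
    "n[0-39]"
    "n[40-79]"
    "n[80-119]"
    "n[120-159]"
    "n[160-179],gpu[0-19]"
  )
  local rack
  for rack in 1 2 3 4 5; do
    # Single quotes in the output keep the brackets away from shell globbing.
    printf "cw-nodectl -i'%s' set rack_location=%d\n" "${rack_nodes[$rack]}" "$rack"
  done
}

print_rack_commands
```

After reviewing the output, the commands could be run by piping them to a shell, for example print_rack_commands | sh in a shell where the function is defined.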
    

Use Node Selectors to Take Actions

With the example cluster configured, the following commands use node selectors to act on sets of nodes.

Power off all nodes with the “rack_location” attribute set to 5 (that is, power off everything in rack 5):

cw-nodectl --selector "attributes[rack_location]==5" power off

Tip

Although this example implicitly targets compute nodes, you can also tag top-of-rack switches, storage devices, and other equipment with a “rack_location” attribute. The same single command then powers off the entire rack.
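The selector in the power-off command above is wrapped in double quotes because square brackets are pattern characters in POSIX shells, so an unquoted attributes[rack_location]==5 is subject to filename expansion. By default bash passes a pattern that matches no file through unchanged, so an unquoted selector can appear to work until the current directory happens to contain a matching file name. The self-contained bash sketch below (glob_demo is a hypothetical name; no ClusterWare command is run) shows the difference inside a scratch directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo: shows the word the shell would hand to cw-nodectl
# when the selector is written unquoted versus quoted.
glob_demo() {
  local dir
  dir=$(mktemp -d)
  (
    cd "$dir"
    # A stray file whose name matches the pattern: [rack_location] is a
    # bracket expression that matches ONE character from that set, e.g. "r".
    touch 'attributesr==5'
    # Deliberately unquoted: bash expands it to the matching file name.
    printf 'unquoted: %s\n' attributes[rack_location]==5
    # Quoted: passed through exactly as written.
    printf 'quoted:   %s\n' "attributes[rack_location]==5"
  )
  rm -rf "$dir"
}

glob_demo
```

It prints unquoted: attributesr==5 followed by quoted:   attributes[rack_location]==5, so only the quoted form reaches the selector parser intact.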

Reboot all “11th Gen CPU” servers (perhaps after applying a patch):

cw-nodectl --selector "'11th Gen CPU' in hw[cpu_model]" reboot

Note

In this case, we surround the string with single quotes because it contains spaces (backslash-escaped double quotes would also work). Also note the 'in' operator, which tests whether the left-hand string occurs within the right-hand string.
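To see exactly what cw-nodectl receives under each quoting style from the note, a stand-in function can print every argument on its own line. In the bash sketch below, show_args is a hypothetical substitute for cw-nodectl so the example runs without ClusterWare. Both forms deliver the whole selector as a single argument with the inner quotes intact, which is what lets the selector parser read 11th Gen CPU as one string.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for cw-nodectl: prints each argument it receives on
# its own line, wrapped in <...> so argument boundaries are visible.
show_args() {
  local arg
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Form used in the reboot command: single quotes nested in double quotes.
show_args --selector "'11th Gen CPU' in hw[cpu_model]" reboot

# Alternative from the note: backslash-escaped double quotes.
show_args --selector "\"11th Gen CPU\" in hw[cpu_model]" reboot
```

Each call prints three lines; the second line of the first call is <'11th Gen CPU' in hw[cpu_model]>, showing the selector arrives as one argument.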