State Maps

A common task for a cluster administrator is identifying specific nodes that are out of compliance in some way and executing actions to solve such issues. These actions often involve temporarily removing the node(s) from production while performing testing, reprovisioning, and requalification. As problem nodes are identified, new nodes are added, or nodes are transferred from one configuration to another, the cluster administrator must have some means to keep track of the progress of each node through these processes. After all, these processes involve multiple stages, likely spanning multiple reboots or even reimaging. ICE ClusterWare™ node attributes can be leveraged for persistently storing this progress information.

For example, when a node health check detects a memory issue on a GPU, other tasks may dictate that power cycling the node for row remapping cannot occur immediately. Instead, the health checking code could set an attribute noting what was detected. Then, a separate process could see that attribute and initiate the steps of removing the node from production, rebooting it, triggering requalification tests, and moving it back into production if all goes well.

Of course, this simple detection and mitigation process only covers one type of failure and one possible resolution. The GPU or other hardware could fail in a myriad of ways, each requiring different mitigation strategies. This means that a node’s health check results or progress through requalification may be stored across multiple node attributes.
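One way to picture this is a small driver that reads a node's attributes and decides the next remediation step. The sketch below is purely illustrative: the attribute names (`gpu_health`, `requal_stage`) and the stage ordering are hypothetical, not part of any ClusterWare attribute schema.

```python
# Hypothetical sketch: drive requalification from node attributes.
# Attribute names and stages are illustrative, not ClusterWare-defined.

# Ordered stages a node passes through after a problem is detected.
REQUAL_STAGES = ["detected", "drained", "rebooted", "tested", "restored"]

def next_stage(attributes):
    """Return the next requalification stage for a node, or None if idle/done."""
    current = attributes.get("requal_stage")
    if current is None:
        # No remediation in progress; start one only if a problem was flagged.
        return "detected" if "gpu_health" in attributes else None
    idx = REQUAL_STAGES.index(current)
    if idx + 1 < len(REQUAL_STAGES):
        return REQUAL_STAGES[idx + 1]
    return None  # node has completed requalification

# Example: a health check flagged a GPU memory issue on this node.
print(next_stage({"gpu_health": "gpu0_row_remap_pending"}))  # detected
print(next_stage({"requal_stage": "rebooted"}))              # tested
```

Because the stage value is stored as a node attribute, it survives reboots and reimaging, and any separate process can pick up where the last one left off.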

The ClusterWare node selection language allows a cluster administrator to identify nodes that match possibly complex criteria by matching attribute values, detected status, or hardware details using basic comparators and logical operators. See the section Attribute Groups and Dynamic Groups above for examples of node selectors.

Polling the ClusterWare service for attribute status frequently or across many nodes is inefficient and in extreme cases can impact the head node performance. To alleviate this, the ClusterWare platform provides a scyld-nodectl waitfor mechanism. One common use is to wait for a node to boot before proceeding with additional steps in an overall command, for example:

scyld-nodectl -in10 reboot then waitfor up then exec uptime

The up argument is shorthand for a longer selector, specifically status[state] == "up", and can be replaced with a more complicated selector if, for example, the administrator is not rebooting the node but executing a command that modifies a node attribute when it completes. This sort of command chaining with "then" allows for simple automation, but more complex automation must deal with multiple nodes at different stages. For that case, the ClusterWare platform allows administrators to provide a set of selectors referred to as a state map.
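To make the shorthand concrete, the equality form of a selector can be sketched as a tiny evaluator. This handles only the status[state] == "up" pattern shown here; the real ClusterWare selection language supports additional comparators and logical operators not modeled below.

```python
import re

def eval_selector(selector, node):
    """Evaluate a selector of the form  name[key] == "value"  against a node.

    Simplified sketch: only the equality form shown in this document is
    supported, not the full ClusterWare selection language.
    """
    m = re.fullmatch(r'\s*(\w+)\[(\w+)\]\s*==\s*"([^"]*)"\s*', selector)
    if not m:
        raise ValueError(f"unsupported selector: {selector!r}")
    section, key, value = m.groups()
    return node.get(section, {}).get(key) == value

node = {"status": {"state": "up"}}
print(eval_selector('status[state] == "up"', node))    # True
print(eval_selector('status[state] == "down"', node))  # False
```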

Using a state map, a cluster administrator can track nodes through scenarios including:

* Error detection, handling, and requalification
* A rolling firmware update process
* Idle-time performance testing

State maps provide a general purpose mechanism to select groups of nodes based on their status and configuration, trigger actions, and observe the resulting changes. An example state map is provided as part of the clusterware-tools package:

$ cat /opt/scyld/clusterware-tools/examples/node-states.ini
[status]
up = status[state] == "up"
down = status[state] == "down"
booting = status[state] == "booting"

This INI format defines a state map named "status" containing three states named "up", "down", and "booting". The selector that defines each state is provided to the right of the name, after the equal sign. This file can also be written in JSON format as:

{
  "name": "status",
  "states": {
      "up": "status[state] == \"up\"",
      "down": "status[state] == \"down\"",
      "booting": "status[state] == \"booting\""
  }
}
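The two formats carry the same information, so converting between them is mechanical. The following sketch derives the JSON form from the INI form using only the structure shown above (one section whose name becomes the map name, and one key/selector pair per state):

```python
import configparser
import json

def statemap_ini_to_json(ini_text):
    """Convert the INI state-map format above to its JSON equivalent."""
    parser = configparser.ConfigParser()
    # Preserve the case of state names (ConfigParser lowercases keys by default).
    parser.optionxform = str
    parser.read_string(ini_text)
    # The single section name becomes the state map's name.
    name = parser.sections()[0]
    return json.dumps({"name": name, "states": dict(parser[name])}, indent=2)

ini = '''[status]
up = status[state] == "up"
down = status[state] == "down"
booting = status[state] == "booting"
'''
print(statemap_ini_to_json(ini))
```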

The cluster administrator can load the state map through the scyld-nodectl command:

scyld-nodectl waitfor --load-only @node-states.json

Once the state map is loaded, the waitfor command can also be used to see which nodes match which selectors by referencing the loaded map name, i.e. "status" in this example:

$ scyld-nodectl waitfor --name status
Nodes
  n[5-8,10]: up
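The node lists in this output use a compact prefix[ranges] notation. A simplified expansion of that notation, reimplemented here for illustration (this is not the ClusterWare parser itself), looks like:

```python
import re

def expand_nodes(spec):
    """Expand a compact node list like 'n[5-8,10]' into individual names.

    Simplified sketch of the prefix[ranges] notation shown above.
    """
    m = re.fullmatch(r"(\w+)\[([\d,-]+)\]", spec)
    if not m:
        return [spec]  # a single bare node name
    prefix, ranges = m.groups()
    names = []
    for part in ranges.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            names.extend(f"{prefix}{i}" for i in range(lo, hi + 1))
        else:
            names.append(f"{prefix}{part}")
    return names

print(expand_nodes("n[5-8,10]"))  # ['n5', 'n6', 'n7', 'n8', 'n10']
```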

Additional arguments are available to allow for streaming state transitions or simplifying the output for easier parsing:

$ scyld-nodectl waitfor --stream --name status
n[1-4] in down
n[5-10] in up

n[5] left up
n[5] entered down
n[5] left down
n[5] entered booting
n[5] left booting
n[5] entered up

In the above example, a single node in a 10-node cluster was rebooted and state transitions were emitted as the node progressed from "up" to "down" to "booting" and back to "up".
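A script consuming this stream can fold the transition lines into a current-state view. The sketch below assumes only the three event words visible in the output above ("in", "entered", "left") and treats each printed node group as an opaque key rather than expanding ranges:

```python
def apply_stream(lines):
    """Track each node group's current state from `waitfor --stream` lines.

    Simplified sketch based on the output format shown above: an initial
    'in' line per group, then alternating 'left'/'entered' transitions.
    """
    state = {}
    for line in lines:
        nodes, event, name = line.split()
        if event in ("in", "entered"):
            state[nodes] = name
        elif event == "left":
            state.pop(nodes, None)
    return state

lines = [
    "n[1-4] in down",
    "n[5-10] in up",
    "n[5] left up",
    "n[5] entered down",
    "n[5] left down",
    "n[5] entered booting",
    "n[5] left booting",
    "n[5] entered up",
]
print(apply_stream(lines))  # {'n[1-4]': 'down', 'n[5-10]': 'up', 'n[5]': 'up'}
```

In practice such a consumer would pipe the live output of scyld-nodectl into this kind of loop and trigger the next remediation step whenever a node enters a state of interest.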