Grafana Cluster Monitoring

The ClusterWare - Cluster Monitoring dashboard is shown when you first log in. It summarizes current activity on the head node and all compute nodes.

The following example shows the head node and first several nodes of a 49-node cluster.

[Screenshot: ClusterWare - Cluster Monitoring dashboard]

Grafana General Page

Click General / ClusterWare - Cluster Monitoring at the top of the page to display a list of the available dashboards.

[Screenshot: list of available dashboards]

The menu lists recently viewed dashboards under Recent, followed by the full list of dashboards in the General folder. Click ClusterWare - Node Monitoring to display detailed state and activity data for individual nodes.
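If you prefer to find dashboards from a script rather than the menu, Grafana's HTTP API provides a search endpoint (`GET /api/search`) that returns the same dashboards. The following is a minimal sketch, assuming Grafana is reachable at `http://localhost:3000` (the Grafana default port) and that you have a service account token. The `GRAFANA_URL` and `GRAFANA_TOKEN` environment variables are placeholders for your own setup, not ClusterWare settings.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed settings (hypothetical): adjust for your installation.
GRAFANA_URL = os.environ.get("GRAFANA_URL", "http://localhost:3000")
GRAFANA_TOKEN = os.environ.get("GRAFANA_TOKEN", "")


def search_request(query: str = "") -> urllib.request.Request:
    """Build a GET /api/search request restricted to dashboards (type=dash-db)."""
    params = urllib.parse.urlencode({"type": "dash-db", "query": query})
    req = urllib.request.Request(f"{GRAFANA_URL}/api/search?{params}")
    if GRAFANA_TOKEN:
        # Service account tokens are sent as a Bearer token.
        req.add_header("Authorization", f"Bearer {GRAFANA_TOKEN}")
    return req


def dashboard_titles(payload: str) -> list:
    """Extract sorted dashboard titles from a /api/search JSON response."""
    items = json.loads(payload)
    return sorted(item["title"] for item in items if item.get("type") == "dash-db")


def list_dashboards(query: str = "") -> list:
    """Query the running Grafana server and return matching dashboard titles."""
    with urllib.request.urlopen(search_request(query), timeout=10) as resp:
        return dashboard_titles(resp.read().decode("utf-8"))
```

For example, `list_dashboards("ClusterWare")` should return the ClusterWare dashboard titles, provided the token's account has permission to view them.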

Grafana Node Monitoring

The Node Monitoring dashboard shows details for one node at a time. By default, it displays the head node.

[Screenshot: Node Monitoring dashboard for the head node]

Click the drop-down list with the current node name at the top left of the dashboard to select a different node in the cluster.

For example, select "n02.cluster.local":

[Screenshot: Node Monitoring dashboard for n02.cluster.local]
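Grafana drop-down selectors like this one are normally backed by dashboard template variables, and the selected value appears in the dashboard URL as a `var-<name>=<value>` query parameter, so a link to a specific node can be bookmarked or shared. The sketch below builds such a link, assuming a variable named `node` and a dashboard UID of `clusterware-node`. Both names are hypothetical; select a node in your browser and read the real UID and variable name from the address bar.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with the UID and variable name from your dashboard URL.
GRAFANA_URL = "http://localhost:3000"
DASHBOARD_UID = "clusterware-node"
NODE_VARIABLE = "node"


def node_dashboard_url(node: str, time_from: str = "now-1h", time_to: str = "now") -> str:
    """Return a Node Monitoring link with the node drop-down preset to `node`.

    `from` and `to` are Grafana's standard time-range URL parameters and
    accept relative expressions such as "now-1h".
    """
    params = urlencode({f"var-{NODE_VARIABLE}": node, "from": time_from, "to": time_to})
    return f"{GRAFANA_URL}/d/{DASHBOARD_UID}?{params}"


print(node_dashboard_url("n02.cluster.local"))
# http://localhost:3000/d/clusterware-node?var-node=n02.cluster.local&from=now-1h&to=now
```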

Grafana Alerts

You can define an Alerts dashboard with configurable panels and alert notifications.

  1. Click the Alerting menu item (bell icon) in the left navigation panel.

  2. On the Alert Rules tab, click New alert rule.

  3. Define the conditions or events that should trigger an alert, and configure how those alerts are delivered to you.

Consult the Grafana Labs documentation for additional details.

The following is an example Alerts dashboard:

[Screenshot: example Alerts dashboard]

The first panel displays CPU load for the first 10 compute nodes, and the second panel displays disk usage for one head node.

To edit an alert, click the panel's title bar and select Edit from the drop-down menu. In the example below, the Query tab defines the data shown in the panel, and the Alert tab defines which values trigger an alert, what the alert message contains, and where the message is sent.
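Conceptually, an alert condition reduces each queried series over an evaluation window to a single number (for example, its average or maximum) and compares that number against a threshold. The sketch below models that logic in plain Python for a CPU-load panel like the one described above. The threshold of 4.0, the node names, and the sample values are made up for illustration, and this is not how Grafana evaluates rules internally.

```python
from statistics import mean
from typing import Callable, Dict, List

Series = List[float]


def nodes_in_alert(
    series_by_node: Dict[str, Series],
    threshold: float,
    reducer: Callable[[Series], float] = mean,
) -> List[str]:
    """Return the nodes whose reduced value is above the threshold."""
    firing = []
    for node, values in series_by_node.items():
        if not values:
            continue  # this sketch simply skips series with no samples
        if reducer(values) > threshold:
            firing.append(node)
    return sorted(firing)


# Hypothetical CPU load samples collected over one evaluation window.
samples = {
    "n00": [0.5, 0.7, 0.6],
    "n01": [4.2, 5.1, 4.8],
    "n02": [3.9, 4.1, 3.8],
}

print(nodes_in_alert(samples, threshold=4.0))              # ['n01']
print(nodes_in_alert(samples, threshold=4.0, reducer=max))  # ['n01', 'n02']
```

The choice of reducer matters: averaging ignores short spikes (n02 peaks at 4.1 but averages about 3.93), while `max` fires on any single sample above the threshold.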

[Screenshot: editing an alert panel, showing the Query and Alert tabs]

See the Grafana Labs documentation for details about setting up alerts.