cw-healthctl

NAME

cw-healthctl -- Query and modify health checks for the cluster.

USAGE

cw-healthctl [-h] [-v] [-q] [[-c | --config] CONFIG] [--base-url URL] [[-u | --user] USER[:PASSWD]] [--human | --json | --csv | --table] [--pretty | --no-pretty] [--show-uids] [[-i | --ids] CHECKS | -a | --all] {list,ls, create,mk, clone,cp, update,up, delete,rm}

OPTIONAL ARGUMENTS

-h, --help

Print a usage message and exit. Arguments after this option are ignored; arguments before it are parsed but have no effect.

-v, --verbose

Increase verbosity.

-q, --quiet

Decrease verbosity.

-c, --config CONFIG

Specify a client configuration file CONFIG.

--show-uids

Show raw UIDs in the output instead of trying to make it more human-readable.

-a, --all

Interact with all health checks (default for list).

-i, --ids CHECKS

A comma-separated list of health checks to query or modify. Values can include name, UID, or truncated UID.

--reset-all

Revert health checks to the default definitions in the health repo.

ARGUMENTS TO OVERRIDE BASIC CONFIGURATION DETAILS

--base-url URL

Specify the base URL of the ClusterWareAI REST API.

-u, --user USER[:PASSWD]

Masquerade as user USER with optional colon-separated password PASSWD.

FORMATTING ARGUMENTS

--human

Format the output for readability (default).

--json

Format the output as JSON.

--csv

Format the output as CSV.

--table

Format the output as a table.

--pretty

Indent JSON or XML output, and substitute human-readable output for other formats.

--no-pretty

Opposite of --pretty.

CHECK FIELDS

Each health check is described by the following fields:

name

Required. Unique identifier for the check.

command

Required. Shell command executed on each target node. The exit code is interpreted Nagios-style: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

interval

Seconds between successive runs of the check. Must be a positive integer. If omitted on create, the cluster field default_health_interval is used.

timeout

Seconds after which a running check is killed. Must be a positive integer. If omitted on create, the cluster field default_health_timeout is used.

labels

List of label strings used to group checks and to target them via the %label selector syntax (see SELECTING CHECKS BY LABEL below).

description

Free-form text describing the check.

flap_thresholds

Dictionary controlling when a flapping check signals a failure. Supported keys are fail_streak (number of consecutive failures) and fail_percentage (percentage of failures in a rolling window).
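The exit-code convention described for the command field can be sketched as a small shell function that compares a numeric reading against warning and critical thresholds. This is an illustration only: the function name classify_count and its argument order are hypothetical, and the threshold semantics (a reading at or above a threshold triggers that state) are an assumption rather than something this page specifies.

```shell
# classify_count VALUE WARN CRIT
# Hypothetical helper: prints a state name and returns the matching
# Nagios-style exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
classify_count() {
    value=${1-} warn=${2-} crit=${3-}

    # A missing or non-numeric argument cannot be judged, so report UNKNOWN.
    for arg in "$value" "$warn" "$crit"; do
        case "$arg" in
            ''|*[!0-9]*) echo "UNKNOWN"; return 3 ;;
        esac
    done

    # Assumption: a reading at or above a threshold triggers that state.
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}
```

A check script could end with a call such as classify_count "$count" 10 20, so that the script's exit status is the function's return value.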

ACTIONS

clone (cp) [--content JSON | INI_FILE] [NAME=VALUE ...]

Copy health check(s) to new identifiers.

--content JSON | INI_FILE

Overwrite fields in the cloned check.

create (mk) [--content JSON | INI_FILE] [NAME=VALUE ...]

Add a health check.

--content JSON | INI_FILE

Load this content into the database as a health check. The content may be JSON, an INI file, a YAML document, or a YAML stream containing multiple checks.

delete (rm)

Delete health check(s).

list (ls) [--long | --long-long] [--nodes [NODE ...]] [--show-labels]

Show information about health check(s).

-l, --long

Show a subset of all optional information for each check.

-L, --long-long

Show all optional information for each check.

--nodes [NODE ...]

List the checks assigned to the given node(s) instead of the full set of checks. Node specs may include ranges such as n[0-1]. An empty value means all nodes.

--show-labels

When used with --nodes, annotate each check with its labels.

update (up) [--content JSON | INI_FILE] [NAME=VALUE ...]

Modify health check fields.

--content JSON | INI_FILE

Overwrite fields in the specified check(s).
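The node range syntax accepted by list --nodes, such as n[0-1], can be pictured with a small expansion routine. The sketch below is illustrative and is not the tool's own parser: the function name expand_node_range is hypothetical, and it handles only a single PREFIX[LOW-HIGH] range with non-padded decimal bounds, which may be narrower than what cw-healthctl accepts.

```shell
# expand_node_range SPEC
# Hypothetical helper: print one node name per line for a spec such as
# n[0-3]. Specs without a [LOW-HIGH] range are printed unchanged.
expand_node_range() {
    spec=${1-}
    case "$spec" in
        *\[*-*\])
            prefix=${spec%%\[*}     # text before the opening bracket
            range=${spec#*\[}       # "LOW-HIGH]"
            range=${range%\]}       # "LOW-HIGH"
            low=${range%%-*}
            high=${range#*-}
            # Assumption: bounds are plain decimal integers without padding.
            for bound in "$low" "$high"; do
                case "$bound" in
                    ''|*[!0-9]*) printf '%s\n' "$spec"; return 0 ;;
                esac
            done
            i=$low
            while [ "$i" -le "$high" ]; do
                printf '%s%s\n' "$prefix" "$i"
                i=$((i + 1))
            done
            ;;
        *)
            printf '%s\n' "$spec"
            ;;
    esac
}
```

When a range spec is typed on a command line, quoting it (for example --nodes 'n[0-3]') keeps the shell from interpreting the brackets as a filename pattern.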

SELECTING CHECKS BY LABEL

Positional target values that begin with % are treated as label selectors rather than check names. A bare %label matches any check carrying that label. Multiple %label tokens, or a quoted selector expression, are flattened into an OR-set: a check matches if any of its labels appears in the set. Boolean connectives (and, or, parentheses) are accepted for syntactic compatibility with cw-nodectl --selector but do not restrict the match, so an expression that joins two labels with "and" still matches a check carrying only one of them.

For example, given checks check1 (labels: cpu) and check2 (labels: gpu):

cw-healthctl -i %cpu ls           # matches check1
cw-healthctl -i %gpu ls           # matches check2
cw-healthctl -i '%cpu,%gpu' ls    # matches check1 and check2
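The OR-set behavior described above can be sketched in shell. This illustrates the documented matching rule and is not the tool's implementation: the function names selector_labels and check_matches are hypothetical, and label names are assumed to consist of letters, digits, underscores, and hyphens.

```shell
# selector_labels SELECTOR
# Print every %label token found in SELECTOR, one per line, ignoring
# commas, parentheses, and connectives such as "and" and "or".
selector_labels() {
    printf '%s\n' "${1-}" \
        | grep -o '%[A-Za-z0-9_-][A-Za-z0-9_-]*' \
        | sed 's/^%//' \
        | sort -u
}

# check_matches SELECTOR LABEL...
# Succeed if any of the check's labels appears in the selector's OR-set.
check_matches() {
    wanted=$(selector_labels "${1-}")
    if [ "$#" -gt 0 ]; then shift; fi
    for label in "$@"; do
        # -x matches whole lines only, -F treats the label as a fixed string.
        if printf '%s\n' "$wanted" | grep -qxF -- "$label"; then
            return 0
        fi
    done
    return 1
}
```

With check1 (labels: cpu), check_matches '%cpu and %gpu' cpu succeeds even though check1 has no gpu label, which mirrors the statement that connectives do not restrict the match.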

EXAMPLES

cw-healthctl list

List all health checks.

cw-healthctl create name=check_zombies command='check_zombie.py -w 10 -c 20' interval=30 timeout=10 labels=cpu

Add a new check that runs every 30 seconds with a 10-second timeout and has the cpu label.

cw-healthctl create --content @checks.yaml

Create one or more health checks from a YAML file. The file may contain a single document, a list of checks, or a multi-document YAML stream.

cw-healthctl -i check_zombies update interval=60

Change the interval of check_zombies to 60 seconds.

cw-healthctl -i %cpu ls -L

Show full details for every check with the cpu label.

cw-healthctl ls --nodes 'n[0-3]' --show-labels

List the health checks assigned to nodes n0 through n3, annotated with their labels.
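For the checks.yaml example above, a file in the "list of checks" form might be produced as follows. This is a sketch under assumptions: the field names come from CHECK FIELDS, but the exact YAML layout the tool expects is not shown in this page, and the second check, its command path, and the description text are hypothetical placeholders.

```shell
# Write a hypothetical checks.yaml containing a list of two checks.
# Field names follow CHECK FIELDS; the layout itself is an assumption.
cat > checks.yaml <<'EOF'
- name: check_zombies
  command: check_zombie.py -w 10 -c 20
  interval: 30
  timeout: 10
  labels: [cpu]
  description: Example zombie-process check.
- name: check_gpu_example        # hypothetical second check
  command: /usr/local/bin/check_gpu_example.sh
  interval: 60
  timeout: 15
  labels: [gpu]
EOF

# The file can then be loaded with the create action's --content option,
# as in the checks.yaml example in this section.
```

Omitting interval or timeout from an entry would, per CHECK FIELDS, leave the cluster defaults in effect on create.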

RETURN VALUES

Upon successful completion, cw-healthctl returns 0. On failure, an error message is printed to stderr and cw-healthctl returns 1.
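Because failure is reported through a non-zero exit status and a message on stderr, a script that calls cw-healthctl may want to test the status of each call explicitly. A minimal sketch, in which the wrapper name run_step is hypothetical:

```shell
# run_step DESCRIPTION COMMAND [ARG...]
# Run COMMAND; on a non-zero exit status, name the failed step on stderr
# and return that status to the caller.
run_step() {
    description=${1-}
    if [ "$#" -gt 0 ]; then shift; fi
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        printf 'step failed (exit %s): %s\n' "$status" "$description" >&2
    fi
    return "$status"
}

# Example (requires a configured cw-healthctl client):
# run_step "raise interval" cw-healthctl -i check_zombies update interval=60 || exit 1
```

Any message cw-healthctl itself prints to stderr still appears, followed by the wrapper's one-line summary.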