Monitor the Quarantine Tenancy

Note

Tenants, tenancies, and other features are available if you have paid for multi-tenant support. Contact Penguin Computing to learn more.

All multi-tenant clusters have a Quarantine tenancy that is created during the multi-tenancy setup process. After initial creation, the Quarantine tenancy contains only a head node, a health check node, and a gateway node. All bare metal compute nodes that will be added to tenancies should first be added to the Quarantine tenancy. See Add Compute Nodes to Quarantine and Available Tenancies for details.

The Quarantine tenancy is used to run health and performance checks on bare metal compute nodes, wipe any customer data, and verify node firmware. See Available and Quarantine Tenancies for additional information about the Quarantine tenancy.

Add Nodes to Quarantine Tenancy

When a tenancy is decommissioned, the compute nodes associated with that tenancy are automatically moved to the Quarantine tenancy. You may also need to move nodes to the Quarantine tenancy manually, for example when adding new bare metal nodes to the super-cluster for use in tenancies.

To add nodes to the Quarantine tenancy:

cw-clusterctl tenancies -i Quarantine assign <node list>

After compute nodes are added to the Quarantine tenancy, they are:

  • Powered on and booted to a known good image.

  • Wiped completely to ensure no customer data remains, including programs, user and temporary files, customer application data, and so on.

  • Reverted to a known good firmware state.

  • Run through a series of health and performance checks.

Monitor Node Status in Quarantine Tenancy

The superadministrator can monitor node health check status within the Quarantine tenancy. Node status values are updated automatically as health checks run. The status value is stored in the _aim_current_state reserved attribute and can include:

  • New Node: Node is added to the tenancy, but is not currently powered on.

  • Provisioning: Node is powered on and being provisioned by the Quarantine tenancy head node.

  • Available: Node is healthy and ready to be moved to the Available tenancy. All health and performance checks passed.

  • Draining: An error was detected with node health or performance. Any work assigned to the node will complete and the node will not be assigned new work.

  • Drained: The node does not have any work assigned and health checks are running.

  • Auto Remediation: The errors detected by the health and performance checks are being fixed automatically.

  • Work Queue: The errors detected by the health and performance checks cannot be fixed automatically. Manual testing by the superadministrator is required.
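The status values above can be grouped by what the superadministrator needs to do next. The following is a minimal sketch, not part of the product: the function name quarantine_next_step and the labels move, wait, and manual are hypothetical, and it assumes the value stored in _aim_current_state matches the status names exactly as they are written in the list above. How the attribute is read is left out on purpose; the status string is passed in as an argument.

```shell
#!/bin/sh
# Map a Quarantine tenancy status value to a suggested next step.
# Assumption: the status strings match the names listed in this section.
# The labels printed here (move, wait, manual, unknown) are hypothetical.
quarantine_next_step() {
    case "$1" in
        "Available")
            # All health and performance checks passed.
            echo "move" ;;
        "New Node" | "Provisioning" | "Draining" | "Drained" | "Auto Remediation")
            # Checks or automatic fixes are still in progress.
            echo "wait" ;;
        "Work Queue")
            # Errors cannot be fixed automatically; manual testing is required.
            echo "manual" ;;
        *)
            # Any value not listed in this section.
            echo "unknown" ;;
    esac
}
```

For example, quarantine_next_step "Work Queue" prints manual, which signals that the node needs attention from the superadministrator.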

Move Healthy Nodes to Available Tenancy

A node is healthy when it has a status of Available (found in the _aim_current_state reserved attribute). The time from when a node is added to the Quarantine tenancy until it is ready to move to the Available tenancy varies, but is typically no more than a few minutes.

To move a healthy node to the Available tenancy:

  1. Remove the node from the Quarantine tenancy:

    cw-clusterctl tenancies -i Quarantine unassign <healthy node>
    
  2. Add the node to the Available tenancy:

    cw-clusterctl tenancies -i Available assign <healthy node>
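The two steps above can be wrapped in a small helper so they always run in the same order. This is a minimal sketch under stated assumptions: the function name move_to_available and the DRY_RUN variable are hypothetical and not part of cw-clusterctl, and the node name n0 in the usage note is a placeholder. By default the helper only prints the two commands exactly as documented above; it runs them only when DRY_RUN is set to 0.

```shell
#!/bin/sh
# Move one healthy node from the Quarantine tenancy to the Available tenancy.
# DRY_RUN is a hypothetical switch for this sketch: 1 (the default) prints
# the commands, 0 executes them.
move_to_available() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: move_to_available <healthy node>" >&2
        return 1
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"   # print each command instead of running it
    else
        run=""       # run each command as written
    fi

    # Step 1: remove the node from the Quarantine tenancy.
    # Stop here on failure so the node is not assigned elsewhere.
    $run cw-clusterctl tenancies -i Quarantine unassign "$node" || return 1

    # Step 2: add the node to the Available tenancy.
    $run cw-clusterctl tenancies -i Available assign "$node"
}
```

Calling move_to_available n0 prints the unassign command followed by the assign command for n0, which allows the commands to be reviewed before running them with DRY_RUN=0.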