Applications Report Excessive Interruptions and Jitter

Real-time, performance-sensitive multi-node applications may occasionally suffer noticeable unwanted interruptions, or "jitter", that affect the application's stability and predictability.

Some issues may be remedied by having the affected compute nodes execute in "busy mode", during which the node's cw-status-updater service severely reduces the scope of the information it periodically gathers and reports to the node's parent. In normal operation that service may exhibit an infrequent 1-2 second computation stall, which in a cluster with hundreds or thousands of nodes may disrupt a multi-node real-time application's otherwise rapid periodic synchronization.

"Busy mode" can be enabled in one of three ways:

  • Set the node's boolean _busy reserved attribute to True using any of the case-insensitive values 1, "on", "y", "yes", "t", or "true". See the _busy attribute in Reserved Attributes for details. Turn off "busy mode" by setting _busy to False using 0, "off", "n", "no", "f", or "false", or by clearing the _busy attribute completely.

  • In a job scheduler prologue, execute touch /opt/scyld/clusterware-node/etc/busy.flag on the node to enable "busy mode", and in the epilogue rm that file to disable it. This busy.flag method is ignored if the node's _busy attribute is explicitly set to True or False.

  • The node's cw-status-updater service may heuristically decide on its own to execute in "busy mode". This method is overridden by the presence of busy.flag or by an explicit _busy attribute setting.
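
The flag-file method and the precedence among the three methods can be sketched in shell as follows. This is a minimal illustration, not the cw-status-updater implementation: the function names busy_enable, busy_disable, and effective_busy are hypothetical, as is the BUSY_FLAG override (included only so the sketch can be tried without touching the real path). The effective_busy function assumes the caller already knows the explicit _busy setting and the service's heuristic result, and passes them in as plain strings.

```shell
#!/bin/sh
# Sketch only: helper functions for a job scheduler prologue/epilogue.
# BUSY_FLAG defaults to the documented path; the override is an
# assumption added for experimentation.
BUSY_FLAG="${BUSY_FLAG:-/opt/scyld/clusterware-node/etc/busy.flag}"

# Prologue: request "busy mode" on this node.
busy_enable() {
    touch "$BUSY_FLAG"
}

# Epilogue: withdraw the request (no error if the file is already gone).
busy_disable() {
    rm -f "$BUSY_FLAG"
}

# Illustrate the documented precedence (hypothetical helper):
#   1. an explicit _busy attribute setting wins,
#   2. otherwise the presence of busy.flag enables "busy mode",
#   3. otherwise the service's own heuristic decides.
# $1 = explicit _busy setting: "true", "false", or "" if unset
# $2 = heuristic result: "true" or "false"
effective_busy() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -e "$BUSY_FLAG" ]; then
        echo true
    else
        echo "$2"
    fi
}
```

A prologue script would source these definitions and call busy_enable, and the matching epilogue would call busy_disable. If the node's _busy attribute is explicitly set, neither call changes the node's mode.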

An additional approach is to employ cpusets to execute specific applications on specific node cores in order to minimize contention. See the _status_cpuset attribute in Reserved Attributes for details about how to do this for the cw-status-updater service, and consult your Linux distribution or job scheduler documentation for how to do this for your applications.