Using Ansible#
A compute node can be configured to execute an Ansible playbook at boot time or after the node is up. In the following example, the cluster administrator creates a git repository hosted by the ICE ClusterWare™ head nodes, adds an extremely simple Ansible playbook to that git repository, and assigns a compute node to execute that playbook.
Install the clusterware-ansible package into the image (or images) that you want to support execution of an Ansible playbook:
scyld-modimg -i DefaultImage --install clusterware-ansible --upload --overwrite
The administrator should amend their PATH variable to include the git binaries that are provided as part of the clusterware package in /opt/scyld/clusterware/git/. This is not strictly necessary, though the git in that subdirectory is often significantly more recent than the version normally provided by a base distribution:
export PATH=/opt/scyld/clusterware/git/bin:${PATH}
The administrator should add their own personal public key to their ClusterWare admin account. This key will be populated into the root user's (or the _remote_user's) authorized_keys file on newly booted compute nodes. See Compute Node Remote Access for details. In addition, this provides simple SSH access to the git repository:
scyld-adminctl up keys=@/full/path/.ssh/id_rsa.pub
Adding the localhost's host keys to a personal known_hosts file is not strictly necessary, though it will avoid an SSH warning that can interrupt scripting:
ssh-keyscan localhost >> ~/.ssh/known_hosts
Now create a ClusterWare git repository called "ansible". This repository will default to public, meaning it is accessible read-only via unauthenticated HTTP access to the head nodes and therefore should not include unprotected sensitive passwords or keys:
scyld-clusterctl gitrepos create name=ansible
Note that being unauthenticated means the HTTP access mechanism does not allow git push or other write operations. Alternatively, the repository can be marked private (public=False), although it then cannot be used for a client's ansible-pull.
Initially the repository will include a placeholder text file that can be deleted or discarded.
Now clone the git repo over an SSH connection to localhost:
git clone cwgit@localhost:ansible
The administrator could also create that clone on any machine that has the appropriate private key and can reach the SSH port of a head node.
Finally, create a simple Ansible playbook to demonstrate the functionality:
cat >ansible/HelloWorld.yaml <<EOF
---
- name: This is a hello-world example
  hosts: n*.cluster.local
  tasks:
  - name: Create a file called '/tmp/testfile.txt' with the content
    copy:
      content: hello world
      dest: /tmp/testfile.txt
EOF
and add that playbook to the "ansible" git repo:
bash -c "\
cd ansible; \
git add HelloWorld.yaml; \
git -c user.name=Test -c user.email='<test@test.test>' \
commit --message 'Adding a test playbook' HelloWorld.yaml; \
git push; \
"
Multiple playbooks can co-exist in the git repo. In a multi-head node cluster an updated git repository will be replicated to the other head nodes, so an ansible-pull by any client against any head node will see the same playbook and the same commit history. This replication can require several seconds to complete.
With the playbook now available in the git repo, configure the compute node to execute ansible-pull to download it at boot time:
scyld-nodectl -i n1 set _ansible_pull=git:ansible/HelloWorld.yaml
Alternatively, to download the playbook from an external git repository on the server named gitserver:
scyld-nodectl -i n1 set _ansible_pull=http://gitserver/path/to/repo/root:HelloWorld.yaml
Either format can optionally end with "@<gitrev>", where <gitrev> is a specific commit, tag, or branch in the target git repo.
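For example, to pin node n1 to a specific tag in the "ansible" repo (v1.0 here is a hypothetical tag name, not one created in this walkthrough):
scyld-nodectl -i n1 set _ansible_pull=git:ansible/HelloWorld.yaml@v1.0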
Use the _ansible_pull_args attribute to specify any arguments to the underlying ansible-pull command.
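For example, to increase the verbosity of the underlying run (ansible-pull accepts the standard -v/--verbose flag; quote the value if the arguments contain spaces):
scyld-nodectl -i n1 set '_ansible_pull_args=-v'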
You may now reboot the node and wait for it to boot to an up status after the playbook has executed:
scyld-nodectl -i n1 reboot then waitfor up
You can verify that the HelloWorld.yaml playbook executed:
scyld-nodectl -in1 exec cat /tmp/testfile.txt ; echo
Note that during playbook execution the node remains in the booting status, changing to an up status after the playbook completes, assuming the playbook is not fatal to the node. When executing a lengthy playbook, that status may time out to down (with no ill effect) before switching to up at playbook completion. Administrators are advised to log the Ansible progress to a known location on the booting node, such as /var/log/ansible.log.
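One way to capture such a log (a sketch using standard Ansible configuration, not a ClusterWare-specific mechanism) is to set log_path in the image's /etc/ansible/ansible.cfg so that ansible-pull writes its output there:
# appended to /etc/ansible/ansible.cfg inside the image
[defaults]
log_path = /var/log/ansible.log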
The clusterware-ansible package supports another attribute, _ansible_pull_now, which uses the same syntax as _ansible_pull. Prior to first use, the administrator must enable the cw-ansible-pull-now service inside the chroot image:
systemctl enable cw-ansible-pull-now
and then start the service on a running compute node:
systemctl start cw-ansible-pull-now
When the attribute is present and the service has been enabled and started, the node will download and execute the playbook during the node's next status update event, which occurs every 10 seconds by default. Once the node completes execution of the playbook, it directs the head node to prepend "done" to the _ansible_pull_now attribute to ensure the playbook does not run again.
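For example, to run the HelloWorld.yaml playbook on an already-booted node (assuming the cw-ansible-pull-now service has been enabled and started as above):
scyld-nodectl -i n1 set _ansible_pull_now=git:ansible/HelloWorld.yaml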
Using Node Attributes with Ansible#
Admins can also change how playbooks run by reading ClusterWare node attributes into Ansible variables. The clusterware-node package includes a library of shell functions; in particular, attribute_value reads an attribute from the node's configuration. Inside the playbook, one can register a variable using the output of a command, and that command can reference the attribute_value function:
- name: Read the slurm_server attribute
  shell:
    executable: /bin/bash
    cmd: "source /opt/scyld/clusterware-node/functions.sh && attribute_value slurm_server"
  register: slurm_server
This snippet would set an Ansible variable called slurm_server that would contain the node attribute of the same name. Any ClusterWare or user-defined attribute can be referenced in this way. If a default value is needed, it can be given as a second argument: attribute_value attrname defaultvalue.
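As a sketch of how the registered result might then be used, a later task can reference the command's output through the variable's stdout field (standard Ansible behavior for values registered from shell tasks):
- name: Use the attribute value read above
  debug:
    msg: "slurm_server is {{ slurm_server.stdout }}"  # stdout of the shell task registered above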
Applying Ansible Playbooks to Images#
Cluster administrators commonly create and deploy a golden image containing all of the necessary libraries, tools, and applications. Given the frequency of software updates, the golden image can be out of date soon after it is created. With this in mind, many production clusters collect required changes into an Ansible playbook and then use the _ansible_pull functionality to deploy that playbook to ClusterWare nodes at boot time, or even to booted nodes using the _ansible_pull_now functionality.
Applying changes from an Ansible playbook adds a delay between when the node begins booting and when the node is ready to accept jobs after fully booting. Eventually this delay becomes cumbersome, and the cluster administrator will want to flush the changes out of the playbook and into the image.
The scyld-modimg --deploy <PATH> command supports executing a local playbook within the chroot.
Using this functionality requires that the clusterware-ansible package is installed on the head node and that the community.general Ansible Galaxy collection is installed for the chroot connection type. The following pair of commands installs the package on the system and installs the Ansible collection for the root user:
sudo dnf install --assumeyes --enablerepo=scyld\* clusterware-ansible
sudo -E /opt/scyld/clusterware-ansible/env/bin/ansible-galaxy \
    collection install community.general
The collection needs to be available to root because the ansible-playbook command is executed using sudo to allow full write permissions to all files within the chroot.
The scyld-modimg command assumes that any path ending with .yaml is an Ansible playbook and uses the configured software to execute that playbook within the chroot.
scyld-modimg -iDefaultImage --deploy HelloWorld.yaml \
    --progress none --upload --overwrite --discard-on-error
The new --discard-on-error
argument prevents the tool from asking for user
confirmation before uploading. It assumes that the user wants to keep the result
of a successful run but stop if an error was encountered. The following is an
example of the expected output from the previous command:
[admin@cwhead ~]$ scyld-modimg -iDefaultImage --deploy HelloWorld.yaml \
--progress none --upload --overwrite --discard-on-error
Treating HelloWorld.yaml as an ansible playbook
Downloading and unpacking image DefaultImage
Executing step: Ansible ['/opt/scyld/clusterware-ansible/bin/ansible-playbook', 'HelloWorld.yaml']
DefaultImage : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
step completed in 0:00:06.2
Executing step: Upload
Repacking DefaultImage
fixing SELinux file labels...
done.
Checksumming...
Cleaning up.
Checksumming image DefaultImage
Replacing remote image.
step completed in 0:09:33.7