To develop and keep my Ansible playbooks and roles in good shape, I first relied on virtual machines (VMs) running on VirtualBox and managed by Vagrantfiles. Before applying changes on production servers, I ran vagrant up to launch a virtual machine and ran the playbooks against that virtual machine.
Later on, I wanted to automate the testing of my playbooks and roles and be able to run tests more regularly using GitLab CI/CD. Therefore, I switched from VirtualBox virtual machines to light-weight Docker containers. This would allow me to always start from a clean slate and to spin up more tests in parallel without much overhead.
To test roles using GitLab CI/CD, I used the following .gitlab-ci.yml.
.test_role_template: &test_role
  stage: test
  tags:
    - ansible
    - docker
  script:
    - pushd roles/${CI_JOB_NAME:5}/tests; ansible-playbook -i inventory test.yml; popd;

# ...

test-auditbeat:
  <<: *test_role
  rules:
    - changes:
        - roles/auditbeat/**/*

# ...

test-vault-server:
  <<: *test_role
  rules:
    - changes:
        - roles/consul/**/*
        - roles/consul-agent/**/*
        - roles/vault/**/*
        - roles/vault-server/**/*
The first block defines a YAML anchor test_role, which is reused in all following blocks to test roles like auditbeat or vault-server. The rules dictate that the test.yml playbook inside the tests folder of a role runs whenever a file in that role changes, e.g. after updating the auditbeat_version in defaults/main.yml or modifying task or template files. Because ${CI_JOB_NAME:5} strips the five-character test- prefix from the job name, each job automatically points at the matching role directory. In the case of the vault-server role, the tests also run whenever a dependent role changes: whenever either the consul(-agent) roles or the vault(-server) roles change, the test.yml playbook of the vault-server role is executed.
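To make the anchor merge concrete, the expanded test-auditbeat job is roughly equivalent to the following (this is just the expansion of the snippet above, nothing new):

test-auditbeat:
  stage: test
  tags:
    - ansible
    - docker
  script:
    - pushd roles/${CI_JOB_NAME:5}/tests; ansible-playbook -i inventory test.yml; popd;
  rules:
    - changes:
        - roles/auditbeat/**/*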
In the tests folder, an inventory file is present. For simple roles that only need one instance, the file is generated from the following template.
localhost ansible_connection=local
ansible-{{ role_name }} ansible_connection=docker
[all:vars]
ansible_python_interpreter='/usr/bin/env python3'
If more instances are required to reliably test a role, a more elaborate inventory file is used. For example, the inventory file of the vault-server role looks like this.
localhost ansible_connection=local
[ansible_vault_servers]
ansible-vault-server-1 ansible_connection=docker
ansible-vault-server-2 ansible_connection=docker
ansible-vault-server-3 ansible_connection=docker
[all:vars]
ansible_python_interpreter='/usr/bin/env python3'
For executing tasks on the localhost, i.e. to start and stop the Docker instance(s), we specify ansible_connection=local, which executes the tasks directly as shell commands and not via the default SSH connection. For the same reason, we use ansible_connection=docker to execute tasks directly on the containers under test.
The test.yml playbook template reads as follows:
---
- hosts: localhost
  tasks:
    - name: start container
      docker_container:
        name: ansible-{{ role_name }}
        image: ubuntu:xenial
        command: /sbin/init
        state: started

- hosts: ansible-{{ role_name }}
  pre_tasks:
    - name: update all packages to the latest version
      apt:
        update_cache: yes
        upgrade: dist
        force_apt_get: yes
  roles:
    - role: {{ role_name }}
  post_tasks: []

- hosts: localhost
  tasks:
    - name: remove container
      docker_container:
        name: ansible-{{ role_name }}
        state: absent
The first and last blocks are executed on the localhost to handle the start and removal of the Docker container. The middle part, which is run on the Docker instance(s), first updates all packages to the latest version and then executes the role. The post_tasks section is where the actual testing happens.
For the auditbeat role, the test succeeds when the configuration is valid.
post_tasks:
  - name: check auditbeat installation
    command: auditbeat test config
    register: result
    changed_when: False
  - name: check auditbeat installation
    assert:
      that:
        - "'Config OK' in result.stdout"
For the vault-server role, the test should succeed if all Vault servers are reachable.
post_tasks:
  - name: check all vault servers are reachable
    wait_for:
      port: '{{ item }}'
    loop:
      - 8200
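To check a bit more than the open port, a post_task like the following sketch could query Vault's health endpoint with the uri module; it is not part of the original test and assumes the API listens on plain HTTP at 127.0.0.1:8200:

post_tasks:
  - name: check vault health endpoint responds
    # 200 = active, 429 = standby, 501 = not initialised, 503 = sealed
    uri:
      url: 'http://127.0.0.1:8200/v1/sys/health'
      status_code: [200, 429, 472, 473, 501, 503]
    register: vault_health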
Having this test infrastructure in place allowed me to easily test roles whenever a new version was released, and it proved very helpful whenever a new Ubuntu LTS came out. For the former, a small modification like bumping the auditbeat_version is enough to trigger the test of a role. For the latter, changing image: ubuntu:xenial to image: ubuntu:bionic in bulk would trigger testing of various roles.
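As an illustration of such a bulk change, a small hypothetical helper playbook (not part of the original setup, and assuming the roles/*/tests/test.yml layout used here) could rewrite the base image in one go:

---
- hosts: localhost
  tasks:
    - name: switch test containers from xenial to bionic
      replace:
        path: '{{ item }}'
        regexp: 'ubuntu:xenial'
        replace: 'ubuntu:bionic'
      with_fileglob:
        - roles/*/tests/test.yml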
However, the story isn't actually that nice. In reality, the first test.yml file looked like this:
---
- hosts: localhost
  tasks:
    - name: start container
      docker_container:
        name: ansible-{{ role_name }}
        image: ubuntu:xenial
        command: /sbin/init
        state: started
+       capabilities:
+         - SYS_ADMIN
+       volumes:
+         - /sys/fs/cgroup:/sys/fs/cgroup:ro
+       tmpfs:
+         - /run
+         - /run/lock
+         - /tmp
The capabilities, volumes and tmpfs blocks were necessary to start systemd on Ubuntu 16.04 LTS (Xenial Xerus). However, as of Ubuntu 18.04 LTS (Bionic Beaver), systemd was no longer present in the base image. The workaround was to start the container with interactive: yes to prevent the shell process from exiting.
    - name: start container
      docker_container:
        name: ansible-{{ role_name }}
-       image: ubuntu:xenial
-       command: /sbin/init
+       image: ubuntu:bionic
        state: started
-       capabilities:
-         - SYS_ADMIN
-       volumes:
-         - /sys/fs/cgroup:/sys/fs/cgroup:ro
-       tmpfs: # necessary on Ubuntu 16.04 LTS host to start systemd
-         - /run
-         - /run/lock
-         - /tmp
+       interactive: yes
Moreover, I stumbled upon a bug (or a feature): the service module takes into account neither the use option nor the ansible_service_mgr override. So I needed to clutter my task files with blocks like this:
- name: start vault server
  service: name=vault state=started
  when: ansible_connection != 'docker'

- name: start vault server
  command: service vault start
  when: ansible_connection == 'docker'
  args:
    warn: no
I realized that Docker containers were clearly the wrong tool for the job.
Later on, I discovered Ansible Molecule, which I gave a try, but I found it too bloated and kept my original test setup. However, from this experiment I learnt that Linux system containers (LXC) managed by LXD could be used as an alternative driver to Docker, without the need to rely on heavier virtual machines. Eureka!
These Linux system containers have the benefit of being as light-weight as Docker containers while providing the full OS experience of virtual machines. This container abstraction more closely resembles the production environment: the same cloud images are available on the major cloud providers, and an init system like systemd is launched whenever the container is started. Another benefit is that base images can be configured to refresh automatically, so most packages are up to date.
The transition to LXD was super easy. After removing all tasks annotated with ansible_connection == 'docker' clauses, there was again a single execution path. Running multiple instances is possible with minimal overhead. The test setup remained almost the same: instead of starting Docker containers, the tests launched Linux containers, and the tear-down stopped and deleted them. All the tests remained unaltered.
The .gitlab-ci.yml file didn't change except for the tag demanding lxd instead of docker to be present.
.test_role_template: &test_role
  stage: test
  tags:
    - ansible
    - lxd
  script:
    - pushd roles/${CI_JOB_NAME:5}/tests; ansible-playbook -i inventory test.yml; popd;
In the inventory skeleton, the ansible_connection=docker line was replaced by ansible_connection=lxd.
localhost ansible_connection=local
ansible-{{ role_name }} ansible_connection=lxd
[all:vars]
ansible_python_interpreter='/usr/bin/env python3'
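For multi-instance roles such as vault-server, the change is the same single word per host; a sketch of the corresponding inventory, mirroring the Docker version shown earlier:

localhost ansible_connection=local
[ansible_vault_servers]
ansible-vault-server-1 ansible_connection=lxd
ansible-vault-server-2 ansible_connection=lxd
ansible-vault-server-3 ansible_connection=lxd
[all:vars]
ansible_python_interpreter='/usr/bin/env python3'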
The test.yml playbook now reads as follows:
---
- hosts: localhost
  tasks:
    - name: start container
      lxd_container:
        name: ansible-{{ role_name }}
        source:
          type: image
          mode: pull
          server: https://cloud-images.ubuntu.com/releases
          protocol: simplestreams
          alias: focal/amd64
        state: started
        wait_for_ipv4_addresses: true
        timeout: 600
        url: "{% raw %}{{ lxd_container_url | default('unix:/var/lib/lxd/unix.socket') }}{% endraw %}"

- hosts: ansible-{{ role_name }}
  pre_tasks:
    - name: update all packages to the latest version
      apt:
        update_cache: yes
        upgrade: dist
        force_apt_get: yes
  roles:
    - role: {{ role_name }}
  post_tasks: []

- hosts: localhost
  tasks:
    - name: remove container
      lxd_container:
        name: ansible-{{ role_name }}
        state: absent
        url: "{% raw %}{{ lxd_container_url | default('unix:/var/lib/lxd/unix.socket') }}{% endraw %}"
So the tasks that start and remove the container changed to use the lxd_container module and a somewhat more elaborate form for specifying which image to use. The wait_for_ipv4_addresses option was necessary for roles that rely on an IPv4 stack being present and ready for action. The url line was added to run the tests manually on a macOS machine with an Ubuntu virtual machine controlled by Canonical Multipass.
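As a hedged example of that manual setup, lxd_container_url can be supplied via the inventory (or as an extra variable); the HTTPS address of the Multipass VM below is a made-up placeholder and assumes the LXD API has been exposed on port 8443:

[all:vars]
ansible_python_interpreter='/usr/bin/env python3'
lxd_container_url='https://192.168.64.2:8443'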
All my roles, except for the auditbeat role, are properly tested with minimal overhead. Auditbeat, however, needs special kernel capabilities to run. So far I have not found an alternative to Run Auditbeat on Docker. Adding additional privileges as suggested in the Unable to start Auditbeat on LXC container thread did not do the trick. I still got the following error:
Exiting: 1 error: failed to create audit client: failed to get audit status: operation not permitted
Ideas to work around this single remaining issue are welcome! For now, I settle for the fact that LXD can also manage virtual machines. In the tasks that start and stop the lxd_container, a single line specifying type: virtual-machine is required.
diff --git a/roles/auditbeat/tests/test.yml b/roles/auditbeat/tests/test.yml
index f3db10b5..e403213f 100644
--- a/roles/auditbeat/tests/test.yml
+++ b/roles/auditbeat/tests/test.yml
@@ -10,8 +10,7 @@
           server: https://cloud-images.ubuntu.com/releases
           protocol: simplestreams
           alias: focal/amd64
-        config:
-          security.privileged: 'true'
+        type: virtual-machine
         state: started
         wait_for_ipv4_addresses: true
         timeout: 600
@@ -47,5 +46,6 @@
     - name: remove container
       lxd_container:
         name: ansible-auditbeat
+        type: virtual-machine
         state: absent
         url: "{{ lxd_container_url | default('unix:/var/lib/lxd/unix.socket') }}"
Keep on developing and (start) testing your Ansible playbooks and roles!