Have a task in your Ansible playbook that takes a long time to run, say a very large package installation or a download across a slow network link? If it runs long enough, the connection to the managed host can time out, and Ansible will treat the task as failed and stop at that point in the playbook.
Async and Poll in a nutshell
The standard way to handle this is to use the Ansible async: and poll: flags. The documentation isn’t really clear on this, so here’s how I think of their actions:
- The async: B flag says “Run this command in the background for up to B seconds…”
- The poll: P flag means “…and check the status every P seconds.”
Take a task like this:
- name: Download a big file
  shell:
    "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
  async: 120
  poll: 5
(Yes, I know there are more Ansible-friendly ways to download a file from a remote URL, but go along with the example…)
On a good day, when download speeds are high, the download finishes quickly and Ansible continues on. On days when the Internet connection is slow, Ansible will kick off the wget command and check every 5 seconds whether it is done. When it completes, the playbook goes on. If the wget fails (network error, disk write error, etc.), or the command takes longer than 120 seconds, Ansible will fail this step, as expected.
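For comparison, here is a rough sketch of what one of the "more Ansible-friendly" ways might look like, using the get_url module in place of shell and wget. The URL and destination path are the same placeholders as in the example above, and the async: and poll: values are unchanged; treat this as an illustration rather than a tested recipe.

```yaml
# Sketch only: the same download, using get_url instead of shell + wget.
- name: Download a big file
  get_url:
    url: https://example.com/downloads/a_really_big_file.iso
    dest: /tmp/my_big_file.iso
  async: 120   # fail the task if the download takes longer than 120 seconds
  poll: 5      # check on the background job every 5 seconds
```

The async: and poll: keywords are set on the task itself, not passed to the module, so they are written the same way no matter which module does the work.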
That’s all well and good. What’s the catch?
Check mode
One feature I love about Ansible is the --check mode option. A well-written Ansible module will run in --check mode and do everything it can to validate that it will execute on the managed systems without making any changes to the remote system. This is key when you’re working on a playbook to maintain production systems.
Say you know that a configuration file needs a correction applied. You take the playbook you used to build the system originally, check it out of your source control to a new branch and modify the playbook.
But a cautious developer will check that the playbook runs as expected and doesn’t do anything unexpected (reboot the server, stop services, fail mid-way through, etc.). To do this, run your playbook with the --check flag. The output looks identical to a normal run, but this time the lines marked changed: are not actual changes; they tell you that the task would make a change.
Some tasks are inherently unsafe for Ansible to run generically in check mode. Tasks such as shell:, command:, and other more “raw” commands have this limitation, because Ansible cannot know what an arbitrary command would change. Ansible tries to make sure that a task run in check mode makes no changes whatsoever, so it skips these tasks.
Check mode is also handy when combined with the --diff command-line flag, but that’s a story for another day.
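To make the difference between check-mode-aware modules and "raw" commands concrete, here is a minimal sketch of a play with one of each. The file path /tmp/demo.conf and the line content are made up for illustration. Run with --check, I would expect the lineinfile task to report changed: without touching the file, and the shell task to be skipped.

```yaml
# Sketch only: /tmp/demo.conf and "setting=value" are hypothetical.
- name: Check mode demo
  hosts: localhost
  gather_facts: false
  tasks:
    # lineinfile supports check mode: it reports what it would change.
    - name: Ensure a config line is present
      lineinfile:
        path: /tmp/demo.conf
        line: "setting=value"
        create: true

    # shell runs an arbitrary command, so it is not executed in check mode.
    - name: Append the same line with a raw shell command
      shell: "echo setting=value >> /tmp/demo.conf"
```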
Async and Check mode
So, using these together makes sense. I want to download a large file over an occasionally slow link but I do not want the download to run when I’m in check mode. You’d think something like the example code from above would be the correct combination:
- name: Download a big file
  shell:
    "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
  async: 120
  poll: 5
But when you run it with the --check flag, you get this error:
TASK [Download a big file] ***************************
task path: ./playbook.yml:71
fatal: [localhost]: FAILED! => {
"changed": false,
"msg": "check mode and async cannot be used on same task."
}
What to do?
I have to admit, I didn’t think up this workaround – a Mr. Alex Dzyoba documented it on his blog and I came across it here:
https://alex.dzyoba.com/blog/ansible-check-async/
What he documents is using the ansible_check_mode variable to set the async: value to 0 if we’re in check mode, or 120 if we are not. Using our task above, we would do this:
- name: Download a big file
  shell:
    "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
  async: "{{ ansible_check_mode | ternary(0,120) }}"
  poll: 5
What ends up happening depends on the ansible_check_mode variable:
- If we are running in check mode (i.e. ansible_check_mode is true), then the value passed to async: is zero (the first value in the ternary() call), and Ansible doesn’t complain about the conflict.
- When we are running in normal mode (i.e. ansible_check_mode is false), then the value passed to async: is 120 (the second value in the ternary() call), and the task is allowed to run for up to 120 seconds.
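Putting it together, a complete playbook might look like the sketch below. The play name, the localhost target, and the debug task are additions for illustration only; the download task is the one from above.

```yaml
# Sketch only: play name, hosts, and the debug task are illustrative.
- name: Async download that also works under --check
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Show which mode we are in
      debug:
        msg: "check mode: {{ ansible_check_mode }}"

    - name: Download a big file
      shell:
        "wget -O /tmp/my_big_file.iso https://example.com/downloads/a_really_big_file.iso"
      # 0 in check mode (no async, so no conflict), 120 in a normal run.
      async: "{{ ansible_check_mode | ternary(0, 120) }}"
      poll: 5
```

Saved as, say, playbook.yml, this can be run once with ansible-playbook playbook.yml --check and once without the flag to compare the two paths.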
Why Ansible doesn’t automatically handle this is beyond me, but I’m glad to have come across Alex Dzyoba’s website and this method.