You have just run lxc launch ubuntu:18.04 mycontainer and a new container is being created. The command returns very quickly (around 1-2s) and the container starts running. However, the container may need a few more seconds before its init has performed all the required startup tasks.
The problem
The question is, how do you know programmatically when a container’s init
has really finished and the startup has been completed?
First, though, why do we need to know when a running container’s startup has really been completed? We need to know when we write automation scripts. Some commands of an automation script will fail if the container has not fully completed its startup. For example, the ubuntu:18.04 container images create a non-root account (username: ubuntu). This account is created near the end of the startup process; therefore, if we try to execute commands relating to the ubuntu account on a container that has not completed the startup, those commands will fail.
Towards a solution
The proper way to solve this issue is to use a feature of the init
subsystem of the container image that can tell us when it has completed the startup.
The Ubuntu 16.04 and newer container images use systemd. And there is functionality to report back whether a system has completed the startup, by running systemctl is-system-running. When a container is starting up, the state is initializing. As soon as it has completed the startup, the state switches to running.
$ systemctl is-system-running
running
Five issues
The first issue is that systemd should have a feature to wait for us, instead of us having to check in a loop (polling) for when the state changes. It actually does: it was added to systemd in August 2018 as systemctl: add support for --wait to is-system-running #9796. Translating this into Ubuntu versions, it means that in Ubuntu 19.04 or newer, we can use --wait, as in systemctl is-system-running --wait. Very easy.
The second issue is with Ubuntu versions prior to Ubuntu 19.04, where we need to perform polling. Polling has some complications. Some systemd units are set to run but are not able to complete in a container, so they fail. In that case, the end state according to systemd will be degraded instead of running. Therefore, when polling, we need to check for either of these two states.
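As a rough sketch of that check, the helper below sorts a state string into three groups. The function name classify_state and the labels finished, booting and other are made up for this illustration; the state names themselves are the ones that systemctl is-system-running is documented to print.

```shell
#!/usr/bin/env bash
# classify_state STATE
# Hypothetical helper: maps the text printed by `systemctl is-system-running`
# to one of three made-up labels.
classify_state() {
    case "$1" in
        running|degraded)      echo finished ;;  # startup is over (degraded: some unit failed)
        initializing|starting) echo booting ;;   # still starting up, keep polling
        *)                     echo other ;;     # empty output, maintenance, stopping, ...
    esac
}
```

A polling loop would then stop as soon as classify_state prints finished, and treat both running and degraded as "the container is ready".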
The third issue is that as soon as LXD launches a container, it takes a little bit for systemd to start up and be able to respond to requests for its state. You get the error Failed to connect to bus: No such file or directory
if you ask too fast.
The fourth issue is that in newer versions of systemd that have the --wait parameter, the command will still fail with Failed to connect to bus: No such file or directory if we run it too soon. This means that a simple systemctl is-system-running --wait is not sufficient. We need a bit of polling here until systemd is ready to report the state.
The fifth issue is that both of the following cases return the same error code 1: systemctl is-system-running when it gives the error Failed to connect to bus: No such file or directory, and systemctl is-system-running when the result is degraded. That means that we need to be careful when we infer the error from the return value, because the return value is not unique to the error message.
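Since the return value cannot tell these two cases apart, one option is to look at what the command prints instead. The sketch below assumes that the bus error goes to stderr and leaves stdout empty; probe_state and the label unavailable are hypothetical names, not something systemd provides.

```shell
#!/usr/bin/env bash
# probe_state [COMMAND...]
# Hypothetical helper: runs COMMAND (default: systemctl is-system-running),
# ignores its exit status, and prints the state it reported on stdout.
# If nothing was printed, the made-up label "unavailable" is printed instead.
probe_state() {
    local state
    if [ "$#" -eq 0 ]; then
        set -- systemctl is-system-running
    fi
    # stderr (e.g. the "Failed to connect to bus" message) is discarded;
    # "|| true" keeps a non-zero exit status from aborting under `set -e`.
    state="$("$@" 2>/dev/null)" || true
    printf '%s\n' "${state:-unavailable}"
}
```

From the host, this could be called as probe_state lxc exec mycontainer -- systemctl is-system-running, and the caller compares the printed text rather than the return value.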
Here is the sequence of states reported by systemd when it starts in an Ubuntu LXD container. In parentheses is the number of times I got each message on my test system, until systemd completed the startup (reaching the state degraded).
Failed to connect to bus: No such file or directory (185 times)
initializing (189 times)
starting (168 times)
degraded
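A tally of this kind could be collected roughly as follows. This is only a sketch and not necessarily how the numbers above were produced; poll_states is a hypothetical helper that keeps asking for the state and prints every answer it gets.

```shell
#!/usr/bin/env bash
# poll_states COMMAND...
# Hypothetical helper: runs COMMAND in a tight loop and prints one line per
# attempt (stdout and stderr merged), stopping once an end state is seen.
poll_states() {
    local out
    while :; do
        out="$("$@" 2>&1)" || true   # the exit status is ignored on purpose
        printf '%s\n' "$out"
        case "$out" in
            running|degraded) break ;;
        esac
    done
}
```

Piping the result through uniq -c collapses each run of identical lines into a count, for example: poll_states lxc exec mycontainer -- systemctl is-system-running | uniq -c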
Now we are ready to put all these together and have a solution for Ubuntu 19.04 (or newer), and a solution for Ubuntu 16.04/18.04.
Solution for Ubuntu 19.04 or newer
The following example script installs a snap package as soon as the container has fully started. The first lxc exec waits until the systemctl is-system-running command no longer returns the error value 1. The second lxc exec command waits until the container has finished the startup.
$ cat myscript-1904newer.sh
lxc stop mycontainer
lxc delete mycontainer
lxc launch ubuntu:19.04 mycontainer
lxc exec mycontainer -- bash -c 'while $(systemctl is-system-running &>/dev/null); (($?==1)); do :; done'
lxc exec mycontainer -- systemctl is-system-running --wait
lxc exec mycontainer -- sudo snap install hello
Note: This script checks the return value of systemctl is-system-running
. When systemd
is not available yet, the return value is 1. When the command returns degraded
, the return value is also 1. Which means, bummer! We can make use of the --wait
parameter but we cannot get a proper foolproof solution without having to resort to some polling of ours. However, in the case of Ubuntu 19.04 or newer, the startup tends to take more time because snapd
has to start as well. Therefore, it is unlikely to hit the case that systemd has completed immediately and reports degraded
(return value 1).
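One way to sidestep the return-value ambiguity in the first loop is to test whether systemctl printed anything at all, rather than what it returned. The sketch below assumes that the bus error goes to stderr and leaves stdout empty; wait_for_boot and the POLL_INTERVAL variable are hypothetical names introduced for the example.

```shell
#!/usr/bin/env bash
# wait_for_boot RUNNER...
# RUNNER is the command prefix that executes a command inside the container,
# for example: lxc exec mycontainer --
wait_for_boot() {
    # Phase 1: poll until systemctl prints some state, i.e. the bus is reachable.
    until [ -n "$("$@" systemctl is-system-running 2>/dev/null)" ]; do
        sleep "${POLL_INTERVAL:-1}"
    done
    # Phase 2: let systemd block until the startup is over. This prints the
    # final state; the exit status is non-zero unless that state is "running".
    "$@" systemctl is-system-running --wait
}
```

It would be called as wait_for_boot lxc exec mycontainer -- before the snap install line. Because degraded still yields a non-zero exit status from the final command, a script running under set -e would need to tolerate that.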
Solution for Ubuntu 16.04 and Ubuntu 18.04 (but also Ubuntu 19.04 and newer)
Use the following example script. You can run it repeatedly in order to verify that it works well. It has been tested with Ubuntu 16.04, Ubuntu 18.04 and Ubuntu 19.04.
$ cat myscript-1804older.sh
lxc stop mycontainer
lxc delete mycontainer
lxc launch ubuntu:18.04 mycontainer
lxc exec mycontainer -- bash -c 'while [ "$(systemctl is-system-running 2>/dev/null)" != "running" ] && [ "$(systemctl is-system-running 2>/dev/null)" != "degraded" ]; do :; done'
lxc exec mycontainer -- sudo snap install hello
Conclusion
As an overall solution, I suggest using the last script, which does polling. It works on Ubuntu 16.04, Ubuntu 18.04 and Ubuntu 19.04. Here again is the line that waits until the container mycontainer has completed the startup.
lxc exec mycontainer -- bash -c 'while [ "$(systemctl is-system-running 2>/dev/null)" != "running" ] && [ "$(systemctl is-system-running 2>/dev/null)" != "degraded" ]; do :; done'
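The one-liner spins without pausing and never gives up. A variant that pauses between attempts and stops after a time limit might look like the sketch below. The function name wait_until_started, the timeout argument and the POLL_INTERVAL variable are all assumptions made for the example, and it asks systemctl once per iteration instead of twice.

```shell
#!/usr/bin/env bash
# wait_until_started TIMEOUT_SECONDS RUNNER...
# RUNNER is the command prefix that executes a command inside the container,
# for example: lxc exec mycontainer --
wait_until_started() {
    local timeout="$1" state
    shift
    local deadline=$(( SECONDS + timeout ))   # SECONDS is a bash builtin counter
    while :; do
        state="$("$@" systemctl is-system-running 2>/dev/null)" || true
        case "$state" in
            running|degraded)
                printf '%s\n' "$state"        # report which end state was reached
                return 0
                ;;
        esac
        if [ "$SECONDS" -ge "$deadline" ]; then
            echo "timed out waiting for startup (last state: ${state:-none})" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-1}"
    done
}
```

For example, wait_until_started 120 lxc exec mycontainer -- would give the container up to two minutes, and the script can check the return value before it goes on to install anything.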
4 comments
Great post as usual. However, I found another way to wait for a running system; this one is especially useful when cloud-init is used. To wait until cloud-init finishes (and therefore the container OS is fully prepared), just use:
lxc exec mycontainer -- cloud-init status -w
This command will just block-wait on cloud-init to complete.
Author
Thanks!
This works with Ubuntu 16.04, 18.04 and likely 20.04. The output can be suppressed with 2>/dev/null. The cloud-init in Ubuntu 14.04 does not have the status subcommand, but Ubuntu 14.04 does not have systemd either (for the instructions in this post).
What about using the cloud-init “phone-home” functionality?
Calling “lxc exec mycontainer -- cloud-init status -w” from ub18.04 host when initializing an ub18.04 container works well. But from ub20.04 host to ub18.04 container that method seems to wait for extra tens of minutes even though the container has already finished initializing. (I suspect there is an internal timeout to prevent an eternal hang).
I have also used the “phone home” method, but that also requires setting a hole in the firewall, e.g.,
sudo ufw allow from 10.64.64.1/24
when the LXD bridge address is set to 10.64.64.1/24, which is a bit troublesome for automating because it requires sudo.
It would be wonderful if lxc provided a call to encapsulate the required wait — that would prevent surprises across variations in host or container versions.
When faced with the choice of phone-home vs polling, I will now return to phone-home rather than code and test branches for polling with different versions of host and container.