How to know when a LXD container has finished starting up

You have just run lxc launch ubuntu:18.04 mycontainer and a new container is being created. The command returns very quickly (around 1-2s) and the container starts running. However, the container's init may need a few more seconds to finish all of its startup tasks.

The problem

The question is, how do you know programmatically when a container’s init has really finished and the startup has been completed?

First, though, why do we need to know when a running container’s startup has really been completed? We need to know when we write automation scripts. Some commands in an automation script will fail if the container has not fully completed its startup. For example, the ubuntu:18.04 container images create a non-root account (username: ubuntu). This account is created near the end of the startup process, so if we try to execute commands that involve the ubuntu account on a container that has not completed the startup, those commands will fail.

Towards a solution

The proper way to solve this issue is to use a feature of the init subsystem of the container image that can tell us when it has completed the startup.

The Ubuntu 16.04 and newer container images use systemd, and systemd can report whether a system has completed the startup: run systemctl is-system-running. While a container is starting up, the state is initializing and then starting. As soon as it has completed the startup, the state switches to running.

$ systemctl is-system-running
running

Five issues

The first issue is that systemd should have a feature to wait for us, instead of us having to check in a loop (polling) for the state to change. It actually does: it was added to systemd in August 2018 as "systemctl: add support for --wait to is-system-running" (#9796). Translating this into Ubuntu versions, it means that in Ubuntu 19.04 or newer we can use --wait, as in systemctl is-system-running --wait. Very easy.
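
Whether --wait is available can also be decided from the systemd version instead of the Ubuntu release. The sketch below assumes the option first shipped in systemd 240 (the version found in Ubuntu 19.04). Both that threshold and the helper name supports_wait are assumptions made for illustration, so check them against your own images.

```shell
# Hypothetical helper: succeeds if the first line of "systemctl --version"
# output (passed as $1) reports a systemd version of at least 240.
# The 240 threshold is an assumption, not something stated in this article.
supports_wait() {
    local version
    # The first line looks like "systemd 240 (240-6ubuntu5)"; take field 2.
    version=$(printf '%s\n' "$1" | awk 'NR == 1 && $1 == "systemd" { print int($2) }')
    [ -n "$version" ] && [ "$version" -ge 240 ]
}
```

A script could call supports_wait "$(lxc exec mycontainer -- systemctl --version)" and fall back to plain polling when it fails.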

The second issue is with Ubuntu versions prior to Ubuntu 19.04, where we need to perform polling. Polling has some complications. Some systemd units will fail if they are set to run but cannot complete in a container, in which case the end state reported by systemd is degraded instead of running. Therefore, when polling, we need to check for either of these two states.

The third issue is that as soon as LXD launches a container, it takes a little bit for systemd to start up and be able to respond to requests for its state. You get the error Failed to connect to bus: No such file or directory if you ask too fast.

The fourth issue is that in newer versions of systemd that have the --wait parameter, the command will fail with Failed to connect to bus: No such file or directory if we run it too soon. This means that a simple systemctl is-system-running --wait is not sufficient. We need a bit of polling here until systemd is ready to report the state.

The fifth issue is that the following two cases return the same exit code, 1: systemctl is-system-running when it gives the error Failed to connect to bus: No such file or directory, and systemctl is-system-running when the result is degraded. That means we need to be careful when we interpret the result through the return value, because the return value does not uniquely identify what happened.
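
Since the exit code is ambiguous, one option is to look at what the command prints on stdout instead. The following is a minimal sketch of that idea; the helper name startup_phase and its phase labels are made up for illustration. It expects the stdout text of systemctl is-system-running (with stderr discarded), so the bus error shows up as an empty string.

```shell
# Hypothetical helper: map the text printed on stdout by
# "systemctl is-system-running" to a coarse startup phase.
startup_phase() {
    case "$1" in
        "")                    echo "not-ready" ;;  # nothing on stdout, e.g. bus not reachable yet
        initializing|starting) echo "booting" ;;    # systemd answers, startup still in progress
        running|degraded)      echo "done" ;;       # startup finished, with or without failed units
        *)                     echo "other" ;;      # any other state, e.g. maintenance or stopping
    esac
}
```

With this, degraded and the bus error land in different phases even though both come with exit code 1.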

Here is the sequence of states for systemd when it starts in an Ubuntu LXD container. In parentheses is the number of times I got each message on my test system until systemd completed the startup (reaching the state degraded).

Failed to connect to bus: No such file or directory      (185 times)
initializing                                             (189 times)                                             
starting                                                 (168 times)
degraded                                                
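
One way to collect a tally like the one above is to sample the state in a tight loop and collapse consecutive repeats. This is only a sketch and not necessarily how the numbers above were produced; the helper name tally_states and its output format are assumptions.

```shell
# Hypothetical helper: read one sampled state per line on stdin and print
# each run of identical consecutive lines once, followed by its length.
tally_states() {
    uniq -c | awk '{ n = $1; $1 = ""; sub(/^ /, ""); printf "%s (x%d)\n", $0, n }'
}
```

For example, piping the output of a loop that repeatedly runs lxc exec mycontainer -- systemctl is-system-running 2>&1 into tally_states prints one line per state with its repeat count.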

Now we are ready to put all these together and have a solution for Ubuntu 19.04 (or newer), and a solution for Ubuntu 16.04/18.04.

Solution for Ubuntu 19.04 or newer

The following example script installs a snap package as soon as the container has fully started. The first lxc exec loops until the systemctl is-system-running command no longer returns an error. The second lxc exec command waits until the container has finished the startup.

$ cat myscript-1904newer.sh
lxc stop mycontainer
lxc delete mycontainer
lxc launch ubuntu:19.04 mycontainer
lxc exec mycontainer -- bash -c 'while $(systemctl is-system-running &>/dev/null); (($?==1)); do :; done'
lxc exec mycontainer -- systemctl is-system-running --wait
lxc exec mycontainer -- sudo snap install hello

Note: This script checks the return value of systemctl is-system-running. When systemd is not available yet, the return value is 1. When the command reports degraded, the return value is also 1. Which means, bummer: we can make use of the --wait parameter, but we cannot get a foolproof solution without resorting to some polling of our own. However, in the case of Ubuntu 19.04 or newer, the startup tends to take more time because snapd has to start as well. Therefore, it is unlikely that systemd has already completed the startup and reports degraded (return value 1) by the time we first ask.
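
A variation that avoids relying on the exit code is to poll until systemctl prints a recognised state on stdout, and only then hand over to --wait. The systemctl man page documents that is-system-running exits non-zero for every state other than running, which is why this sketch keys off the printed state. The function name wait_with_flag and the poll interval are assumptions for illustration, and the polling runs from the host rather than inside the container.

```shell
# Hypothetical helper: wait until systemd answers, then block on --wait.
# Usage: wait_with_flag CONTAINER [POLL_INTERVAL_SECONDS]
wait_with_flag() {
    local container="$1" interval="${2:-0.2}" state
    # Phase 1: poll until systemctl prints a recognised state on stdout.
    # While the bus is not reachable yet, stdout stays empty.
    while :; do
        state=$(lxc exec "$container" -- systemctl is-system-running 2>/dev/null) || true
        case "$state" in
            initializing|starting|running|degraded) break ;;
        esac
        sleep "$interval"
    done
    # Phase 2: let systemd do the waiting; this prints the final state and
    # passes through systemctl's exit code (non-zero for degraded).
    lxc exec "$container" -- systemctl is-system-running --wait
}
```

Because the function returns systemctl's exit code, a caller that accepts degraded should inspect the printed state rather than test the return value.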

Solution for Ubuntu 16.04 and Ubuntu 18.04 (but also Ubuntu 19.04 and newer)

Use the following example script. You can run it repeatedly to verify that it works well. It has been tested with Ubuntu 16.04, Ubuntu 18.04 and Ubuntu 19.04.

$ cat myscript-1804older.sh
lxc stop mycontainer
lxc delete mycontainer
lxc launch ubuntu:18.04 mycontainer
lxc exec mycontainer -- bash -c 'while [ "$(systemctl is-system-running 2>/dev/null)" != "running" ] && [ "$(systemctl is-system-running 2>/dev/null)" != "degraded" ]; do :; done'
lxc exec mycontainer -- sudo snap install hello
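
The polling line in the script above spins without pausing and never gives up. For unattended automation, a variant with a pause between polls and an overall timeout may be preferable. The following is a sketch under assumptions: the function name wait_for_container, the 60 second default timeout and the 1 second default interval are all made up for illustration, and it polls from the host with one lxc exec per check.

```shell
# Hypothetical helper: poll until systemd reports running or degraded.
# Usage: wait_for_container CONTAINER [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Prints the final state and returns 0, or returns 1 on timeout.
wait_for_container() {
    local container="$1" timeout="${2:-60}" interval="${3:-1}"
    local deadline state
    deadline=$(( $(date +%s) + timeout ))
    while :; do
        # Only stdout is kept; the "Failed to connect to bus" error goes to stderr.
        state=$(lxc exec "$container" -- systemctl is-system-running 2>/dev/null) || true
        case "$state" in
            running|degraded) printf '%s\n' "$state"; return 0 ;;
        esac
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out waiting for $container (last state: ${state:-none})" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

A script would then run something like wait_for_container mycontainer 120 before the snap install step, and abort if it returns non-zero.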

Conclusion

As an overall solution, I suggest using the last script, which does polling. It works on Ubuntu 16.04, Ubuntu 18.04 and Ubuntu 19.04. Here is the line again that waits until the container mycontainer has completed the startup.

lxc exec mycontainer -- bash -c 'while [ "$(systemctl is-system-running 2>/dev/null)" != "running" ] && [ "$(systemctl is-system-running 2>/dev/null)" != "degraded" ]; do :; done'

Permanent link to this article: https://blog.simos.info/how-to-know-when-a-lxd-container-has-finished-starting-up/

4 comments

    • Roman Valo on February 18, 2020 at 15:50

    Great post as usual. However, I found another way to wait for a running system, which is especially useful when cloud-init is used. To wait until cloud-init finishes (and therefore the container OS is fully prepared), just use:

    lxc exec mycontainer -- cloud-init status -w

    This command will just block-wait on cloud-init to complete.

    1. Thanks!

      This works with Ubuntu 16.04, 18.04 and likely 20.04. The output can be suppressed with 2>/dev/null.
      The cloud-init in Ubuntu 14.04 does not have the status subcommand, but Ubuntu 14.04 does not have systemd either (for the instructions in this post).

    • craig hicks on May 26, 2020 at 17:52

    What about using the cloud-init “phone-home” functionality?

    • Craig Hicks on October 14, 2020 at 10:49

    Calling “lxc exec mycontainer -- cloud-init status -w” from ub18.04 host when initializing an ub18.04 container works well. But from ub20.04 host to ub18.04 container that method seems to wait for extra tens of minutes even though the container has already finished initializing. (I suspect there is an internal timeout to prevent an eternal hang).

    I have also used the “phone home” method, but that also requires setting a hole in the firewall, e.g.,

    sudo ufw allow from 10.64.64.1/24

    when the lxd bridge has address set 10.64.64.1/24, which is a bit troublesome for automating because it requires sudo.

    It would be wonderful if lxc provided a call to encapsulate the required wait — that would prevent surprises across variations in host or container versions.

    When faced with the choice of phone-home vs polling, I will now return to phone-home rather than code and test branches for polling with different versions of host and container.
