You are using LXD and creating many containers. Those containers are stored in a dedicated ZFS pool, which LXD manages exclusively. But disaster strikes: LXD loses its database and forgets about your containers. Your data is still there in the ZFS pool, but LXD no longer knows about it because its configuration (database) has been lost.
In this post we see how to recover our containers when the LXD database is, for some reason, gone.
This post expands on the LXD disaster recovery documentation.
How to lose your LXD configuration database
How could you have lost your LXD database?
You have a working installation of LXD and you uninstalled LXD by accident. Normally, there should be some copy of the database lying around, which could make the recovery much easier. In my case, I had been running an instance of LXD from the edge channel (snap package) and, after some time, LXD got stuck and stopped working. LXD would not start, and the lxc commands would hang without giving any output. Therefore, I switched to the stable channel (default), and the configuration database was gone. lxc list would work, but show an empty list.
In this post we cover the case where your storage pool is intact but LXD has forgotten all about your containers, your profiles, your network interfaces, and, of course, your storage pool.
Your datasets should still show up in the output of zfs list, like this.
$ sudo zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
lxd                         78,4G   206G    24K  /var/snap/lxd/common/lxd/storage-pools/lxd
lxd/containers              73,1G   206G    24K  /var/snap/lxd/common/lxd/storage-pools/lxd/containers
lxd/containers/mycontainer   486M   206G   816M  /var/snap/lxd/common/lxd/storage-pools/lxd/containers/mycontainer
...
The lxc commands, however, return empty lists.
$ lxc storage list
+------+-------------+--------+--------+---------+
| NAME | DESCRIPTION | DRIVER | SOURCE | USED BY |
+------+-------------+--------+--------+---------+
+------+-------------+--------+--------+---------+
$ lxc profile list
+-----------+---------+
|   NAME    | USED BY |
+-----------+---------+
+-----------+---------+
First, LXD lost the connection to the storage pool. LXD has no information about where the ZFS pool is, so we need to give it that information.
Second, while LXD lost all configuration, each container has a backup of its own configuration in a file named backup.yaml, stored in the storage pool. Therefore, you can run sudo lxd import (note: it is lxd import, not lxc import) to add back each container. If a custom profile or network interface is missing, you will get an appropriate message to act on.
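Since the import depends on backup.yaml being present, it can help to check which containers actually have one. The following is a minimal sketch, assuming that each container's dataset is already mounted (see the recovery steps) and that backup.yaml sits at the top level of the container directory. The function name list_importable is hypothetical.

```shell
# Hypothetical helper: print the name of every container directory
# under the given path that contains a backup.yaml file.
list_importable() {
  containers_dir="$1"
  for dir in "$containers_dir"/*/; do
    # Unmounted or empty directories have no backup.yaml and are skipped.
    if [ -f "${dir}backup.yaml" ]; then
      basename "$dir"
    fi
  done
}
```

You would call it with the containers directory of the pool, for example list_importable /var/snap/lxd/common/lxd/storage-pools/lxd/containers. A container missing from the output is most likely not mounted yet.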
How do we recover?
First, we make a list of the container names. You can most likely get the list from the containers directory of the storage pool.
$ ls /var/snap/lxd/common/lxd/storage-pools/lxd/containers/
mycontainer
...
Second, mount each container. We run zfs mount and specify only the ZFS dataset name. ZFS already knows the mount point, because it is stored in the dataset's mountpoint property.
$ sudo zfs mount lxd/containers/mycontainer
$ zfs mount
lxd/containers/mycontainer  /var/snap/lxd/common/lxd/storage-pools/lxd/containers/mycontainer
Third, run lxd import to import the container. You may get an error; see the troubleshooting section below, and then try to import again.
$ sudo lxd import mycontainer
By doing so, we can now start the container.
$ lxc start mycontainer
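With many containers, the mount and import steps can be wrapped in a small loop. This is only a sketch of the steps above, not an official LXD tool. The names run, recover_containers and DRY_RUN are hypothetical, and by default the script only prints the commands it would run, so that you can review them first.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: mount and import each named container.
# First argument: parent dataset (for example lxd/containers).
# Remaining arguments: container names.
recover_containers() {
  pool_dataset="$1"
  shift
  for name in "$@"; do
    # zfs mount complains if the dataset is already mounted; we carry on anyway.
    run sudo zfs mount "$pool_dataset/$name"
    # Report a failed import and continue with the next container.
    run sudo lxd import "$name" || echo "import failed for $name" >&2
  done
}
```

For example, recover_containers lxd/containers mycontainer prints the two commands for mycontainer. Setting DRY_RUN=0 beforehand makes it actually run them. Containers whose import fails can be fixed as described in the troubleshooting section and imported again.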
Troubleshooting
Error: Create container: Requested profile 'gui' doesn't exist
You get this error if the profile with name gui does not exist. Create the profile and run lxd import again.
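Creating the missing profile could look like the sketch below. The function name ensure_profile is hypothetical. Note that lxc profile create makes an empty profile, so the original settings of the gui profile still have to be added back by hand.

```shell
# Hypothetical helper: create the named profile only if LXD does not have it yet.
ensure_profile() {
  profile="$1"
  if lxc profile show "$profile" >/dev/null 2>&1; then
    echo "profile $profile already exists"
  else
    # Creates an empty profile; its previous contents are not restored.
    lxc profile create "$profile"
  fi
}
```

After running ensure_profile gui, retry with sudo lxd import mycontainer.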
Error: Create container: Invalid devices: Not an IP address: localhost
This relates to a change in how LXD handles proxy devices, which appeared in LXD 3.13. See more at this post.
Error: Storage volume for container "mycontainer" already exists in the database. Set "force" to overwrite
There is already a container in LXD with the same name. Most likely you got this because you already imported the container. If not, you need to figure out which of the two to keep.