You are using LXD and you are creating many containers. Those containers are stored in a dedicated ZFS pool, and LXD is managing this ZFS pool exclusively. But disaster strikes, and LXD loses its database and forgets about your containers. Your data is still there in the ZFS pool, but LXD no longer knows about it because its configuration (database) has been lost.
In this post we see how to recover our containers when the LXD database is, for some reason, gone.
This post expands on the LXD disaster recovery documentation.
How to lose your LXD configuration database
How could you have lost your LXD database?
You have a working installation of LXD and you have uninstalled LXD by accident. Normally, there should be some copy of the database lying around which could make the recovery much easier. In my case, I have been running an instance of LXD from the edge channel (snap package) and after some time, LXD would get stuck and not work. LXD would not start and the lxc commands would get stuck without giving any output. Therefore, I switched to the stable channel (default) and the configuration database was gone. lxc list would work, but show an empty list.
In this post we cover the case where your storage pool is intact but LXD has forgotten all about your containers, your profiles, your network interfaces, and, of course, your storage pool.
You should get the appropriate output with zfs list, like this.
$ sudo zfs list
NAME                         USED   AVAIL  REFER  MOUNTPOINT
lxd                          78,4G  206G   24K    /var/snap/lxd/common/lxd/storage-pools/lxd
lxd/containers               73,1G  206G   24K    /var/snap/lxd/common/lxd/storage-pools/lxd/containers
lxd/containers/mycontainer   486M   206G   816M   /var/snap/lxd/common/lxd/storage-pools/lxd/containers/mycontainer
...
The lxc commands, however, return empty lists.
$ lxc storage list
+------+-------------+--------+--------+---------+
| NAME | DESCRIPTION | DRIVER | SOURCE | USED BY |
+------+-------------+--------+--------+---------+
+------+-------------+--------+--------+---------+
$ lxc profile list
+-----------+---------+
| NAME      | USED BY |
+-----------+---------+
+-----------+---------+
First, LXD lost the connection to the storage pool. It has no information about where the ZFS pool is. We need to give that information to LXD.
Second, while LXD lost all configuration, each container has a backup of its own configuration in a file backup.yaml, stored in the storage pool. Therefore, you can sudo lxd import (Note: it is lxd import, not lxc import) to add back each container. If a custom profile or network interface is missing, you will get an appropriate message to act on it.
How do we recover?
First, we make a list of the container names. It is quite possible you can get the list from the containers directory of the storage pool.
$ ls /var/snap/lxd/common/lxd/storage-pools/lxd/containers/
mycontainer
...
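If the directory listing turns out to be empty, the same names can probably be derived from the dataset names that zfs list prints. The following is a minimal sketch, not part of the original procedure; the function name container_names is made up for illustration, and the pool name lxd is this post's example.

```shell
#!/bin/sh
# Hypothetical helper: reads dataset names (one per line, as printed by
# `zfs list -H -o name`) on stdin and prints the container names found
# directly under <pool>/containers. The pool name defaults to "lxd".
container_names() {
    pool="${1:-lxd}"
    awk -v prefix="$pool/containers/" '
        index($0, prefix) == 1 {
            name = substr($0, length(prefix) + 1)
            # Skip nested datasets and snapshots; keep direct children only.
            if (name != "" && index(name, "/") == 0 && index(name, "@") == 0)
                print name
        }'
}

# Usage on a real system (requires ZFS):
#   sudo zfs list -H -o name -r lxd/containers | container_names lxd
```

The parsing is kept separate from the zfs call so it can be tried on saved output first.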
Second, mount each container. We run zfs mount and specify the ZFS dataset only. ZFS already knows the mount point, because it is stored in the dataset's mountpoint property. Running zfs mount without arguments then lists what is mounted.
$ sudo zfs mount lxd/containers/mycontainer
$ zfs mount
lxd/containers/mycontainer  /var/snap/lxd/common/lxd/storage-pools/lxd/containers/mycontainer
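Before importing, it may be worth confirming that the mounted directory really contains the backup.yaml file that lxd import relies on. A small sketch of such a check follows; the function name has_backup_yaml is hypothetical, and the paths in the usage comment are this post's examples.

```shell
#!/bin/sh
# Sketch: succeed only if the given container directory holds a backup.yaml.
# Run as root if the directory is not readable by your user.
has_backup_yaml() {
    [ -f "$1/backup.yaml" ]
}

# Usage on a real system:
#   zfs get -H -o value mountpoint lxd/containers/mycontainer
#   has_backup_yaml /var/snap/lxd/common/lxd/storage-pools/lxd/containers/mycontainer \
#       && echo "ready to import"
```

The zfs get line prints the mount point that ZFS has recorded for the dataset, which is the directory to pass to the check.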
Third, run lxd import to import the container. You may get an error; see the troubleshooting section below, and then try to import again.
$ sudo lxd import mycontainer
By doing so, we can now start the container.
$ lxc start mycontainer
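With many containers, the mount and import steps can be wrapped in a loop. The sketch below is one possible way to do it, assuming the pool is called lxd as in this post; the names run, recover_containers, POOL and DRY_RUN are made up for illustration. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: mount and import every container named on the command line.
# DRY_RUN=1 (the default here) only prints the commands it would run.
POOL="${POOL:-lxd}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_containers() {
    for name in "$@"; do
        run sudo zfs mount "$POOL/containers/$name"
        # Keep going on failure so one bad container does not stop the rest.
        run sudo lxd import "$name" \
            || echo "import of $name failed; check the error and retry" >&2
    done
}

# Usage:
#   recover_containers mycontainer              # dry run, prints commands
#   DRY_RUN=0; recover_containers mycontainer   # actually runs them
```

Reviewing the dry-run output first is cheap insurance, since a failed import usually needs manual attention before the retry.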
Troubleshooting
Error: Create container: Requested profile 'gui' doesn't exist
You get this error if the profile with name gui does not exist. Create the profile and run lxd import again.
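When several containers use different profiles, the missing name can be picked out of the error text instead of being read by eye. This is a rough sketch; the function name missing_profile is hypothetical, and it assumes the terminal prints the message with plain ASCII quotes around the profile name.

```shell
#!/bin/sh
# Sketch: print the profile name from a "Requested profile ... doesn't exist"
# error read on stdin. Assumes ASCII single quotes around the name.
missing_profile() {
    sed -n "s/.*Requested profile '\([^']*\)' doesn't exist.*/\1/p"
}

# Usage on a real system:
#   profile="$(sudo lxd import mycontainer 2>&1 | missing_profile)"
#   [ -n "$profile" ] && lxc profile create "$profile"
#   sudo lxd import mycontainer
```

Note that lxc profile create makes an empty profile; whatever settings the original profile had still need to be added back by hand.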
Error: Create container: Invalid devices: Not an IP address: localhost
This relates to a change in LXD (appearing in LXD 3.13) and proxy devices. See more at this post.
Error: Storage volume for container "mycontainer" already exists in the database. Set "force" to overwrite
There is already a container in LXD with the same name. Most likely you got this because you have already imported the container. If not, you need to figure out which one to keep.
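To tell the two cases apart before reaching for force, you can check whether LXD already lists the name. A minimal sketch, with a hypothetical function name already_imported:

```shell
#!/bin/sh
# Sketch: reads instance names (one per line) on stdin and succeeds
# if the name given as the first argument is among them (exact match).
already_imported() {
    grep -Fxq -- "$1"
}

# Usage on a real system:
#   if lxc list --format csv -c n | already_imported mycontainer; then
#       echo "mycontainer is already known to LXD"
#   fi
```

If the name is listed and you are sure the copy in the storage pool is the one to keep, sudo lxd import mycontainer --force overwrites the database entry, as the error message suggests.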
I hope I will never have to use this, but I’m really glad you took the time to write this article. Thanks!
Great post, it helped me to solve the “Not an IP address: localhost” issue on one of my older containers which was stopped for a while. For some reason, the good old ‘lxc config edit’ workaround didn’t work, as the changed device configuration seemed to be ignored when trying to save the changed conf. I had to manually mount the container’s storage pool, edit the backup.yaml and import the container using --force.
Thanks for the useful info.
In LXD 3.16 there has been a change that did not allow you to edit the configuration if it was not valid in the first place. This strict change has been relaxed and it should work now.
It is good that you got a workaround using this post.
I have cloned the whole rpool with zfs send ... | ssh root@secondserver zfs recv ... from one server to another.
Everything works fine, except LXD.
systemctl status showed that snap/LXD can’t start. I tried to resolve this problem, but did not succeed. So I’ve reinstalled LXD. Now LXD works, but it’s empty.
So I tried your method, but got:
lxd import mycontainer
Error: The instance's directory "/var/snap/lxd/common/lxd/storage-pools/lxd/containers/mycontainer" appears to be empty. Please ensure that the instance's storage volume is mounted
but it’s not empty:
ls -A /var/snap/lxd/common/lxd/storage-pools/lxd/containers/mycontainer/
backup.yaml metadata.yaml metadata.yaml~ rootfs templates
Is there a solution?
I haven’t tried this and I would need to replicate the whole setup to be able to help you.
I am not able to do this at the moment.
If the original server is still working, then you can enable one of the two LXD servers to be accessible remotely. Then, from the other LXD server you add it as a remote (lxc remote add) and move the container (lxc move) to transfer it.
Hi, thanks a lot for your work. I got “no pools available” when running “zpool list”. Do you know how to deal with it? Thanks!