A closer look at the new ARM64 Scaleway servers and LXD

Update #1: I posted at the Scaleway Linux kernel discussion thread to add support for the Ubuntu Linux kernel and Add new bootscript with stock Ubuntu Linux kernel #349.

Scaleway has been offering ARM (armv7) cloud servers (baremetal) since 2015 and now they have ARM64 (armv8, from Cavium) cloud servers (through KVM, not baremetal).

But can you run LXD on them? Let’s see.

Launching a new server

We go through the management panel and select to create a new server. At the moment, only the Paris datacenter has availability of ARM64 servers and we select ARM64-2GB.

They use Cavium ThunderX hardware, and those boards have up to 48 cores. You can allocate either 2, 4, or 8 cores, for 2GB, 4GB, and 8GB RAM respectively. KVM is the virtualization platform.

There is an option of either Ubuntu 16.04 or Debian Jessie. We try Ubuntu.

It takes under a minute to provision and boot the server.

Connecting to the server

It runs Linux 4.9.23. Also, the disk is vda, specifically, /dev/vda. That is, there is no partitioning and the filesystem takes over the whole device.

Here is /proc/cpuinfo and uname -a. They are the two cores (from 48) as provided by KVM. The BogoMIPS are really Bogo on these platforms, so do not take them at face value.

Currently, Scaleway does not have their own mirror of the distribution packages but use ports.ubuntu.com. It’s 16ms away (ping time).

Depending on where you are, the ping times for google.com and www.google.com tend to be different. google.com redirects to www.google.com, so it somewhat makes sense that google.com reacts faster. At other locations (different country), could be the other way round.

This is /var/log/auth.log, and already there are some hackers trying to brute-force SSH. They have been trying with username ubnt. Note to self: do not use ubnt as the username for the non-root account.

The default configuration for the SSH server on Scaleway is to allow password authentication. You need to change this at /etc/ssh/sshd_config to look like

# Change to no to disable tunnelled clear text passwords
PasswordAuthentication no

Originally, it was commented out, and had a default yes.

Finally, run

sudo systemctl reload sshd

This will not break your existing SSH session (even restart will not break your existing SSH session, how cool is that?). Now, you can create your non-root account. To get that user to sudo as root, you need to usermod -a -G sudo myusername.

There is a recovery console, accessible through the Web management screen. For this to work, it says that you first need to You must first login and set a password via SSH to use this serial console. In reality, the root account already has a password that has been set, and this password is stored in /root/.pw. It is not known how good this password is, therefore, when you boot a cloud server on Scaleway,

  1. Disable PasswordAuthentication for SSH as shown above and reload the sshd configuration. You are supposed to have already added your SSH public key in the Scaleway Web management screen BEFORE starting the cloud server.
  2. Change the root password so that it is not the one found at /root/.pw. Store somewhere safe that password, because it is needed if you want to connect through the recovery console
  3. Create a non-root user that can sudo and can do PubkeyAuthentication, preferably with username other than this ubnt.

Setting up ZFS support

The Ubuntu Linux kernels at Scaleway do not have ZFS support and you need to compile as a kernel module according to the instructions at https://github.com/scaleway/kernel-tools.

Actually, those instructions are apparently now obsolete with newer versions of the Linux kernel and you need to compile both spl and zfs manually, and install.

Naturally, when you compile spl and zfs, you can create .deb packages that can be installed in a nice and clean way. However, spl and zfs will originally create .rpm packages and then call alien to convert them to .deb packages. Then, we hit on some alien bug (no pun intended) which gives the error: zfs-0.6.5.9-1.aarch64.rpm is for architecture aarch64 ; the package cannot be built on this system which is weird since we are only working on aarch64.

The running Linux kernel on Scaleway for these ARM64 SoC has the following important files, http://mirror.scaleway.com/kernel/aarch64/4.9.23-std-1/

Therefore, run as root the following:

# Determine versions
arch="$(uname -m)"
release="$(uname -r)"
upstream="${release%%-*}"
local="${release#*-}"

# Get kernel sources
mkdir -p /usr/src
wget -O "/usr/src/linux-${upstream}.tar.xz" "https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-${upstream}.tar.xz"
tar xf "/usr/src/linux-${upstream}.tar.xz" -C /usr/src/
ln -fns "/usr/src/linux-${upstream}" /usr/src/linux
ln -fns "/usr/src/linux-${upstream}" "/lib/modules/${release}/build"

# Get the kernel's .config and Module.symvers files
wget -O "/usr/src/linux/.config" "http://mirror.scaleway.com/kernel/${arch}/${release}/config"
wget -O /usr/src/linux/Module.symvers "http://mirror.scaleway.com/kernel/${arch}/${release}/Module.symvers"

# Set the LOCALVERSION to the locally running local version (or edit the file manually)
printf 'CONFIG_LOCALVERSION="%s"\n' "${local:+-$local}" >> /usr/src/linux/.config

# Let's get ready to compile. The following are essential for the kernel module compilation.
apt install -y build-essential
apt install -y libssl-dev
make -C /usr/src/linux prepare modules_prepare

# Now, let's grab the latest spl and zfs (see http://zfsonlinux.org/).
cd /usr/src/
wget https://github.com/zfsonlinux/zfs/releases/download/zfs-0.6.5.9/spl-0.6.5.9.tar.gz
wget https://github.com/zfsonlinux/zfs/releases/download/zfs-0.6.5.9/zfs-0.6.5.9.tar.gz

# Install some dev packages that are needed for spl and zfs,
apt install -y uuid-dev
apt install -y dh-autoreconf
# Let's do spl first
tar xvfa spl-0.6.5.9.tar.gz
cd spl-0.6.5.9/
./autogen.sh
./configure      # Takes about 2 minutes
make             # Takes about 1:10 minutes
make install
cd ..

# Let's do zfs next
cd zfs-0.6.5.9/
tar xvfa zfs-0.6.5.9.tar.gz
./autogen.sh
./configure      # Takes about 6:10 minutes
make             # Takes about 13:20 minutes
make install

# Let's get ZFS loaded
depmod -a
ldconfig
modprobe zfs
zfs list
zpool list

And that’s it! The last two commands will show that there are no datasets or pools available (yet), meaning that it all works.

Setting up LXD

We are going to use a file (with ZFS) as the storage file. Let’s check what space we have left for this (from the 50GB disk),

root@scw-ubuntu-arm64:~# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda         46G  2.0G   42G   5% /

Initially, it was only 800MB used, now it is 2GB used. Let’s allocate 30GB for LXD.

LXD is not already installed on the Scaleway image (other VPS providers have alread LXD installed). Therefore,

apt install lxd

Then, we can run lxd init. There is a weird situation when you run lxd init. It takes quite some time for this command to show the first questions (choose storage backend, etc). In fact, it takes 1:42 minutes before you are prompted for the first question. When you subsequently run lxd init, you get at once the first question. There is quite some work that lxd init does for the first time, and I did not look into what it is.

root@scw-ubuntu-arm64:~# lxd init
Name of the storage backend to use (dir or zfs) [default=zfs]: 
Create a new ZFS pool (yes/no) [default=yes]? 
Name of the new ZFS pool [default=lxd]: 
Would you like to use an existing block device (yes/no) [default=no]? 
Size in GB of the new loop device (1GB minimum) [default=15]: 30
Would you like LXD to be available over the network (yes/no) [default=no]? 
Do you want to configure the LXD bridge (yes/no) [default=yes]? 
Warning: Stopping lxd.service, but it can still be activated by:
  lxd.socket

LXD has been successfully configured.
root@scw-ubuntu-arm64:~#

Now, let’s run lxc list. This will create first the client certificate. There is quite a bit of cryptography going on, and it takes a lot of time.

ubuntu@scw-ubuntu-arm64:~$ time lxc list
Generating a client certificate. This may take a minute...
If this is your first time using LXD, you should also run: sudo lxd init
To start your first container, try: lxc launch ubuntu:16.04

+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+

real    5m25.717s
user    5m25.460s
sys    0m0.372s
ubuntu@scw-ubuntu-arm64:~$

It is weird and warrants closer examination. In any case,

ubuntu@scw-ubuntu-arm64:~$ cat /proc/sys/kernel/random/entropy_avail
2446
ubuntu@scw-ubuntu-arm64:~$

Creating containers

Let’s create a container. We are going to do each step at a time, in order to measure the time it takes to complete.

ubuntu@scw-ubuntu-arm64:~$ time lxc image copy ubuntu:x local:
Image copied successfully!         

real    1m5.151s
user    0m1.244s
sys    0m0.200s
ubuntu@scw-ubuntu-arm64:~$

Out of the 65 seconds, 25 seconds was the time to download the image and the rest (40 seconds) was for initialization before the prompt was returned.

Let’s see how long it takes to launch a container.

ubuntu@scw-ubuntu-arm64:~$ time lxc launch ubuntu:x c1
Creating c1
Starting c1
error: Error calling 'lxd forkstart c1 /var/lib/lxd/containers /var/log/lxd/c1/lxc.conf': err='exit status 1'
  lxc 20170428125239.730 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:220 - If you really want to start this container, set
  lxc 20170428125239.730 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:221 - lxc.aa_allow_incomplete = 1
  lxc 20170428125239.730 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:222 - in your container configuration file
  lxc 20170428125239.730 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
  lxc 20170428125239.730 ERROR lxc_start - start.c:__lxc_start:1346 - Failed to spawn container "c1".
  lxc 20170428125240.408 ERROR lxc_conf - conf.c:run_buffer:405 - Script exited with status 1.
  lxc 20170428125240.408 ERROR lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "c1".

Try `lxc info --show-log local:c1` for more info

real    0m21.347s
user    0m0.040s
sys    0m0.048s
ubuntu@scw-ubuntu-arm64:~$

What this means, is that somehow the Scaleway Linux kernel does not have all the AppArmor (“aa”) features that LXD requires. And if we want to continue, we must configure that we are OK with this situation.

What features are missing?

ubuntu@scw-ubuntu-arm64:~$ lxc info --show-log local:c1
Name: c1
Remote: unix:/var/lib/lxd/unix.socket
Architecture: aarch64
Created: 2017/04/28 12:52 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

            lxc 20170428125239.730 WARN     lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:218 - Incomplete AppArmor support in your kernel
            lxc 20170428125239.730 ERROR    lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:220 - If you really want to start this container, set
            lxc 20170428125239.730 ERROR    lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:221 - lxc.aa_allow_incomplete = 1
            lxc 20170428125239.730 ERROR    lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:222 - in your container configuration file
            lxc 20170428125239.730 ERROR    lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
            lxc 20170428125239.730 ERROR    lxc_start - start.c:__lxc_start:1346 - Failed to spawn container "c1".
            lxc 20170428125240.408 ERROR    lxc_conf - conf.c:run_buffer:405 - Script exited with status 1.
            lxc 20170428125240.408 ERROR    lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "c1".
            lxc 20170428125240.409 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive response: Connection reset by peer.
            lxc 20170428125240.409 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive response: Connection reset by peer.

ubuntu@scw-ubuntu-arm64:~$

Two hints here, some issue with process_label_set, and get_cgroup.

Let’s allow for now, and start the container,

ubuntu@scw-ubuntu-arm64:~$ lxc config set c1 raw.lxc 'lxc.aa_allow_incomplete=1'
ubuntu@scw-ubuntu-arm64:~$ time lxc start c1

real    0m0.577s
user    0m0.016s
sys    0m0.012s
ubuntu@scw-ubuntu-arm64:~$ lxc list
+------+---------+------+------+------------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+------+---------+------+------+------------+-----------+
| c1   | RUNNING |      |      | PERSISTENT | 0         |
+------+---------+------+------+------------+-----------+
ubuntu@scw-ubuntu-arm64:~$ lxc list
+------+---------+-----------------------+------+------------+-----------+
| NAME |  STATE  |         IPV4          | IPV6 |    TYPE    | SNAPSHOTS |
+------+---------+-----------------------+------+------------+-----------+
| c1   | RUNNING | 10.237.125.217 (eth0) |      | PERSISTENT | 0         |
+------+---------+-----------------------+------+------------+-----------+
ubuntu@scw-ubuntu-arm64:~$

Let’s run nginx in the container.

ubuntu@scw-ubuntu-arm64:~$ lxc exec c1 -- sudo --login --user ubuntu
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@c1:~$ sudo apt update
Hit:1 http://ports.ubuntu.com/ubuntu-ports xenial InRelease
...
37 packages can be upgraded. Run 'apt list --upgradable' to see them.
ubuntu@c1:~$ sudo apt install nginx
...
ubuntu@c1:~$ exit
ubuntu@scw-ubuntu-arm64:~$ curl http://10.237.125.217/
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
...
ubuntu@scw-ubuntu-arm64:~$

That’s it! We are running LXD on Scaleway and their new ARM64 servers. The issues should be fixed in order to have a nicer user experience.

4 comments

Leave a Reply

%d bloggers like this: