How to install and use Puppeteer in an Incus container

Incus is a manager for virtual machines and system containers.

A virtual machine is an instance of an operating system that runs on a computer alongside the main operating system. A virtual machine uses hardware virtualization features to stay separated from the main operating system.

A system container is also an instance of an operating system that runs on a computer alongside the main operating system. Instead of hardware virtualization, a system container uses security primitives of the Linux kernel to stay separated from the main operating system. You can think of system containers as software virtual machines.

Puppeteer is a Node.js library that is used to programmatically control a Web browser (by default, Chrome/Chromium) over the DevTools Protocol. That is, you can do things like start a Web browser on a specific page, take a screenshot or export the page to PDF, and then close the browser. None of that requires a desktop environment. Most people use Puppeteer for Web automation, where you open a page and click elements on that page to get a result, or for Web scraping, where you load a page and extract its content.

Puppeteer runs in headless mode by default, but can be configured to run a full (“headful”) Chrome/Chromium, or some other browser that supports the DevTools Protocol (Firefox). This also works fine with Incus as long as you create a GUI Incus container.
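
To make the tasks described above a bit more concrete, here is a minimal sketch of exporting a page to PDF. It only runs once Puppeteer and its libraries are installed as described later in this post. The file name pdf.js and the helper pdfNameFor are hypothetical names chosen for illustration. The script uses headless: 'new' to match the setting used in this post; other Puppeteer releases may expect a different value.

```javascript
'use strict';

// Hypothetical helper: derive a PDF file name from the host name of a URL.
function pdfNameFor(url) {
  const host = new URL(url).hostname;
  return host.replace(/[^a-z0-9.-]/gi, '_') + '.pdf';
}

async function exportToPdf(url) {
  // Loaded lazily, so the helper above also works without Puppeteer.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({headless: 'new'});
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle before rendering the page.
    await page.goto(url, {waitUntil: 'networkidle2'});
    console.log('Title:', await page.title());
    const path = pdfNameFor(url);
    await page.pdf({path, format: 'A4'});
    return path;
  } finally {
    // Always close the browser, even if navigation or rendering fails.
    await browser.close();
  }
}

// Run only when a URL is given on the command line.
if (process.argv[2]) {
  exportToPdf(process.argv[2])
    .then(path => console.log('Saved', path))
    .catch(err => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

You would run it as node pdf.js http://example.com from a directory under the home directory where puppeteer was installed. The try/finally block is there so that a failed page load does not leave a browser process running in the container.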

In this post we see how to install and use Puppeteer in headless mode, in an Incus container. You can set it up on your desktop computer or on your cloud server.

Prerequisites

This post assumes that

  1. You are familiar with using Incus.

Cheat sheet

This is the cheat sheet. It is for when you have already read the post in its entirety and, at some point in the future, want to repeat the steps. You already know the details and just need a quick reference. If this is the first time you are reading this post, skip to the next section.

incus launch images:debian/12/cloud puppeteer
incus exec puppeteer -- sudo --login --user debian
sudo apt update
sudo apt install -y curl
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
logout
incus exec puppeteer -- sudo --login --user debian
nvm install node
npm install puppeteer
sudo apt install -y libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2

Creating the Incus container

The following has been tested with both the images:debian/12/cloud and images:ubuntu/22.04/cloud container images. The commands are the same with either the Debian or the Ubuntu container image. The only small difference is that the Debian cloud container image has a non-root account called debian, while the Ubuntu cloud container image has a non-root account called ubuntu. Apart from that, everything else is the same.

The container images have the qualifier cloud in their names. This qualifier means that the container images come with cloud-init preinstalled, which means that some useful initialization is performed. That qualifier is not required when you only use headless Chrome in Puppeteer. But if you want headful Chrome while developing so that you can debug your scripts, you’ll need the cloud qualifier.

We create an instance from the images:debian/12/cloud container image and give it the name puppeteer. Then, we get a shell into the instance using the appropriate non-root username. It’s debian for the Debian container images with the cloud qualifier. The session below first tries ubuntu, which fails on the Debian image, and then debian, which works.

$ incus launch images:debian/12/cloud puppeteer
Creating puppeteer
Starting puppeteer                          
$ incus exec puppeteer -- sudo --login --user ubuntu
sudo: unknown user ubuntu
sudo: error initializing audit plugin sudoers_audit
$ incus exec puppeteer -- sudo --login --user debian
debian@puppeteer:~$

Installing NVM and Node.js

There are several ways to install Node.js. You would normally not use the packages from the distribution repositories because they are often old. I prefer to use NVM, which is a version manager for Node.js. You can specify an exact version or the LTS version. If you do not specify one, as in nvm install node, NVM will install the latest (current) version. You can even switch between installed versions: nvm list shows them, and nvm use nodeversion selects one, using a version name from the list.

Once we install NVM, we are instructed to log out and log in again so that the changes to the environment variables take effect.

Once we install Node.js, we use npm to install Puppeteer. Finally, we install the libraries that are needed by the Chrome/Chromium browser that comes with Puppeteer. That browser binary is dynamically linked, and assumes that several shared libraries already exist on our system. The list has the minimum set of libraries that allow Puppeteer to work.

debian@puppeteer:~$ sudo apt update
...
debian@puppeteer:~$ sudo apt install -y curl
...
debian@puppeteer:~$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16555  100 16555    0     0   166k      0 --:--:-- --:--:-- --:--:--  168k
=> Downloading nvm as script to '/home/debian/.nvm'

=> Appending nvm source string to /home/debian/.bashrc
=> Appending bash_completion source string to /home/debian/.bashrc
=> Close and reopen your terminal to start using nvm or run the following to use it now:

export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion
debian@puppeteer:~$ logout
logout
$ incus exec puppeteer -- sudo --login --user debian
debian@puppeteer:~$ nvm install node
Downloading and installing node v21.6.1...
Downloading https://nodejs.org/dist/v21.6.1/node-v21.6.1-linux-x64.tar.gz...
############################################################################################################################################################################################################ 100.0%
Computing checksum with sha256sum
Checksums matched!
Now using node v21.6.1 (npm v10.2.4)
Creating default alias: default -> node (-> v21.6.1)
debian@puppeteer:~$ npm install puppeteer

added 111 packages in 36s

9 packages are looking for funding
  run `npm fund` for details
npm notice 
npm notice New minor version of npm available! 10.2.4 -> 10.3.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v10.3.0
npm notice Run npm install -g npm@10.3.0 to update!
npm notice 
debian@puppeteer:~$ sudo apt install -y libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2
...
debian@puppeteer:~$
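
If the browser later fails to start because of a missing shared library, one way to see what is missing is to run ldd on the browser binary that Puppeteer downloaded. The following is a minimal sketch of that idea, not part of the original setup. The file name checklibs.js, the --check flag and the helper parseMissingLibs are hypothetical names chosen for illustration. It assumes that ldd is available in the container and that puppeteer.executablePath() returns the path of the bundled browser.

```javascript
'use strict';

const {spawnSync} = require('node:child_process');

// Extract the names of the libraries that ldd reports as "not found".
function parseMissingLibs(lddOutput) {
  return lddOutput
    .split('\n')
    .filter(line => line.includes('=> not found'))
    .map(line => line.trim().split(/\s+/)[0]);
}

function checkChromeLibs() {
  // Loaded lazily, so the parser above also works without Puppeteer.
  const puppeteer = require('puppeteer');
  const chromePath = puppeteer.executablePath();
  // spawnSync does not throw on a non-zero exit status; we only read stdout.
  const result = spawnSync('ldd', [chromePath], {encoding: 'utf8'});
  return parseMissingLibs(result.stdout || '');
}

// Run the check only when asked to: node checklibs.js --check
if (process.argv.includes('--check')) {
  const missing = checkChromeLibs();
  if (missing.length === 0) {
    console.log('No missing shared libraries reported by ldd.');
  } else {
    console.log('Missing shared libraries:');
    missing.forEach(name => console.log('  ' + name));
  }
}
```

The names printed are shared object names, not package names, so you still need to find the Debian or Ubuntu package that provides each one.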

Testing Puppeteer

We have installed Node.js with NVM, installed Puppeteer, and installed the required libraries so that the bundled Chrome/Chromium browser can work. Now we test with a simple Puppeteer script, the one that takes a screenshot of a website in headless mode.

When we create the browser object, we use headless: "new", which selects the new headless mode. With the Puppeteer version used in this post, if you put true instead, you get a message that true is deprecated. We type the script into cat and press Ctrl+D to finish the file. The script does not print anything; it just creates the screenshot of the web page and saves it as example.png. To pull the image to our desktop so that we can open it with an image viewer, we open a new terminal window on the host and run incus file pull puppeteer/home/debian/screenshot/example.png .

debian@puppeteer:~$ mkdir screenshot
debian@puppeteer:~$ cd screenshot/
debian@puppeteer:~/screenshot$ cat > index.js
/**
 * @license
 * Copyright 2017 Google Inc.
 * SPDX-License-Identifier: Apache-2.0
 */

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: 'new'});
  const page = await browser.newPage();
  await page.goto('http://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
Ctrl+D
debian@puppeteer:~/screenshot$ node index.js 
debian@puppeteer:~/screenshot$ ls -l
total 2
-rw-r--r-- 1 debian debian 27040 Jan 24 15:33 example.png
-rw-r--r-- 1 debian debian   382 Jan 24 15:33 index.js
debian@puppeteer:~/screenshot$
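
The same setup also covers the Web scraping use case mentioned at the start of this post. Here is a minimal sketch, assuming the page has an h1 element and some links. The file name scrape.js and the helper normalizeText are hypothetical names chosen for illustration.

```javascript
'use strict';

// Collapse runs of whitespace so that scraped text prints on one line.
function normalizeText(text) {
  return text.replace(/\s+/g, ' ').trim();
}

async function scrape(url) {
  // Loaded lazily, so the helper above also works without Puppeteer.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({headless: 'new'});
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // $eval runs the callback in the page on the first matching element.
    const heading = await page.$eval('h1', el => el.textContent);
    // $$eval runs the callback in the page on all matching elements.
    const links = await page.$$eval('a', els => els.map(el => el.href));
    return {heading: normalizeText(heading), links};
  } finally {
    await browser.close();
  }
}

// Run only when a URL is given: node scrape.js http://example.com
if (process.argv[2]) {
  scrape(process.argv[2])
    .then(result => console.log(JSON.stringify(result, null, 2)))
    .catch(err => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Note that page.$eval throws if no element matches the selector, so for pages without an h1 you would pick a different selector or handle the error.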

Conclusion

We have set up Puppeteer in an Incus container and tested that it is working. If you want to switch the browser to something like Firefox, you can do so as well. Also, if you want to use the headful (non-headless) mode, it is possible by adding GUI support to the Incus container.

The file example.png that was generated from the Puppeteer script.

Permanent link to this article: https://blog.simos.info/how-to-install-and-use-puppeteer-in-an-incus-container/
