Tag : unicode

post image

How to make a snap package for lolcat with snapcraft on Ubuntu

In this post we are going to see how to make snap installation packages for a program called lolcat.

What you need for this tutorial? This tutorial is similar to Snap a python application. If you first follow that tutorial, then this one will reinforce what you have learned and expand to dealing with more programming languages than just Python.

What will you learn?

  1. How to deal with different source code repositories and get into the process of snapping them in a snap (pun intended).
  2. How to use the Snapcraft plugins for Python, golang, Rust and C/C++.
  3. How to deal with confinement decisions and when to select strict confinement or classic confinement.
  4. How to test the quality of an app before releasing to the Ubuntu Store
  5. What is this truecolor terminal emulator thing, what is lolcat and why it is cool (at least to a select few).

True-color terminal emulators

Terminal emulators like GNOME Terminal support the facility to display text in different colors.

You know this already because by default in Ubuntu you can see filenames in different colors, depending on whether they are executable (green), a directory (blue), a symbolic link (cyan) and so on. If you go 20+ years in the past, even the Linux console supported since then 256 colors.

What changed recently in terms of colors in the terminal emulators, is that newer terminal emulators support 16 million colors; what is described as true-color.

Here is an AWK script (of all scripting languages!) that shows a smooth gradient, from red to green to blue. In old terminal emulators and the Linux console, they were escape sequences to specify the colors, and you had to specify them by name. With the recent improvement among terminal emulators, it is now possible to specify the color by RGB value, thus 256*256*256 ~= 16 million different colors. You can read more about this in the article True Colour (16 million colours) support in various terminal applications and terminals. I found that AWK script there in this article. Try it one your Ubuntu as well!

Now, there is this Unix command called cat, which is used to print the contents of a text file in the terminal. cat /etc/passwd would show the contents of /etc/passwd. So, some people wrote an improved cat, called lolcat, that shows the content of files in a rainbow of colors.

In the screenshot above, the utility lolcat (a version we made and call lolcat-go) is used as a filter to a command (snapcraft –help), and colorizes the text that it receives with a nice rainbow of colors.

In practice, the lolcat utility merely sets the color for each character being printed. It starts with a random color and circles around the rainbow, heading diagonally towards the bottom-right. In the screenshot above, you can see the text snapcraft and Usage: in strong blue, and then diagonally (heading to bottom-right), the colors shift to green and then to orange.

Select which repositories to snap

The first implementation of the lolcat rainbow utility was probably by busyloop, written in Ruby. Since then, several others re-implemented lolcat into more programming languages. In this post we are going to snap:

  1. The Python lolcat by tehmaze.
  2. The golang lolcat by cezarsa.
  3. The Rust lolcat by ur0 (a version that uses Rust’s concurrency feature!)
  4. The C lolcat by jaseg (a version optimized for speed!)

Here we can see a) the URL of the source code, the available b) branches and c) tags. This is useful in the next section when we instruct Snapcraft which version of the source code to use.

In terms of versions among the four implementations, the Python lolcat has a recent tag 0.44, therefore we are using this tag and specify a version 0.44. For the rest, we are using the latest (master) checkout of their repositories, so we use the current date as a version number, in the form YYYYMMDD (for example, 20170226).

Completing the snapcraft.yaml metadata

When creating a snap, we need to write up a nice snap summary (less than 80 characters) and a description (less than 100 words). The summary will look like lolcat-python utility written in Python, and the description lolcat is a utility similar to the Unix “cat” command. lolcat adds rainbow colors to the text output.

In addition, we need suitable names for the four snaps. We are going to use lolcat-python, lolcat-go, lolcat-rust, and lolcat-c for each one of the four snaps.

All in all, here is the information we collected so far:

  1. Python: name is “lolcat-python“, version is “0.44“, summary: lolcat-python utility written in Python, description: lolcat-python is a utility similar to the Unix “cat” command. lolcat-python adds rainbow colors to the text output.
  2. Go: name is “lolcat-go“, version is “20170226“, summary: lolcat-go utility written in Go, description: lolcat-go is a utility similar to the Unix “cat” command. lolcat-go adds rainbow colors to the text output.
  3. Rush: name is “lolcat-rust“, version is “20170226“, summary: lolcat-rust utility written in Rust, description: lolcat-rust is a utility similar to the Unix “cat” command. lolcat-rust adds rainbow colors to the text output.
  4. C: name is “lolcat-c“, version is “20170226“, summary: lolcat-c utility written in C, description: lolcat-c is a utility similar to the Unix “cat” command. lolcat-c adds rainbow colors to the text output.

The final part in the metadata is the grade (whether it is devel or stable) and the confinement (whether it is devmode, classic or strict).

We select stable as the grade because we aim to add these snaps to the Ubuntu Store. If the grade is specified as devel, then the corresponding snap cannot be added to the stable channel (publicly available to all) of the Ubuntu Store. There is a section below on testing the four lolcat implementations and we will set accordingly the grade to either stable or devel, depending on the outcome of these tests.

We select initially devmode (DEVeloper MODE) as the confinement in order not to confine initially out snap. If our snap fails to run, we want to be sure it is some issue with our settings and not a byproduct of the choice of a stricter confinement. Once the snap can be built and run successfully, we change the confinement to either strict or classic and deal at that point with any issues that appear from there on.

Here is how the metadata look in the snapcraft.yaml configuration file for the Python variant of the snap.

Up to this point, all four snapcraft.yaml files, with metadata filled in, can be found at https://github.com/simos/lolcat-snap/tree/v1.0

Working on the “parts” section in snapcraft.yaml

We completed the easy metadata section, now it is time to work on the parts: section of snapcraft.yaml. (After the parts: section work, we just need the apps: section, where we expose the generated executable to the users of the snap).

Initially, snapcraft init will generate a stock snapcraft.yaml file, which has a stub parts: section. Here it how it looks,

parts:
  my-part:
    # See 'snapcraft plugins'
    plugin: nil

It defines the start of the parts: section. Then, a name, my-part: (we choose this name) is defined. Finally, the contents of my-part: are listed, here in two lines. The first line of my-part: is a comment (starts with #) and the second specifies the plugin, which is nil (it is reserved, and does nothing).

Note how these lines are aligned vertically. First is parts: at the start of the line, then two columns further, is my-part:. Then, two more columns further, we get the comment and the plugin: directive, both on the same column. We use spaces and not tabs. If you get any errors later on, check that you follow properly the formatting. You can have the levels separated by more than two columns, if you wish. But make sure that the same level lines are aligned on the same column.

Let’s figure out the initial versions of the parts:. Let’s do Python first!

parts:
  lolcat-python:
    source: https://github.com/tehmaze/lolcat.git
    source-tag: '0.44'
    plugin: python

We used the name lolcat-python: (our choice) for the name of this parts:. In there, we specify the source: for the source code repository, and any branches or tags that may be relevant. As we saw above, we work on the 0.44 tag. Finally, the source is written in Python, and we select the python plugin. (We will figure out later if we need to specify which of Python 2 or Python 3, if we get a relevant error).

Here is the Go,

parts:
  lolcat-go:
    source: https://github.com/cezarsa/glolcat.git
    plugin: go

Fairly expected. We do not specify a version, therefore it will use the latest snapshot of the source at the time of running snapcraft. We chose the name locat-go, and the golang plugin is called go.

Time for Rust,

parts:
  lolcat-rust:
    source: https://github.com/ur0/lolcat.git
    plugin: rust

Again, very similar to the above. We do not specify a specific source code version (there isn’t any tag or stable branch in the repository). We chose the name lolcat-rust, and the Rust plugin in snapcraft is called rust.

Finally, the version written in C,

parts:
  lolcat-c:
    source: https://github.com/jaseg/lolcat.git
    plugin: make

Again, we chose a name for this part, and it’s lolcat-c. Then, specified the URL for the source code. As for the plugin, we wrote make, although the source is written in C. Actually, the plugin: directive specifies how to build the source code. The make plugin does the make; make install sequence for us, and it’s our first choice when we see an existing Makefile file in a repository.

At this stage, we have four initial versions of configuration files. We are ready to try them out and deal with any errors that may appear.

Working on the Python version

Here is the configuration (snapcraft.yaml) for the Python variant up to this point,

name: lolcat-python
version: '0.44'
summary: lolcat utility written in Python
description: |
  lolcat-python is a utility similar to the Unix "cat" command. 
  lolcat-python adds rainbow colors to the text output.
  The source code is available at https://github.com/tehmaze/lolcat

grade: stable
confinement: devmode

parts:
  lolcat-python:
    source: https://github.com/tehmaze/lolcat.git
    source-tag: '0.44'
    plugin: python

Let’s run snapcraft prime. The prime parameter will perform the necessary compilation and will put the result in the prime/ subdirectory. Since snapcraft prime does only the initial compilation steps, it is more convenient to use for repeated tests. Once it works with snapcraft prime, we would be ready in a single extra step to produce the final .snap file (by running snapcraft, no parameters).

$ snapcraft prime
Preparing to pull lolcat-python 
[...]
Pulling lolcat-python 
[...]
Installing collected packages: pip, six, pyparsing, packaging, appdirs, setuptools, wheel
[...]
Successfully downloaded lolcat
Preparing to build lolcat-python 
Building lolcat-python 
[...]
Successfully built lolcat
[...]
Successfully installed lolcat-0.44
Staging lolcat-python 
Priming lolcat-python 
$ _

It worked beautifully! Any result is in the prime/ subdirectory, and we can identify even the executable.

$ ls prime/
bin/  etc/  lib/  meta/  snap/  usr/
$ ls prime/bin
lolcat*
$ _

We are now ready to create the apps: section of the snapcraft.yaml configuration file, in order to expose the executable to the users,

apps:
  lolcat-python:
    command: lolcat

The name: of this snap is lolcat-python, and the name we chose in the apps: section is the same, lolcat-python. Since they are the same, once we install the snap, the newly available command will be called lolcat-python. If there was a difference between the two (for example, name: lolcat-python and apps: lolcat: […], then the command for the users would end up being something like lolcat-python.lolcat, which may not be desirable.

Working on the Go version

Here is the configuration (snapcraft.yaml) for the Go variant up to this point,

name: lolcat-go
version: '20170226'
summary: lolcat utility written in golang
description: |
  lolcat-go is a utility similar to the Unix "cat" command.
  lolcat-go adds rainbow colors to the text output.
  The source code is available at https://github.com/cezarsa/glolcat

grade: devel
confinement: devmode

parts:
  lolcat-go:
    source: https://github.com/cezarsa/glolcat.git
    plugin: go

Let’s run snapcraft prime and see how it goes.

$ snapcraft prime
Preparing to pull lolcat-go 
[...]
Building lolcat-go 
[...]Staging lolcat-go 
Priming lolcat-go 
$ ls prime/bin/
glolcat.git
$ _

Lovely! The go plugin managed to make sense of the repository and compile the Go version of lolcat. The generated executable is called glolcat.git. Therefore, the apps: section looks like

apps:
  lolcat-go:
    command: glolcat.git

In this way, once we install the Go snap of lolcat, there will be an executable lolcat-go exposed to the users. This executable will be the glolcat.git that was generated by the Snapcraft go plugin in prime/bin/. Snapcraft tries to find the final executable in either prime/ or prime/bin/, therefore we do not need to specify specifically bin/glolcat.git.

Working on the Rust version

Here is the configuration (snapcraft.yaml) for the Rust variant up to this point,

name: lolcat-rust
version: '20170226'
summary: lolcat utility written in Rust
description: |
  lolcat-rust is a utility similar to the Unix "cat" command.
  lolcat-rust adds rainbow colors to the text output.
  The source code is available at https://github.com/ur0/lolcat

grade: stable
confinement: devmode

parts:
  lolcat-rust:
    source: https://github.com/ur0/lolcat.git
    plugin: rust

Let’s run snapcraft prime and see how it goes.

$ snapcraft prime
Preparing to pull lolcat-rust 
[...]
Downloading 'rustup.sh'[================] 100%
[...]
Preparing to build lolcat-rust 
Building lolcat-rust 
[...]
Staging lolcat-rust 
Priming lolcat-rust 
$ ls prime/bin/
lolcat
$ _

Without a hitch! The rust plugin noticed that we do not have the Rust compiler, and downloaded it for us! Then, it completed the compilation and the final executable is called lolcat. Therefore, the apps: section looks like

apps:
  lolcat-rust:
    command: lolcat

The apps: section will expose the generated lolcat executable by the name lolcat-rust. Since the name: field is also lolcat-rust, the resulting executable once we install the snap will be lolcat-rust as well. Again, Snapcraft looks by default into both prime/ and prime/bin/ for the command:, therefore it is not required to write specifically command: bin/lolcat.

Working on the C (make) version

Here is the configuration (snapcraft.yaml) for the C (using Makefile) variant up to this point,

name: lolcat-c
version: '20170226'
summary: lolcat utility written in C
description: |
  lolcat-c is a utility similar to the Unix "cat" command.
  lolcat-c adds rainbow colors to the text output.
  The source code is available at https://github.com/jaseg/lolcat

grade: stable
confinement: devmode

parts:
  lolcat-c:
    source: https://github.com/jaseg/lolcat.git
    plugin: make

Let’s run snapcraft prime and see how it goes.

$ snapcraft prime
Preparing to pull lolcat-c 
Pulling lolcat-c 
[...]
Submodule 'memorymapping' (https://github.com/NimbusKit/memorymapping) registered for path 'memorymapping'
Submodule 'musl' (git://git.musl-libc.org/musl) registered for path 'musl'
Cloning into 'memorymapping'...
[...]
Cloning into 'musl'...
[...]
Building lolcat-c 
make -j4
"Using musl at musl"
cd musl; ./configure
[...]sh tools/musl-gcc.specs.sh "/usr/local/musl/include" "/usr/local/musl/lib" "/lib/ld-musl-x86_64.so.1" > lib/musl-gcc.specs
printf '#!/bin/sh\nexec "${REALGCC:-gcc}" "$@" -specs "%s/musl-gcc.specs"\n' "/usr/local/musl/lib" > tools/musl-gcc
chmod +x tools/musl-gcc
make[1]: Leaving directory '/home/myusername/SNAPCRAFT/lolcat-snap/c/parts/lolcat-c/build/musl'
Command '['/bin/sh', '/tmp/tmpetv6irll', 'make', '-j4']' returned non-zero exit status 2
Exit 1
$ _

Well, well. Finally an error!

So, the C version of lolcat was written using the musl libc library. This library was referenced as a git submodule, and snapcraft followed the instructions of the Makefile to pull and compile musl!

Then, for some reason, snapcraft fails to compile lolcat using a musl-gcc wrapper for this newly compiled musl libc. As if we need to specify that the make ; make install sequence should instead be make lolcat; make install. Indeed, looking at the instructions at https://github.com/jaseg/lolcat, this appears to be the case.

Therefore, how do we tell snapcraft that for the make plugin, it should run make lolcat instead of the standard make?

By reading the documentation of the Snapcraft make plugin, we find that

- make-parameters:
 (list of strings)
 Pass the given parameters to the make command.

Therefore, the parts: should look like

parts:
  lolcat-c:
    source: https://github.com/jaseg/lolcat.git
    plugin: make
    make-parameters: [lolcat]

We enclose lolcat in brackets ([lolcat]) because make-parameters accepts a list of strings. If it accepted just a string, we would not need those brackets.

Let’s snap prime again!

$ snapcraft prime
Preparing to pull lolcat-c 
[...]
Building lolcat-c 
make lolcat -j4
"Using musl at musl"
[...]
chmod +x tools/musl-gcc
[...]
gcc -c -std=c11 -Wall -g -Imusl/include -o lolcat.o lolcat.c
[...]
Staging lolcat-c 
Priming lolcat-c 
$ ls prime/bin
ls: cannot access 'prime/bin': No such file or directory
Exit 2
$ ls prime/
censor*  command-lolcat-c.wrapper*  lolcat*  meta/  snap/
$ _

Success! The lolcat executable has been generated, and in fact it is located in prime/ (not in prime/bin/ as in all three previous cases). Also, it took 1 minute and 40 seconds to produce the executable on my computer!

Hmm, there is also a censor executable. I wonder, what does it do?

$ ls -l | ./censor 
█▄█▄█ ██
█▄▄▄▄█▄▄█▄ █ ▄▄▄▄ ▄▄▄▄ █████ ███  ██ ██:██ ▄▄▄▄▄▄
█▄▄▄▄█▄▄█▄ █ ▄▄▄▄ ▄▄▄▄   ███ ███  ██ ██:██ ▄▄▄▄▄▄███▄█▄▄██▄.▄▄▄▄▄▄▄
█▄▄▄▄█▄▄█▄ █ ▄▄▄▄ ▄▄▄▄ █████ ███  ██ ██:██ █▄█▄▄█
█▄▄▄▄▄▄▄█▄ █ ▄▄▄▄ ▄▄▄▄  ████ ███  ██ ██:██ ▄▄█▄
█▄▄▄▄▄▄▄█▄ █ ▄▄▄▄ ▄▄▄▄  ████ ███  ██ ██:██ ▄▄▄▄
$ _

This censor executable is a filter, just like lolcat, and what it does is that it replaces any input letters with block characters. Like censoring the text. Let’s expose both lolcat and censor in this snap! Here is the apps: section,

apps:
  lolcat-c:
    command: lolcat
  censor:
    command: censor

For lolcat, since both the name: and the apps: command are named lolcat-c, the final executable will be lolcat-c. For censor, the executable will be named lolcat-c.censor (composed from the name: and the name we put in apps:).

Which confinement, strict or classic?

When we are about to publish a snap to the Ubuntu Store, we need to decide which confinement to apply, strict or classic.

The strict confinement will by default fully restrict the snap and it is up to us to specify what is allowed. For example, there will be no network connectivity, and we would need to explicitly specify it if we want to provide network connectivity.

The strict confinement can be tricky if the snap needs to read files in general. If the snap were to read file only from the user’s home directory, then that’s easy because we can specify plugs: [home] and be done with it.

In our case, lolcat can be used in two ways,

Note the different colors; the lolcat implementations start every time with a random initial rainbow color

In the first command, lolcat-go requires read access for the file /etc/lsb-release. This would be tricky to specify with the strict confinement, and we would need to use the classic confinement instead.

In the second command, lolcat-go is merely used as a filter. The executable is self-contained and does not read any files whatsoever. Because the cat command does the reading for lolcat.

Therefore, we can select here the strict confinement as long as we promise to use lolcat as a filter only. That is, lolcat would run fully confined! And it will fail if we ask it to read a file like in the first command.

Final configuration files

Here are the four final configuration files! I highlight below what really changes between them (bold+italics).

$ cat python/snap/snapcraft.yaml 
name: lolcat-python
version: '0.44'
summary: lolcat utility written in Python
description: |
  lolcat-python is a utility similar to the Unix "cat" command. 
  lolcat-python adds rainbow colors to the text output.
  The source code is available at https://github.com/tehmaze/lolcat

grade: stable
confinement: strict

apps:
  lolcat-python:
    command: lolcat

parts:
  lolcat-python:
    source: https://github.com/tehmaze/lolcat.git
    source-tag: '0.44'
    plugin: python

$ cat go/snap/snapcraft.yaml 
name: lolcat-go
version: '20170226'
summary: lolcat utility written in golang
description: |
  lolcat-go is a utility similar to the Unix "cat" command.
  lolcat-go adds rainbow colors to the text output.
  The source code is available at https://github.com/cezarsa/glolcat

grade: devel
confinement: strict

apps:
  lolcat-go:
    command: glolcat.git

parts:
  lolcat-go:
    source: https://github.com/cezarsa/glolcat.git
    plugin: go

$ cat rust/snap/snapcraft.yaml 
name: lolcat-rust
version: '20170226'
summary: lolcat utility written in Rust
description: |
  lolcat-rust is a utility similar to the Unix "cat" command.
  lolcat-rust adds rainbow colors to the text output.
  The source code is available at https://github.com/ur0/lolcat

grade: stable
confinement: strict

apps:
  lolcat-rust:
    command: lolcat

parts:
  lolcat-rust:
    source: https://github.com/ur0/lolcat.git
    plugin: rust

$ cat c/snap/snapcraft.yaml 
name: lolcat-c
version: '20170226'
summary: lolcat utility written in C
description: |
  lolcat-c is a utility similar to the Unix "cat" command.
  lolcat-c adds rainbow colors to the text output.
  The source code is available at https://github.com/jaseg/lolcat

grade: stable
confinement: strict

apps:
  lolcat-c:
    command: lolcat
  censor:
    command: censor

parts:
  lolcat-c:
    source: https://github.com/jaseg/lolcat.git
    plugin: make
    make-parameters: [lolcat]
$ _

Quality testing

We have four lolcat snaps available to upload to the Ubuntu Store. Let’s test them first to see which ones are actually good enough to make it today to the Ubuntu Store!

A potential problem with filters like lolcat, is that they may not know how to deal with some UTF-8-encoded strings. In the UTF-8 encoding, text in English like what you read now, is encoded just like ASCII. Problems may occur when you have text in other scripts, where each Unicode character would occupy more than one byte. For example, each character in “Γεια σου κόσμε” is encoded as two bytes in the UTF-8 encoding. Furthermore, there are characters that are stored in Plane 1 of Unicode (Emojis!) which require the maximum of four bytes to encode in UTF-8. Here is an example of emoji characters, “🐧🐨🐩🐷🐸🐹🐺🐼🏴👏” (if they appear as empty boxes, you do not have the proper fonts installed.

What we are seeing here, is that the current implementations of lolcat for Python and Go do not work with non-ASCII encodings. At the moment, they can only colorize text in English.

On the other hand, the Rust and C implementations work fine on all Unicode characters!

There is another issue to watch for. Specifically,

The Unix ls command takes care to understand when you are redirecting the output, and if you do so, by default it does not colorize the output. This is a very useful default for ls because in most cases you want clean text files with no escape sequences for the color. To further our testing, let’s see how well the remaining two lolcat implementations deal with content that happen to already have escape sequences (for example, for color),

We generated a text file (cat-color.txt) with color escape sequences. The Rust implementation choked on them, while the C implementation worked fine!

We have a winner! The C implementation of lolcat will be published in the Ubuntu Store, in the stable channel (available to all). The Rust implementation will be published in the candidate channel (pending a fix when dealing with text that already has escape sequences), while the Python and Go implementations will be published in the edge channels. (Note: if the authors of these lolcat implementations are reading this, my aim here is to demonstrate the different channels. I plan to write a follow-up post on how to fix these issues in the source code and how to re-release all snaps to the stable channel.)

Let’s publish to the Ubuntu Store

The documentation page Publish your snap from snapcraft.io explains the details on publishing a snap on the Ubuntu Store.

In summary, when releasing a snap to the Ubuntu Store, you specify in which channel(s) you want it to be in:

  1. stable channel, for publicly available snaps. These can be searched when running snap find, and will be installed when you run snap install.
  2. candidate channel, for snaps that are very soon to be released to the stable channel. For this and the following channels, you need to specify explicitly the channel when you try to install. Otherwise, you will get the error that the snap is not found.
  3. beta channel, for beta versions of snaps.
  4. edge channel, for testing versions of snaps. This is the furthest away from stable.

If you release a snap in the stable channel, it implies that it will be shown in those below. Or, if you release a snap in the beta channel, it will appear as well in the channel below (edge).

Furthermore, when we were filling in the metadata of the snapcraft.yaml configuration files, there was a field called grade, with options for either stable or devel. If the grade has been specified as devel, then the snap can only be installed in either the beta or edge channels.

So, this is what we are going to do:

  1. The C implementation, lolcat-c, will have grade: stable, and get published in the stable channel of the Ubuntu Store.
  2. The Rust implementation, lolcat-rust, will have grade: devel, and get published in the beta channel of the Ubuntu Store.
  3. The Go and Python implementations, lolcat-go and lolcat-python, will have the grade: devel and get published in the edge channel of the Ubuntu Store.

We have already ran snapcraft login, and logged in with our Ubuntu Single SignOn (SSO) account.

We successfully registered the names on the Ubuntu Store, and then we pushed the snap files. We note down the revision, which is Revision 1 for each one of them.

The snapcraft release command looks like

  snapcraft [options] release <snap-name> <revision> <channel>

The snapcraft release command instructs the Ubuntu Store to release the recently pushed .snap file into the specified channels.

How to install them

Here we install lolcat-c.

First we perform a search for lolcat. The snap lolcat-c is found. There is a feature in the snap find command, that if we were to search for lolcat-c, it would show results for both lolcat and c (shows irrelevant results).

We then run the snap info command to show the Ubuntu Store info for the snap lolcat-c.

Then, we install lolcat-c, by running snap install lolcat-c.

Finally, we run snap info again, and we can see more information since the snap has just been installed. In particular, we can see that there is the additional command lolcat-c.censor. Oh, and then snap is just 24kB in size.

Let’s install the Rust implementation!

The Rust implementation of lolcat can be found in the beta and edge channels. In order to install it, we need to specify the correct channel (here, either beta or edge). And that’s it!

The Go implementation of lolcat is in the edge channel. We install and test it successfully.

Finally, the Python implementation of lolcat, installed from the edge channel.

Hope you enjoyed the post!

Ubuntu Font Beta and Greek

Update: All open bugs for this font at https://bugs.launchpad.net/ubuntufontbetatesting/+bugs File your bug. Currently there bugs relating to Greek, 1. Letter γ ((U03B3) has an untypical style 2.  In letters with YPOGEGRAMMENI, YPOGEGRAMMENI is expected to be under not on the right and 3. Many Greek small letters have untypical style

Here we see some samples of Greek with Ubuntu Font Beta.

Ubuntu Font supports both Greek and Greek Polytonic.

In the following we compare between DejaVu Sans (currently the default font in Ubuntu) and the proposed Ubuntu Font Beta.

Screenshot Waterfall DejaVuSans

This is DejaVu Sans, showing the Greek Unicode Block. This means, modern Greek and Coptic.

Screenshot Waterfall UbuntuBeta Greek

This is Ubuntu Font Beta, showing the Greek Unicode Block. Coptic is not covered as it was not part of the requirements for this version of the font (actually Coptic currently uses a separate new Unicode Block so the Coptic here are too low of a priority).

Screenshot-Waterfall DejaVu Polytonic

This is DejaVu Sans showing the Greek Polytonic Unicode Block coverage. We show the second part of the Unicode Block which has the most exotic characters with up to three accents.

Screenshot Waterfall UbuntuFont Beta Polytonic

Same thing with Ubuntu Font Beta.

Note that those characters that appear as empty boxes are characters that either were not designed by the font designers, or are reserved characters that have not been defined yet.

Antigoni text in DejaVu Sans and Ubuntu Font Beta (PDF, 12pt)

Antigoni text in DejaVu Sans and Ubuntu Font Beta (PDF, 10pt)

If there are things to be fixed, this is the time to do them. Post a comment and we can take if further.

Traditionally, the letters γ and ν tend to have a unique form. In this case, in Ubuntu Font Beta, γ looks different to what a Greek user is accustomed to. I attach an SVG file of γ; if you have suggestions for enhancement, please use Inkscape, this gamma_UbuntuBeta-Regular file and make your suggestion!

(see top of post for link to bug reports)

Avestan keyboard layout

According to Wikipedia,

Avestan (pronounced /əˈvɛstən/ [1]) is an Iranian language known only from its use as the language of Zoroastrian scripture, i.e. the Avesta, from which it derives its name. The language must also at some time have been a spoken language, but how long ago that was is unknown. Its status as a sacred language ensured its continuing use for new compositions long after the language had ceased to be a living language.

Only recently was the Avestan script added to the Unicode standard (Unicode 5.2). For more, see page 17 at the Archaic scripts section of Unicode 5.2 (PDF) and the Unicode block details for U+10B00. See also the proposal to add Avestan to Unicode as an archaic script.

A user from UbuntuForums.org asked for help to create a keyboard layout for the Avestan script.

Keyboard Layout - Avestan

After providing the necessary details, the keyboard layout was created, Avestan keyboard layout for Linux.

So, how can you use the new keyboard layout?

1. Add avestan.txt at the end of /usr/share/X11/xkb/symbols/ir

sudo gedit /usr/share/X11/xkb/symbols/ir

in order to open (as administrator) the ‘ir’ layout, and paste the contents of avestan.txt at the end of the ‘ir’ file. Click Save and exit.

2. Register the new ‘avestan’ layout in evdev.xml and base.xml files.

Both files have a section that looks like the following. Do a simple search for ‘ku_ara’ or some other string in order to find the segment.

        <variant>
          <configItem>
            <name>ku_ara</name>
            <description>Kurdish, Arabic-Latin</description>
            <languageList><iso639Id>kur</iso639Id></languageList>
          </configItem>
        </variant>
-----------HERE------------
      </variantList>
    </layout>
    <layout>
      <configItem>
        <name>iq</name>
        <shortDescription>Irq</shortDescription>
        <description>Iraq</description>
        <languageList><iso639Id>ara</iso639Id>
                      <iso639Id>kur</iso639Id></languageList>
      </configItem>

Open base.xml with

sudo gedit /usr/share/X11/xkb/rules/base.xml

Then open evdev.xml with

sudo gedit /usr/share/X11/xkb/rules/evdev.xml

Replace the ‘———–HERE————‘ with the following lines:

 <variant>
 <configItem>
 <name>avestan</name>
 <description>Avestan</description>
 <languageList><iso639Id>ae</iso639Id></languageList>
 </configItem>
 </variant>

What we do here is we insert a variant description for the ‘avestan’ keyboard layout.

Click Save and exit the text editor.

3. Install a suitable font. Follow the steps from http://www.bomahy.nl/hylke/blog/adding-fonts-in-gnome/
which says to install the font in your home directory, in a ‘.fonts’ subdirectory. Normally, Ubuntu will pick up the font as soon as you copy it in there. Any newly started application should be able to use the new font.

4. Finally, add the new Avestan keyboard layout. Go to System → Preferences → Keyboard → Layouts, click on the [Add…] button and select from the list ‘Iran’ and layout ‘Avestan’. Click OK. Notice the new keyboard layout indicator on the panel that allows you to switch between English and Avestan.

Increasingly more scripts and symbols are added to the Unicode standard. These scripts are not useful unless there is a comfortable way to type in them. Find a script you like and help create a keyboard layout.

Éńĥãǹčīṅǧ·ẗḧë·ẃṛīťıñĝ·ṩụṗṗọṙẗ·ıń·ǦŤḰ+

These are the presentation slides of my talk on improving the writing support in GTK+. It relates to several posts I have in my other (now not used anymore) blog at http://blogs.gnome.org/simos/2008/07/23/guadec-2008-presentation-slides/.

Enhancing the writing support in GTK+

Note: The title may not appear properly because I use a fancy effect that does not support the full range of Unicode characters. It’s a drawback of being trendy. The title says “Éńĥãǹčīṅǧ·ẗḧë·ẃṛīťıñĝ·ṩụṗṗọṙẗ·ıń·ǦŤḰ+”.

Keyboard layout editor UI concept

(click for bigger image)

At the top we select the keyboard layout file, the variant, and set the corresponding verbose name.

The keyboard layout editor shows a standard keyboard, where each keyboard key can show up to four levels. When you select a key, the bottor-left window shows the characters that have been set (here we use four levels). In this bottom-left window we can drag and drop characters (from Unicode blocks) and dead keys that are found from the right of the image. Dead keys are shown in red boxes.

The user is also able to include existing keyboard layout files in the current layout.

At this stage I am thinking how to easily draw the keyboard in a PyGTK application. It would be important not to draw it manually. It would be cool to have a GTK+ keyboard key widget, that you can specify the size, and the text that appears on it, then build a keyboard in Glade. Another option would be to have the basic keyboard as an SVG file (already exists), then draw over it with Cairo. I am inclined for the second option.

Should UI strings in source code have non-ASCII characters?

There is a discussion going on at desktop-devel about whether the UI strings in the source code should also have non-ASCII characters. For example, should typical strings with double-quotes have those fancy Unicode double quotes?

printf(_("Could not find file “%s”n"));

instead of

printf(_("Could not find file "%s"n"));

The general view from the replies is to go ahead and add those nice Unicode characters.

Actually, there are UI messages already with non-ASCII characters (the ellipsis character, …) in GNOME 2.22:

  1. glade3
  2. epiphany

In GNOME 2.24, there are even more (with ellipsis):

  1. gucharmap
  2. epiphany
  3. gnome-terminal
  4. gedit
  5. glade3

Regarding the fancy Unicode double quotes, there are UI strings in GNOME 2.22 (same list for 2.24) in the following packages:

  1. evince
  2. cheese
  3. epiphany
  4. eog
  5. gnome-doc-utils

What are the arguments against having non-ASCII characters in UI strings?

  1. There might be systems that still use 8-bit legacy encodings. In this case, the UTF-8 encoded may not be displayed properly. However, when I tried to demonstrate this on my system (Ubuntu 8.04), I failed miserably. I downloaded a small GTK2 text editor (called tea), I changed a source UI string to include “” and ellipsis, compiled and installed. I then opened a shell, set LANG to POSIX (or C), and ran the text editor. The UI message was proper Unicode and I could even type non-ASCII in the text editor. I resorted to changing a system locale (I picked en_IN) to ISO-8859-1, then logged out. In the login screen it did not show the 8-bit encoding. If someone has a proper legacy 8-bit encoding system with GNOME (OpenBSD, FreeBSD, etc), could you please try it out?
  2. As Alan Cox mentioned in the thread, the canonical way to deal with UI strings in the source code should be to keep as ASCII, and put any fancy Unicode characters in the translation files (even for en_US, get an en_US translation file).

Is GNOME (or components) used in a legacy 7-bit/8-bit environment?

If there is any reason to keep UI strings in the source code as plain ASCII, speak now, or the Unicode flood gates are about to open.

Update 16 May 2008:There is a document at the ISO/IEC 9899 website (C programming language), that mentions the issue of character sets in C. It is http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf.

On page 26, section 5.2.1, it says

The C89 Committee ultimately came to remarkable unanimity on the subject of character set requirements. There was strong sentiment that C should not be tied to ASCII, despite its heritage and despite the precedent of Ada being defined in terms of ASCII. Rather, an implementation is required to provide a unique character code for each of the printable graphics used by C, and for each of the control codes representable by an escape sequence. (No particular graphic representation for any character is prescribed; thus the common Japanese practice of using the glyph “¥” for the C character “” is perfectly legitimate.) Translation and execution environments may have different character sets, but each must meet this requirement in its own way. The goal is to ensure that a conforming implementation can translate a C translator written in C.

For this reason, and for economy of description, source code is described as if it undergoes the same translation as text that is input by the standard library I/O routines: each line is terminated by some newline character regardless of its external representation.

With the concept of multibyte characters, “native” characters could be used in string literals and character constants, but this use was very dependent on the implementation and did not usually work in heterogenous environments. Also, this did not encompass identifiers.

It then goes on with an addition to C99:

A new feature of C99: C99 adds the concept of universal character name (UCN) (see §6.4.3) in order to allow the use of any character in a C source, not just English characters. The primary goal of the Committee was to enable the use of any “native” character in identifiers, string literals and character constants, while retaining the portability objective of C.

Both the C and C++ committees studied this situation, and the adopted solution was to introduce a new notation for UCNs. Its general forms are unnnn and Unnnnnnnn, to designate a given character according to its short name as described by ISO/IEC 10646. Thus, unnnn can be used to designate a Unicode character. This way, programs that must be fully portable may use virtually any character from any script used in the world and still be portable, provided of course that if it prints the character, the execution character set has representation for it.

Of course the notation unnnn, like trigraphs, is not very easy to use in everyday programming; so there is a mapping that links UCN and multibyte characters to enable source programs to stay readable by users while maintaining portability. Given the current state of multibyte encodings,
10 this mapping is specified to be implementation-defined; but an implementation can provide the users with utility programs that do the conversion from UCNs to “native” multibytes or vice versa, thus providing a way to exchange source files between implementations using the UCN notation.

Update 7 Aug 2008: According to PEP 8, Style Guide for Python Code, under Encodings, says

    For Python 3.0 and beyond, the following policy is prescribed for
    the standard library (see PEP 3131): All identifiers in the Python
    standard library MUST use ASCII-only identifiers, and SHOULD use
    English words wherever feasible (in many cases, abbreviations and
    technical terms are used which aren't English). In addition,
    string literals and comments must also be in ASCII. The only
    exceptions are (a) test cases testing the non-ASCII features, and
    (b) names of authors. Authors whose names are not based on the
    latin alphabet MUST provide a latin transliteration of their
    names.

    Open source projects with a global audience are encouraged to
    adopt a similar policy.

(Emphasis mine)

How to easily modify a program in Ubuntu (updated)?

Some time ago we talked about how to modify easily a program in Ubuntu. We gave as an example the modification of gucharmap; we got the deb source package, made the change, compiled, created new .deb files and installed them.

We go the same (well, similar) route here, by modifying the gtk+ library (!!!). The purpose of the modification is to allow us to type, by default, all sort of interesting Unicode characters, including ⓣⓗⓘⓢ , ᾅᾷ, ṩ, and many more.

The result of this exercise is to create replacement .deb packages for the gtk+ library that we are going to install in place of the system libraries. Because these new libraries will not be original Ubuntu packages, the update manager will be pestering us to rollback to the official gtk+ packages. This is actually good in case you want to switch back; you will have the enhanced functionality for as long as you postpone that update.

There is a chance we might screw up our system, so please make backups, or have a few drinks first and come back. I take no responsibility if something bad happens on your system. If you are having any second thoughts, do not follow the next steps; use the safer alternative procedure. You may try however this guide just for the kicks; up to the dpkg command below, no changes are being made to your system.

We use Ubuntu 7.10 here. This should work in other versions, though your mileage may vary.

The compilation procedure takes time (about 30 minutes) and space. Make sure you use a partition with >2GB of free space. We are not going to use up 2GB (a bit less than 1GB), but it’s nice not to fill up partitions.

We are going to use the generic instructions on how to recompile a debian package by ducea.

First of all, install the development packages,

sudo apt-get install devscripts build-essential

Next, we use the apt-get source command to get the source code of the GTK+ 2 library,

cd /home/ubuntu/bigpartition_over2GB/
apt-get source libgtk2.0-0

We then pull in any dependencies that GTK+ may require. They are normally about a dozen packages, but we do not have to worry for the details.

apt-get build-dep libgtk2.0-0

At this stage we need to touch up the source code of GTK+ before we go into the compilation phase. Visit the bug report #321896 – Synch gdkkeysyms.h/gtkimcontextsimple.c with X.org 6.9/7.0 and download the patch (look under the Attachment section). You should get a file named gtk-compose-update.patch. If you have a look at the patch, you will notice that it expects to find the source of gtk+ in a directory called gtk+. Making a link solves the problem,

ln -s libgtk2.0-0 gtk+

We then attempt to apply the patch (perform a dry run), just in case.

patch -p0 --dry-run < /tmp/gtk-compose-update.patch

If this does not show an error message, you can the command again without the –dry-run.

patch -p0 < /tmp/gtk-compose-update.patch

Finally, we are ready to build our fresh GTK+ library.

cd libgtk2.0-0
debuild -us -uc

This will take time to complete, so go and do some healthy cooking.

At the end of the compilation, if all went OK, you should have about a dozen .deb files created. These are one directory higher (do a “cd ..“). To install, use dpkg,

dpkg -i *.deb

If you have any other deb files in this directory, it’s good to move them away before running the command. If all went ok, the .deb files should install without a hitch.

The final step is to restart your system. To test the new support, see the last section at this post. Use Firefox and OpenOffice.org to type those Unicode characters.

If you managed to wade through all these steps, I would appreciate it if you could post a comment.

Good luck!

Testing the updated IM support in GTK+

In Improving input method support in GTK+-based apps, we talked about some work to update the list of compose sequences that GTK+ knows to the latest version that comes from Xorg. From 691 compose sequences, we now support over 5000.

The patch has landed in GTK+ (trunk), and here are instructions for testing.

  1. If you have not used jhbuild before, read the jhbuild instructions and install it.
  2. Add the following to your ~/.jhbuildrc file
    branches['gtk+'] = None    # Makes sure you build from the trunk of GTK+
  3. Install gtk+ using the command (see the comment of James on this post on how to avoid Step 5 below)
    jhbuild build gtk+
  4. About 40 minutes later, and about 700MB of space (~600MB for source, ~100MB for installation of files) consumed, you should get a working copy of GTK+ 2.12.
  5. You can use this compiled version of GTK+ by running
    jhbuild shell

    This should give you a new shell, and whatever you run from here will use our fresh GTK+. Try running “gedit”. You will notice that the theme is different; it uses the default theme due to the special GTK+. This shell has set special environment variables so that program that run will use the fresh GTK+. The rest of the libraries come from our distribution.

  6. If you try to type compose sequences, you will notice no improvement. This is because at the moment jhbuild builds the branch 2.12 of GTK+ and not trunk. We need to download GTK+ from trunk and rebuild.
    cd ~/checkout/gnome2/
    mv gtk+ gtk+-branch-2.12
    svn co svn://svn.gnome.org/svn/gtk+/trunk gtk+
    jhbuild build --no-network gtk+
  7. Perform Step 4 and get gedit running.

How to test?

  • Setup a keyboard layout that supports a good variety of dead keys. My preference is GBr (United Kingdom). Here, AltGr+[];’#/ and AltGr+{}:@~? produce different dead keys. You press one of these combinations and then you press a letter. If such a combination exists, then it gets printed. For example, the old GTK+ produces öõóôòx åōőxxx. The new GTK+ produces öõóôòọ åōőǒŏȯ (12 dead keys).
  • Setup Greek, Polytonic (Ancient Greek). The dead keys are [];’ {}:@ AltGr+[] (10 dead keys). Produce characters such as ᾅᾂᾷῗὕὒᾥᾢῷ.
  • Try compose sequences as described from the upstream file at XOrg. For example,
    ComposeKey+(   1 0 )  produces ⑩. Try the same for 0-20, a-zA-Z.
  • Other miscellaneous, Ṩǟấẫǡ (using GBr layout)

The next step would be to parse the list of compose sequences and produce a documentation file.

Keyboard layout for combining diacritics

Typically, if you want to type characters with accents, such as á, ë, ś, you need to configure a suitable keyboard layout that includes compose sequences for those characters. The produced characters are what we call as precomposed characters; which were included in the early stages of Unicode. Nowdays, the idea is that you do not need to define á as a distinct character because it can be represented as a and ´, where the latter is a combining diacritic.

When put together a character and a combining diacritic, they fuse together, producing a seemingly single character. á is a precomposed (really one character), while á is letter a and the combining diacritic called acute (two characters). You can type the latter á by

  1. Type a
  2. Press Ctrl+Shift+u, then type 301, then press space bar.

Western languages do not really require combining marks, so the existing keyboard layouts do not use them. Other scripts, such as the Congolese keyboard layout (based on Latin) make good use of them.

Gedit, pango and combining diacritics

This is gedit showing off pango and DejaVu fonts (default font in major distributions).

Line 3 is a bit of an extreme, showing a sandwich of combining diacritics.

Line 4 shows the base character a with the combining diacritics from the Unicode range 0x300 to 0x315.

Both lines 3 and 4 were produced easily with a modified keyboard layout, which is show below.

Line 5 is just me being silly. You can have combining diacritics that enclose your base character.

$ cat  /usr/share/X11/xkb/symbols/combining
partial alphanumeric_keys alternate_group
xkb_symbols "combining" {

    name[Group1] = "Combining diacritics";

    key.type[Group1] = "FOUR_LEVEL";

    key <AD11> { [ NoSymbol, NoSymbol, 0x1000300, 0x1000301 ] }; // à   á
    key <AD12> { [ NoSymbol, NoSymbol, 0x1000302, 0x1000303 ] }; // â   ã

    key <AC10> { [ NoSymbol, NoSymbol, 0x1000304, 0x1000305 ] }; // ā   a̅
    key <AC11> { [ NoSymbol, NoSymbol, 0x1000306, 0x1000307 ] }; // ă   ȧ
    key <BKSL> { [ NoSymbol, NoSymbol, 0x1000308, 0x1000309 ] }; // ä    ả

    key <AB08> { [ NoSymbol, NoSymbol, 0x1000310, 0x1000311 ] }; // a̐     ȃ
    key <AB09> { [ NoSymbol, NoSymbol, 0x1000312, 0x1000313 ] }; // a̒     a̓
    key <AB10> { [ NoSymbol, NoSymbol, 0x1000314, 0x1000315 ] }; // a̔     a̕
};
$ diff -u /usr/share/X11/xkb/symbols/us.ORIGINAL /usr/share/X11/xkb/symbols/us
--- /usr/share/X11/xkb/symbols/us.ORIGINAL      2008-02-20 11:11:13.000000000 +0000
+++ /usr/share/X11/xkb/symbols/us       2008-02-20 13:02:07.000000000 +0000
@@ -492,3 +492,12 @@
     name[Group1]= "U.S. English - Macintosh";
 };

+partial alphanumeric_keys modifier_keys
+xkb_symbols "combining_us" {
+
+    include "us"
+    include "combining"
+
+    key.type[Group1] = "FOUR_LEVEL";
+    name[Group1] = "U.S. English - Combining";
+};
$ diff -u /usr/share/X11/xkb/rules/xorg.xml.ORIGINAL /usr/share/X11/xkb/rules/xorg.xml
--- /usr/share/X11/xkb/rules/xorg.xml.ORIGINAL  2008-02-20 11:27:00.000000000 +0000
+++ /usr/share/X11/xkb/rules/xorg.xml   2008-02-20 11:27:48.000000000 +0000
@@ -3643,6 +3643,12 @@
             <description xml:lang="zh_TW">Macintosh</description>
           </configItem>
         </variant>
+        <variant>
+          <configItem>
+            <name>combining_us</name>
+            <description>Combining</description>
+          </configItem>
+        </variant>
       </variantList>
     </layout>
     <layout>
$ _

Then, you select this keyboard layout (U.S. English) and variant (Combining) in the Keyboard Indicator applet.

Unlike dead keys, with combining diacritics you first type the base character (such as a) and then any combining diacritics.
Our sample layout variant puts the diacritics in the physical keys for [];’#,./. For example,

  • a + AltGr+[ : à
  • a + AltGr+Shift+[ : á
  • a + AltGr+[ + AltGr+’ : ằ

If your language has needs that can be solved with combining diacritics, this is how they are solved.

It is quite important to create keyboard layouts for all languages, and actually make good use of them.

Typing squiggles and dots in GNOME and GTK+ applications

Garrett asks how to type squiggles and dots in GNOME; that is, how to type characters such as á à ä ã â ą ȩ ę ő ǰ ǩ ǒ ġ ṅ ȯ ṁ ė.

There are several ways, and one can choose depending on how frequently they need to type them or how much time they need to invest learning.

① One option is to start the Character Map (Applications/Accessories/Character Map), pick the character, copy and paste it. This is good for rare characters and weird situations such as

┏━━━━━━━━━━━━━━━━━━━━━━━┓

⟁⟁⟁⟁♥♀★★▶◀☆♀░░░▒▒▒▓▓▓▙▚▛▙▙▙▞

The Unicode standard, apart from defining characters for languages, it also defines symbols, dingbats and all sort of things. If your distribution is based on the DejaVu fonts (such as Ubuntu), then you are probably covered for many of these symbols. If you do not have a suitable font, or you use Windows, you will be wondering what the hell I am talking about.

② Another option is to use the Character Palette applet which shows an applet on the panel with a configurable small repertoire of characters such as áàéíñó½©ث€. You select one of the characters with the mouse, and wherever you middle-click, this character is typed. This is an improvement over ①, and good when you want to type often rare characters. It is not convenient to type characters found normally on a keyboard layout.

③ To type characters normally found in a specific language(s), it is good to setup a suitable keyboard layout. For this, it is good to add the Keyboard Indicator applet; right click on the panel, click Add to panel… and choose the Keyboard Indicator from the Utilities section. The US English keyboard layout (Default variant) does not provide any interesting characters apart from those shown printed on the keys of a US Keyboard.
Keyboard layout US Intl with dead keys
The US English International (with dead keys) variant might be a better option,

Keyboard layout GB

Or the United Kingdom layout.

You can get a similar image for your layout when you right-click on the Keyboard Indicator applet, then click Show Current Layout.

Each key in the images contain up to four letters. Starting from bottom-left and going clock-wise, these are the keys produced when

ⓐ you press the key

ⓑ you press the key with Shift (or Caps Lock)

ⓒ you press the key with AltGr and Shift (or Caps Lock)

ⓓ you press the key with AltGr

For example, with the UK keyboard layout, the key G produces g, G, Ŋ, ŋ.

If AltGr + Shift + letter does not work for you, see the FDO Bug #2871 Different results for shift-altgr and altgr-shift.

Using the appropriate keyboard layout is the way to go when writing text that require squiggles. You can either choose a layout with dead keys (meaning that some keys lose their normal functionality), or you can pick a layout that still allows you to have dead keys but are available when you press AltGr + key. For example, in the UK Keyboard layout – Default variant, AltGr + ; + a produces á, or AltGr+Shift+]+e produces ē.

OLPC showing the keyboard

Photo by titanas.

The OLPC uses those four level for the keyboard layout. You can see the all the variations printed on the keyboard. Click on the image, choose Large size for the details.

④ Another option to produce more characters on the keyboard is to enable the compose key, and use compose sequences. A compose sequence looks similar to what we described above (i.e. AltGr+Shift+]+e to ē) but the idea is that we use it for characters we want to be available across different keyboard layouts that you may have enabled.
Configuring the compose key
The compose key is very powerful functionality, thus it is not enabled by default, and lays hidden in the Layout Options tab. I prefer to set it to Menu, but every person has their own preference.

For example,

  • Compose key + – + a produces ã,
  • Compose key + < + c produces č
  • Compose key + 1 + s produces ¹ (Superscript on 1. Try to replace 1 with 2.)
  • Compose key + + + – procudes ±

Currently, GTK+ provides 640 such compose sequences involving the Compose key, and hopefully soon it will increase to over 3000.

The Compose key is known as Multi_key in the source code (Xorg, GTK+, etc).

The Compose key compose sequences offer the ability to define smart mnemonics on how to produce characters. It is much easier to type ComposeKey + 1 + s rather than remembering the codepoint value of ¹ (1 superscript). As with many things open-source, there are too many options, and with the Compose key there is the issue of which shall we pick as a sensible default, and how to make it prominent for those who might want to use it.

It appears to me that there should be more effort to promote the functionality that is provided with the standard keyboard layouts (choose a better keyboard layout, produce characters provided in the third and fourth levels, etc). In this respect, Compose key compose sequences should complement after the main discussion on keyboard layouts take place.

⑤ There is a last issue on switching keyboard layouts to cover in a separate post.

Improving input method support in GTK+-based apps

When a bug report gets long with many comments, it gets more difficult for someone to get the full picture of what is going on. I’ll attempt to summarise here what’s being said in Bug 321896, Synch gdkkeysyms.h / gtkimcontextsimple.c with X.org 6.9/7.0.

GTK+-based applications use by default the GTK+ Input Method in order to let users type in different languages. Some scripts are very complex (such as SE Asian scripts) and in this case SCIM is used, replacing the GTK+ Input Method. One can even disable GTK+ IM altogether and use the basic X Input Method (XIM) which is provided by the Xorg server, by setting GTK_IM_MODULE to xim. However, the majority of the users have GTK+ IM enabled.

Between GTK+ IM and XIM, the keyboard layouts are being managed by the xkeyboard-config project and Sergey Udaltsov. A keyboard layout is simply a mapping of keyboard keys to Unicode characters, but you can also have compose sequences for some characters using what we call dead keys. When you press a dead key nothing appears on screen but when you press a letter immediately afterwards, you can get an á. This functionality is common to add accents, and there is a big table for these compose sequences (1.3MB) and what Unicode characters they produce.

If you change your keyboard layout (System/Preferences/Keyboard/Layout) to something like U.S. English International (with dead keys), then the ‘ key on your keyboard becomes dead_acute, and the compose sequence

<dead_acute> <a>  : "á"   U00E1 # LATIN SMALL LETTER A WITH ACUTE

works when you press and then a.

There is an issue with compose sequences and input methods; XIM maintains the official upstream version of the compose sequences, and projects such as GTK+ and SCIM carry their own copies of that table.

The issue with GTK+ regarding the compose sequences is that it has a very old version compared to what is available upstream. This is what Bug 321896 is about.

The bug would be have been resolved much much earlier if it wasn’t for the insistence of the GTK+ maintainers to cut the fat and reduce the size of the table (~6000 entries) with clever optimisations.

Tor suggested a clever optimisation; a good number of compose sequences (which looks like <dead_acute> <a> : “á”) resemble the decomposed form (a la Unicode) of those characters. Thus, we can let the user type what she wants, and we can try Unicode normalisation to see if the sequence is composed to a single Unicode character. Lets demonstrate in Python,

$ python
>>> import unicodedata
>>> sequence=[65, 0x301]     # That's 'a' and acute
>>> result = unicodedata.normalize('NFC',"".join(map(unichr, sequence)))
>>> result
u'\xc1'
>>> print len(result)
1
>>> print result
Á

That long line above takes the array, applies the unichr() function on each member so that they become Unicode characters and then joins them in a single string. Finally, it normalises the (decomposed) string to a single character. The fact that the resulting string has length 1 (single character) is key to this optimisation. Over 1000 compose sequences can be removed from the compose table through this optimisation. This includes a big chunk of the Latin Unicode blocks, about a few dozens of Cyrillic characters, all of modern Greek and Greek polytonic, some Indic languages (are they actually used?) and other misc sequences.

Matthias laid out the requirements for the optimisation of the remaining compose sequences; ① it has to be static const so a single copy is shared all over the place, ② the first column (out of six) is repeated too often, thus use subtables, and ③ each row ends with a varying number of zeroes, so cut on those zeroes as well. This also required the automatic generation of the optimised table using a script.

The work has not finished yet, and requires testing of the patch. The high priority testing is that keyboard layouts do not get any regressions (that is, compose sequences with dead keys must continue to work along with any new sequences).

With an updated compose table in GTK+, one can write things like ⒼⓃⓄⓂⒺ and all variations of accents on characters, in an easier way.

I’ld like to thank Matthias and Tor for their support in this work. And Jeff for adding this blog to Planet GNOME!

StixFonts, finally available (beta)!

The STIX Fonts project (website) has been developing for over 10 years a font suitable to be used in academic publications. It boasts support from Elsevier, IEEE and other academic publishers or associations.

A few days ago, they published a beta version of the font in an effort to get public feedback. The beta period runs until the 15th December.

STIX Fonts Beta showing Greek (Regular), from STIX Fonts Beta

STIX Fonts Beta currently support modern Greek. An effort to get support for Greek Polytonic did not work out well a few years back.

STIX Fonts Beta showing Greek (Italic), from STIX Fonts Beta

The main benefit of STIX Fonts is the support for mathematical and other technical symbols. This helps when writing academic publications and other technical documents.

STIX Fonts Beta showing Greek (Bold), from STIX Fonts Beta

STIX Fonts have extensive support of mathematical symbols, symbols that exist in Unicode Plane-1.

STIX Fonts Beta showing Greek (Bold Italic), from STIX Fonts Beta

If there is any modification that we would like to have in STIX fonts, we should do now. Once they are released, they will be widely distributed. Currently, Fedora has packaged STIX Fonts and made them available already.