Avestan keyboard layout

According to Wikipedia,

Avestan (pronounced /əˈvɛstən/ [1]) is an Iranian language known only from its use as the language of Zoroastrian scripture, i.e. the Avesta, from which it derives its name. The language must also at some time have been a spoken language, but how long ago that was is unknown. Its status as a sacred language ensured its continuing use for new compositions long after the language had ceased to be a living language.

The Avestan script was added to the Unicode standard only recently (Unicode 5.2). For more, see page 17 of the Archaic scripts section of Unicode 5.2 (PDF) and the Unicode block details for U+10B00. See also the proposal to add Avestan to Unicode as an archaic script.

A user from UbuntuForums.org asked for help creating a keyboard layout for the Avestan script.

Keyboard Layout - Avestan

Once the necessary details were provided, the keyboard layout was created: Avestan keyboard layout for Linux.

So, how can you use the new keyboard layout?

1. Append the contents of avestan.txt to the end of /usr/share/X11/xkb/symbols/ir. Open the file with

sudo gedit /usr/share/X11/xkb/symbols/ir

This opens the ‘ir’ layout file as administrator. Paste the contents of avestan.txt at the end of the ‘ir’ file, then click Save and exit.
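The same step can be scripted. The following is a minimal sketch, not part of the original instructions: the function name append_layout is hypothetical, and the ‘.ORIGINAL’ backup suffix is only an assumption borrowed from the diffs in the combining diacritics post. It keeps a copy of the untouched file before appending.

```python
import shutil
from pathlib import Path


def append_layout(symbols_file, layout_file):
    """Append layout_file to symbols_file, keeping a backup of the original.

    Hypothetical helper: the backup is written next to the symbols file
    with an '.ORIGINAL' suffix and is never overwritten once it exists.
    """
    symbols_file = Path(symbols_file)
    backup = symbols_file.with_name(symbols_file.name + ".ORIGINAL")
    if not backup.exists():
        shutil.copy2(symbols_file, backup)  # preserve the untouched file
    addition = Path(layout_file).read_text(encoding="utf-8")
    with symbols_file.open("a", encoding="utf-8") as out:
        out.write("\n" + addition)  # blank line separates old and new content
    return backup


# Example call (needs write access to the xkb directory, e.g. via sudo):
# append_layout("/usr/share/X11/xkb/symbols/ir", "avestan.txt")
```

If something goes wrong, copying the backup over the modified file restores the previous state.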

2. Register the new ‘avestan’ layout in the evdev.xml and base.xml files.

Both files have a section that looks like the following. Do a simple search for ‘ku_ara’ or some other string in order to find the segment.

            <description>Kurdish, Arabic-Latin</description>

Open base.xml with

sudo gedit /usr/share/X11/xkb/rules/base.xml

Then open evdev.xml with

sudo gedit /usr/share/X11/xkb/rules/evdev.xml

Replace the ‘———–HERE————‘ with the following lines:


Here we insert a variant description for the ‘avestan’ keyboard layout.

Click Save and exit the text editor.
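For orientation, here is a hedged sketch of what a variant entry looks like when built programmatically. The element names (variant, configItem, name, description) are taken from the ‘combining_us’ entry shown in the combining diacritics post below; that the Avestan entry has exactly this shape, and the description text ‘Avestan’, are assumptions. The helper make_variant is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_variant(name, description):
    """Build a <variant> element in the shape used by the xkb rules files."""
    variant = ET.Element("variant")
    config_item = ET.SubElement(variant, "configItem")
    ET.SubElement(config_item, "name").text = name
    ET.SubElement(config_item, "description").text = description
    return variant


# Assumed values: the variant is called 'avestan', shown to users as 'Avestan'.
entry = make_variant("avestan", "Avestan")
print(ET.tostring(entry, encoding="unicode"))
```

The printed element is on a single line; indentation in the rules files is cosmetic.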

3. Install a suitable font. Follow the steps from http://www.bomahy.nl/hylke/blog/adding-fonts-in-gnome/
which says to install the font in your home directory, in a ‘.fonts’ subdirectory. Normally, Ubuntu will pick up the font as soon as you copy it in there. Any newly started application should be able to use the new font.

4. Finally, add the new Avestan keyboard layout. Go to System → Preferences → Keyboard → Layouts, click on the [Add…] button and select from the list ‘Iran’ and layout ‘Avestan’. Click OK. Notice the new keyboard layout indicator on the panel that allows you to switch between English and Avestan.

More and more scripts and symbols are being added to the Unicode standard. These scripts are not useful unless there is a comfortable way to type in them. Find a script you like and help create a keyboard layout.

Alternative keyboard layouts for Greek

Νίκος Μουτσανάς (Nikos Moutsanas) has created two alternative keyboard layouts that may be useful to those who

  1. often write in various Latin (European) keyboard layouts and want a single combined layout
  2. want a Greek layout that also allows typing English using AltGr (that is, one layout for both Greek and English)

It is possible that some of the above can be implemented using options in the keyboard configuration.

However, for now, see Nikos's instructions for setting up the two alternative keyboard layouts. Keep backup copies of the files you modify, in case you want to restore the previous settings. Unfortunately, there is not yet a system that lets us easily add keyboard layouts to our distribution.

The Keyboard Layout Editor

Update Dec 2010: Get the latest version of the Keyboard Layout Editor from https://github.com/simos/keyboardlayouteditor

(This entry is a repost; the original was lost in a database mishap.)

As part of the 2008 GSoC program, I worked on a Keyboard Layout Editor for the X.Org Foundation.

The Keyboard Layout Editor (KLE) is an application that allows you to create keyboard layouts for the X.Org server, commonly found on Linux, OpenSolaris, *BSD and other desktops.

My mentor was Sergey Udaltsov, maintainer of xkeyboard-config, the Keyboard Indicator applet in GNOME, supporting libraries for keyboard layouts and much more. I had great help and Sergey was very supportive. Highly recommended mentor for your GSoC’09 project.

The Keyboard Layout Editor showing a layout

The screenshot above shows the main window of the program: a keyboard with a blank layout (keys are empty), a section Add to layout with items that can be used to populate the layout, and a section for the description of the layout (Layout details).

There are typically two workflows. In the first, you start off with a blank layout, add Unicode characters, dead keys and include files, then save.

The other workflow is to start with an appropriate existing layout as a base, then add more characters, make changes, etc.

It might be strange to talk about different workflows, but in terms of usability it’s important to provide assistance for such cases. For example, having tooltips is important when a person starts off with a new layout.

Using the Keyboard Layout Editor

Here we started with a blank layout; we click on Start Character Map, locate the characters we need, and drag and drop them onto the appropriate keys. Each key is composed of four parts, which we number from 1 to 4. The way we count is quite peculiar:

  1. bottom left, when you press the key as is (key)
  2. top left, when you press the key with Shift (Shift + key)
  3. bottom right, when you press the key with AltGr (AltGr + key)
  4. top right, when you press the key with Shift+AltGr (Shift + AltGr + key)
Analysis of a key

This is my entry to the most engineered diagram competition.
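The four positions map directly onto the order of symbols in a key definition of a symbols file. A minimal sketch follows, with hypothetical helper names (describe_key, format_key) that are not taken from the editor's code; the output format follows the key lines shown in the XKB listings later on this page.

```python
# Level order used in a symbols file entry: [ level1, level2, level3, level4 ]
LEVELS = ("key", "Shift + key", "AltGr + key", "Shift + AltGr + key")


def describe_key(symbols):
    """Pair each symbol with the key combination that produces it."""
    return list(zip(LEVELS, symbols))


def format_key(keycode, symbols):
    """Format one key definition; accepts between one and four symbols."""
    if not 1 <= len(symbols) <= len(LEVELS):
        raise ValueError("a key holds between one and four symbols")
    return "key <%s> { [ %s ] };" % (keycode, ", ".join(symbols))


print(format_key("AD04", ["r", "R", "ediaeresis", "Ediaeresis"]))
for combo, symbol in describe_key(["r", "R", "ediaeresis", "Ediaeresis"]):
    print("%-20s -> %s" % (combo, symbol))
```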

The dead keys relate to diacritic marks such as grave and acute. Since they are too small to see, we present them next to a D letter (D for Dead key). In some cases I could not find a character equivalent to the diacritic mark, so I put a ? instead, and the key reads D?. If you put the mouse pointer over the key, you can see the full details in the tooltip.

Including files

In many cases, there are existing layouts/variants that contain most of the characters you want to add. In this case, you add the file and enable it in the Include files section. You can then override any of those characters by dragging and dropping onto the layout.

At this stage in the blog post, it is important to clarify the notions of a layout and a variant. The two are quite similar and the distinction is messy to explain to the end-user. The French layout file is fr, which contains several variants (distinct groups of mappings of physical keys to Unicode characters). When you talk about a French keyboard layout, you are actually referring to the default variant of the «fr» file. Oftentimes people refer to the «fr» file as a whole as the French layout. You can also pick a non-default variant of the layout file, and call it your layout.

The way I would like to define layout and variant is this: a layout refers to the default variant of the layout file. This is consistent with the fact that distributions pick the default variant in the settings, so it is what gets the most visibility, and that users who select a new layout are presented with the default variant first. Regarding layouts in general, it is important for different languages/scripts to make the effort to keep the default layout updated and to include extra useful and relevant characters.
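To make the distinction concrete, here is a small sketch that splits a reference of the form file(variant), as used by include statements such as "zz(bare)" later on this page, into its two parts. The function name split_spec is hypothetical, and returning None to mean "the default variant" is my own convention for this example.

```python
import re

# file name, optionally followed by a variant name in parentheses
_SPEC = re.compile(r"^(?P<layout>[A-Za-z0-9_]+)(?:\((?P<variant>[A-Za-z0-9_-]+)\))?$")


def split_spec(spec):
    """Return (layout_file, variant); variant is None for the default one."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("not a layout reference: %r" % spec)
    return match.group("layout"), match.group("variant")


print(split_spec("fr"))        # the layout, i.e. the default variant of fr
print(split_spec("zz(bare)"))  # an explicitly named variant
```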

The new Greek keyboard layout

This is the updated Greek keyboard layout, the near-final version that we are planning to submit to xkeyboard-config. It adds Greek Polytonic to the existing Greek layout. It does not make changes to the previous default layout, so users will not be unpleasantly surprised. It also adds all sorts of characters that are found in the Greek Unicode block.

In this post I simplified some of the terms/description. If I went a bit too far, please correct me and I’ll update in-place.

Update 8th Sep 08: The plans for further development of the layout editor are:

  • Increase the user base and get more people trying out the editor. This requires some more cleanup of the code, more instructions on how to run it yourself, and getting people to provide feedback. An open-source project without users is not a successful project.
  • Make it easier for developers to contribute to the project. If you use Eclipse, you can install pydev, antlr3ide, mylyn, subclipse, and you can do the full development from within the cozy Eclipse environment. These need documentation.
  • The Issues page at the project has about ten items. This list needs to be reduced.
  • The natural place for users of the layout editor is the http://listserv.bat.ru/xkb/List.html mailing list. We need to promote the editor there, and get examples of users actually using it to maintain layouts.
  • An issue that plagues some users is that they need compose sequences to generate characters for which no precomposed forms exist. If users really need this (mainly Latin and Cyrillic scripts, complex scripts), the UI can be adapted to support it.
  • It is technically easy to adapt the editor so that it produces XML layouts. Considering the state of XKB-atkins, this may not be a top priority at the moment. libxml2 comes with the MIT license, so in license terms it would be OK. Not sure if it is OK to link libxml2 to the X.org server. It might actually solve the slow parsing of the configuration files and the issues of xkbcomp.
  • At the moment the default geometry is a somewhat generic keyboard. In addition, I deactivated several keys (such as the function keys), in order not to confuse users (you can activate them with a small change in the code). The keyboard can be expanded to a full 105-key style. A related project would be to figure out an efficient way to edit those geometry files, and make the keyboard customised. If people start creating layouts with the editor, they will certainly love to edit geometry files!

Layout editor keyboard

This is a screenshot of the keyboard for the layout editor. The keyboard is a widget composed of an individual widget for each key.

I did not use glade-3 for the keyboard at this time. Although it is possible to create custom widgets in Python and install them in Glade, the currently distributed packages are missing something, so it would be messy when others try to use the editor. It’s a good experience to do it all by hand anyway.

When creating a layout, you drag and drop characters on the keyboard. The editor shows a table with characters though it would be possible to drag characters from gucharmap as well.

The next step is to get an intuitive UI so that when you drop a character on a key, the key expands (a popup appears) showing the available four positions to receive the character.

Keyboard layout editor UI concept

(click for bigger image)

At the top we select the keyboard layout file, the variant, and set the corresponding verbose name.

The keyboard layout editor shows a standard keyboard, where each keyboard key can show up to four levels. When you select a key, the bottom-left window shows the characters that have been set (here we use four levels). Into this bottom-left window we can drag and drop characters (from Unicode blocks) and dead keys, which are found on the right of the image. Dead keys are shown in red boxes.

The user is also able to include existing keyboard layout files in the current layout.

At this stage I am thinking about how to easily draw the keyboard in a PyGTK application. It would be important not to draw it manually. It would be cool to have a GTK+ keyboard key widget for which you can specify the size and the text that appears on it, and then build a keyboard in Glade. Another option would be to have the basic keyboard as an SVG file (already exists), then draw over it with Cairo. I am inclined towards the second option.

Converting between XKB and XML

I completed the stage that takes keyboard layout files from XKB (X.Org) and converts them to XML documents, based on a keyboard layout Relax NG schema. Then, these XML documents can also be converted back to keyboard layout files.

Here is an imaginary example of a keyboard layout file.

// Keyboard layout for the Zzurope country (code: zz).
// Yeah.

partial alphanumeric_keys alternate_group hidden
xkb_symbols "bare" {
   key <AE01> { [        1, exclam,      onesuperior,  exclamdown      ] };
};

partial alphanumeric_keys alternate_group
xkb_symbols "basic" {
   name[Group1] = "ZZurope";

   include "zz(bare)"

   key <AD04> { [        r, R,           ediaeresis,   Ediaeresis      ] };
   key <AC07> { [        j, J,           idiaeresis,   Idiaeresis      ] };
   key <AB02> { [        x, X,           oe,           OE              ] };
   key <AB04> { [        v, V,           registered,   registered      ] };
};

partial alphanumeric_keys alternate_group
xkb_symbols "extended" {
    include "zz(basic)"
    name[Group1] = "ZZurope Extended";
    key.type = "THREE_LEVEL"; // We use three levels.
    override key <AD01> {   type[Group1] = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC",
[ U1C9, U1C8], [  any,   U1C7 ]   }; // q
    override key <AD02> {   [ U1CC, U1CB, any,U1CA ],
                            type[Group1] = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC" };
    key <BKSP> {
        type[Group1] = "CTRL+ALT",
        symbols[Group1]= [ BackSpace,   Terminate_Server ]
    };
    key <BKSR> { virtualMods = AltGr, [ 1, 2 ] };
    modifier_map Control { Control_L };
    modifier_map Mod5   { <LVL3>, <MDSW> };
    key <BKST> { [1, 2,3, 4] };
};

When converted to an XML document, it looks like

<?xml version="1.0" encoding="UTF-8"?>
<layout layoutname="zz">
      <tokenkey override="False">
      <tokenname name="ZZurope"/>
      <tokenkey override="False">
      <tokenkey override="False">
      <tokenkey override="False">
      <tokenkey override="False">
      <tokenname name="ZZurope Extended"/>
      <tokenmodifiermap state="Control">
        <keycode value="Control_L"/>
      <tokenmodifiermap state="Mod5">
        <keycodex value="LVL3"/>
        <keycodex value="MDSW"/>
      <tokenkey override="True">
          <typegroup value="SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"/>
      <tokenkey override="True">
          <typegroup value="SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"/>
      <tokenkey override="False">
          <typegroup value="CTRL+ALT"/>
      <tokenkey override="False">
          <tokenvirtualmodifiers value="AltGr"/>
      <tokenkey override="False">

When we convert the XML document back to the XKB format, it looks like

hidden xkb_symbols "bare"
{
	key <AE01> { [ 1, exclam, onesuperior, exclamdown ] };
};

xkb_symbols "basic"
{
	name = "ZZurope";
	include "zz(bare)"
	key <AD04> { [ r, R, ediaeresis, Ediaeresis ] };
	key <AC07> { [ j, J, idiaeresis, Idiaeresis ] };
	key <AB02> { [ x, X, oe, OE ] };
	key <AB04> { [ v, V, registered, registered ] };
};

xkb_symbols "extended"
{
	name = "ZZurope Extended";
	include "zz(basic)"
	key.type = "THREE_LEVEL";
	modifier_map Control { Control_L };
	modifier_map Mod5 { <LVL3>, <MDSW> };
	override key <AD01> { [ U1C9, U1C8 ], [ any, U1C7 ], type = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"  };
	override key <AD02> { [ U1CC, U1CB, any, U1CA ], type = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"  };
	key <BKSP> { [ BackSpace, Terminate_Server ], type = "CTRL+ALT"  };
	key <BKSR> { [ 1, 2 ], virtualMods = AltGr  };
	key <BKST> { [ 1, 2, 3, 4 ] };
};

Some things are missing such as partial, alphanumeric_keys and alternate_group, which I discussed with Sergey and he said they should be ok to go away.

In addition, we simplify by keeping just Group1 (we do not specify it, as it is implied).

I performed the round-trip with all layout files, and all parsed and validated OK (there is some extra work with the level3 file remaining, though).
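To illustrate the round-trip idea on a single statement, here is a rough sketch for modifier_map lines. It is not the project's converter: it uses a regular expression and the standard library's ElementTree instead of the ANTLR-generated parser, and the function names are hypothetical. The element names (tokenmodifiermap, keycode, keycodex) are the ones visible in the XML listing above.

```python
import re
import xml.etree.ElementTree as ET

# Matches e.g.:  modifier_map Mod5 { <LVL3>, <MDSW> };
_MODMAP = re.compile(r"modifier_map\s+(\w+)\s*\{(.*?)\}\s*;")


def modmap_to_xml(line):
    """Convert one modifier_map statement to a tokenmodifiermap element."""
    match = _MODMAP.search(line)
    if match is None:
        raise ValueError("not a modifier_map statement: %r" % line)
    state, body = match.groups()
    element = ET.Element("tokenmodifiermap", state=state)
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("<") and item.endswith(">"):
            # bracketed names such as <LVL3> become keycodex elements
            ET.SubElement(element, "keycodex", value=item[1:-1])
        else:
            ET.SubElement(element, "keycode", value=item)
    return element


def xml_to_modmap(element):
    """Convert a tokenmodifiermap element back to an XKB statement."""
    items = []
    for child in element:
        value = child.get("value")
        items.append("<%s>" % value if child.tag == "keycodex" else value)
    return "modifier_map %s { %s };" % (element.get("state"), ", ".join(items))


original = "    modifier_map Mod5   { <LVL3>, <MDSW> };"
print(xml_to_modmap(modmap_to_xml(original)))
```

As in the listings above, the round-trip normalises whitespace but keeps the content.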

Some remaining issues include

  • Figuring out how to use XLink to link to documents in the same folder (+providing a parameter; the name of the variant), and how to represent that in the Relax NG schema.
  • Sort the layout entries by keycode value.

ANTLR grammar for XKB, and Relax NG schema (draft)

I completed the ANTLRv3 grammar for symbols/ configuration files of XKB. The grammar can parse and create the abstract syntax tree (AST) for all keyboard layouts in xkeyboard-config.

ANTLRv3 helps you create parsers for domain specific languages (DSL), an example of which is the configuration files in XKB.

Having the ANTLRv3 grammar for a configuration file allows you to generate code in any of the supported target languages (C, C++, Java, Python, C#, etc), so that you can easily include a parser that reads those files. Essentially you avoid using custom parsers, which can be difficult to maintain, or parsers that were generated with flex/bison.

On a similar note, here is the grammar to parse Compose files (such as en_US.UTF-8/Compose.pre). I am not going to be using it in the project for now, but it was fun writing it. The Python target takes 18s to create the AST for the >5500 lines of the en_US.UTF-8 compose file, on a typical modern laptop.
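For readers who have not seen a Compose file, here is a rough sketch of what one entry contains. This is a plain regular expression, not the ANTLR grammar, and it only handles the simple form of an entry (a sequence of <keysym> events, a quoted result and an optional keysym name); modifier prefixes and escape processing are ignored. The sample line is written by hand for this example, and the function name is hypothetical.

```python
import re

# <key> <key> ... : "result" optional_keysym   # optional comment
_LINE = re.compile(r'^\s*((?:<\w+>\s*)+):\s*"((?:[^"\\]|\\.)*)"\s*(\w+)?')


def parse_compose_line(line):
    """Return (keys, result, keysym) or None if the line is not an entry."""
    match = _LINE.match(line)
    if match is None:
        return None
    keys = re.findall(r"<(\w+)>", match.group(1))
    return keys, match.group(2), match.group(3)


sample = '<Multi_key> <parenleft> <1> <0> <parenright> : "\u2469" U2469 # CIRCLED NUMBER TEN'
print(parse_compose_line(sample))
print(parse_compose_line("# a comment line"))
```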

I am also working on creating a RelaxNG schema for the XKB configuration files (those under symbols/). There is a draft available, which needs much more work. The Relax NG book by Eric van de Vlist is very useful here.

The immediate goal is to use the code generated by ANTLR to parse the XKB files and create XML files based on the Relax NG schema. I am using Python, and there are a few options: the libxml2 bindings for Python, and PyXML. The latter has more visible documentation, but I think I would be better off using the former.

Update: lxml appears to be the nice way to use libxml2 (instead of using directly libxml2).

Looking into the symbol files

In the previous post, we talked about the ANTLR grammar that parses the XKB layout files.

The grammar is available at http://code.google.com/p/keyboardlayouteditor/source/browse. I would rather push to the freedesktop repository once the project is completed. For now it’s too easy for me, just doing svn commit -m something.

Below you can see the relevant layout files for each country (and in some cases, language), and how the grammar deals with them. The first column has the filenames from the CVS XKB symbols subdirectory (to be moved imminently to GIT). Last week’s discussion with Sergey helped me figure out issues with the symbol files, simplify what information is needed, and decide what can be eliminated. The second column has NOK (not OK) if something is wrong. The third column tries to explain what was wrong.

File       Status  Problem
gb         NOK     Non-UTF8
group      NOK     virtualMods= AltGr
hu         NOK     Non-UTF8
il         NOK     key.type="FOUR_LEVEL" (typically: key.type[something]=….)
in         NOK     key.type="FOUR_LEVEL" (typically: key.type[something]=….)
jp         NOK     key <BKSP> {
                   type="",   // empty?
                   symbols[Group1]= [ bracketright, braceright ]
keypad     NOK     overlay1=<KO7> }; // what’s "overlay"?
level3     NOK     virtual_modifiers LAlt, AlGr; virtualMods= Lalt
nbsp       NOK     Non-UTF8
pc         NOK     key <AA00> { type="SOMETHING" } instead of { type[Group1]="SOMETHING" }
shift      NOK     actions [Group1] = [
srvr_ctrl  NOK     key <AA00> { type="SOMETHING" } instead of { type[Group1]="SOMETHING" }

Non-UTF8 marks the files that contain characters that are not valid UTF-8 (they are encoded in iso-8859-1).
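Detecting such files is straightforward. A minimal sketch follows; the function names are hypothetical, and the iso-8859-1 fallback is an assumption that matches the files listed above but would not necessarily hold for other files.

```python
def is_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data, fallback="iso-8859-1"):
    """Re-encode bytes as UTF-8, assuming the fallback encoding if needed."""
    if is_utf8(data):
        return data
    return data.decode(fallback).encode("utf-8")


latin1_bytes = "\u00a3".encode("iso-8859-1")  # pound sign as a single byte
print(is_utf8(latin1_bytes))
print(to_utf8(latin1_bytes) == "\u00a3".encode("utf-8"))
```

To check a file, read it in binary mode (open(path, "rb").read()) and pass the bytes to is_utf8.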

Some layouts have key.type = "something" and others key.type[SomeGroup] = "something". Apparently, the format allows the parser to infer which group the type acts upon? That’s weird. Would it be better to put the group information? Is it required that the group is not set?

Some files have virtualMods, which I do not know what it is. Is it used?

Parsing XKB files with antlr

antlr (well, antlr3) is an amazing tool that replaces lex/flex, yacc/bison.

One would use antlr3 to deal with Domain-Specific Languages (DSL), an example of which is text configuration files.

In our case, we use antlr3 to parse some of the XKB configuration files, those found in /etc/X11/xkb/symbols/??.

Our aim is to be able to easily read and write those configuration files. Of course, once we have them read, we do all sorts of processing.

The stable version of antlr3 is 3.0.1, which gave lots of internal errors and was not very useful. I tried the latest beta version, 3.1b, a few times and eventually managed to get it to work. If I am not mistaken, 3.1 stable should be announced in a few days.

When using antlr, you have the choice of several target languages, such as Java, C, C++ and Python. I am using the Python target, and the latest version that is available from the antlr3 repository.

Here is the tree of the gb layout file,

tree = (SECTION (MAPTYPE (MAPOPTIONS partial default alphanumeric_keys xkb_symbols) (MAPNAME "basic")) (MAPMATERIAL (TOKEN_INCLUDE "latin") (TOKEN_NAME Group1 (VALUE "United Kingdom")) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior oneeighth)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior sterling)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter)) (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS apostrophe at dead_circumflex dead_caron)) (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS grave notsign bar bar)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde dead_grave dead_breve)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar brokenbar)) (TOKEN_INCLUDE "level3(ralt_switch_multikey)")))
(SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME "intl")) (MAPMATERIAL (TOKEN_INCLUDE "latin") (TOKEN_NAME Group1 (VALUE "United Kingdom - International (with dead keys)")) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 dead_diaeresis twosuperior onehalf)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior onethird)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter)) (TOKEN_KEY (KEYCODEX AE06) (KEYSYMS 6 dead_circumflex NoSymbol onesixth)) (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS dead_acute at apostrophe bar)) (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS dead_grave notsign bar bar)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign dead_tilde bar bar)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar bar)) (TOKEN_INCLUDE "level3(ralt_switch)")))
(SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME "dvorak")) (MAPMATERIAL (TOKEN_INCLUDE "us(dvorak)") (TOKEN_NAME Group1 (VALUE "United Kingdom - Dvorak")) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior NoSymbol)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior NoSymbol)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign NoSymbol)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar)) (TOKEN_KEY (KEYCODEX AD01) (KEYSYMS apostrophe at))))
(SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME "mac")) (MAPMATERIAL (TOKEN_INCLUDE "latin") (TOKEN_NAME Group1 (VALUE "United Kingdom - Macintosh")) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 at EuroSign)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling numbersign)) (TOKEN_INCLUDE "level3(ralt_switch)")))

By traversing the tree, we can then pretty-print the layout as we wish:

partial default alphanumeric_keys xkb_symbols "basic" {
name[Group1] = "United Kingdom";
include "latin"
include "level3(ralt_switch_multikey)"
key <AE02> = { [ 2 , quotedbl , twosuperior , oneeighth ] };
key <AE03> = { [ 3 , sterling , threesuperior , sterling ] };
key <AE04> = { [ 4 , dollar , EuroSign , onequarter ] };
key <AC11> = { [ apostrophe , at , dead_circumflex , dead_caron ] };
key <TLDE> = { [ grave , notsign , bar , bar ] };
key <BKSL> = { [ numbersign , asciitilde , dead_grave , dead_breve ] };
key <LSGT> = { [ backslash , bar , bar , brokenbar ] };
… snip …
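The traversal can be sketched as follows. This does not use the ANTLR runtime's tree classes: for the sake of a self-contained example the tree is assumed to be nested Python tuples of the form (LABEL, child, ...), with the same labels as in the dump above. The helper names are hypothetical, and the output formatting differs slightly from the listing above.

```python
def child(node, label):
    """Return the first sub-tuple of node with the given label, or None."""
    for item in node[1:]:
        if isinstance(item, tuple) and item[0] == label:
            return item
    return None


def print_section(section):
    """Render one SECTION subtree as XKB text, keeping the tree's order."""
    maptype = child(section, "MAPTYPE")
    options = " ".join(child(maptype, "MAPOPTIONS")[1:])
    lines = ["%s %s {" % (options, child(maptype, "MAPNAME")[1])]
    for item in child(section, "MAPMATERIAL")[1:]:
        kind = item[0]
        if kind == "TOKEN_NAME":
            lines.append("    name[%s] = %s;" % (item[1], item[2][1]))
        elif kind == "TOKEN_INCLUDE":
            lines.append("    include %s" % item[1])
        elif kind == "TOKEN_KEY":
            keycode = child(item, "KEYCODEX")[1]
            symbols = ", ".join(child(item, "KEYSYMS")[1:])
            lines.append("    key <%s> { [ %s ] };" % (keycode, symbols))
    lines.append("};")
    return "\n".join(lines)


# Part of the "mac" section from the dump above, rewritten as tuples.
MAC_SECTION = (
    "SECTION",
    ("MAPTYPE",
     ("MAPOPTIONS", "partial", "alphanumeric_keys", "xkb_symbols"),
     ("MAPNAME", '"mac"')),
    ("MAPMATERIAL",
     ("TOKEN_INCLUDE", '"latin"'),
     ("TOKEN_NAME", "Group1", ("VALUE", '"United Kingdom - Macintosh"')),
     ("TOKEN_KEY", ("KEYCODEX", "AE02"), ("KEYSYMS", "2", "at", "EuroSign"))),
)

print(print_section(MAC_SECTION))
```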

The code is currently hosted at code.google.com (keyboardlayouteditor) and I intend to move it shortly to FDO.

Keyboard Layout Editor GSOC project

I got accepted for a GSOC project with the X.Org Foundation. My mentor is Sergey Udaltsov and I look forward to working with him.

The project is about creating a Keyboard Layout Editor, that can be used to edit XKB files with a nice GUI.

I will be blogging about the project here (fdo category at this blog).

Testing the updated IM support in GTK+

In Improving input method support in GTK+-based apps, we talked about some work to update the list of compose sequences that GTK+ knows to the latest version that comes from Xorg. From 691 compose sequences, we now support over 5000.

The patch has landed in GTK+ (trunk), and here are instructions for testing.

  1. If you have not used jhbuild before, read the jhbuild instructions and install it.
  2. Add the following to your ~/.jhbuildrc file
    branches['gtk+'] = None    # Makes sure you build from the trunk of GTK+
  3. Install gtk+ using the command (see the comment of James on this post on how to avoid Step 6 below)
    jhbuild build gtk+
  4. About 40 minutes later, and about 700MB of space (~600MB for source, ~100MB for installation of files) consumed, you should get a working copy of GTK+ 2.12.
  5. You can use this compiled version of GTK+ by running
    jhbuild shell

    This should give you a new shell, and whatever you run from here will use our fresh GTK+. Try running “gedit”. You will notice that the theme is different; it uses the default theme due to the special GTK+. This shell has set special environment variables so that programs that run from it will use the fresh GTK+. The rest of the libraries come from our distribution.

  6. If you try to type compose sequences, you will notice no improvement. This is because at the moment jhbuild builds the branch 2.12 of GTK+ and not trunk. We need to download GTK+ from trunk and rebuild.
    cd ~/checkout/gnome2/
    mv gtk+ gtk+-branch-2.12
    svn co svn://svn.gnome.org/svn/gtk+/trunk gtk+
    jhbuild build --no-network gtk+
  7. Perform Step 5 and get gedit running.

How to test?

  • Set up a keyboard layout that supports a good variety of dead keys. My preference is GBr (United Kingdom). Here, AltGr+[];’#/ and AltGr+{}:@~? produce different dead keys. You press one of these combinations and then you press a letter. If such a combination exists, then it gets printed. For example, the old GTK+ produces öõóôòx åōőxxx. The new GTK+ produces öõóôòọ åōőǒŏȯ (12 dead keys).
  • Set up Greek, Polytonic (Ancient Greek). The dead keys are [];’ {}:@ AltGr+[] (10 dead keys). Produce characters such as ᾅᾂᾷῗὕὒᾥᾢῷ.
  • Try compose sequences as described from the upstream file at XOrg. For example,
    ComposeKey+(   1 0 )  produces ⑩. Try the same for 0-20, a-zA-Z.
  • Other miscellaneous, Ṩǟấẫǡ (using GBr layout)

The next step would be to parse the list of compose sequences and produce a documentation file.

Keyboard layout for combining diacritics

Typically, if you want to type characters with accents, such as á, ë, ś, you need to configure a suitable keyboard layout that includes compose sequences for those characters. The characters produced are what we call precomposed characters, which were included in the early stages of Unicode. Nowadays, the idea is that you do not need to define á as a distinct character because it can be represented as a and ´, where the latter is a combining diacritic.

When a character and a combining diacritic are put together, they fuse, producing what looks like a single character. á is precomposed (really one character), while á is the letter a followed by the combining diacritic called acute (two characters). You can type the latter á by

  1. Type a
  2. Press Ctrl+Shift+u, then type 301, then press space bar.
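The difference between the two forms is easy to see from Python, using only the standard unicodedata module. This is just an illustration; the variable names are mine.

```python
import unicodedata

precomposed = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE, one code point
combined = "a\u0301"    # letter a followed by COMBINING ACUTE ACCENT

print(len(precomposed), len(combined))

# Normalisation converts between the two forms.
print(unicodedata.normalize("NFC", combined) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == combined)

# The 301 typed after Ctrl+Shift+u is the hexadecimal code point U+0301.
print(unicodedata.name("\u0301"))
```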

Western languages do not really require combining marks, so the existing keyboard layouts do not use them. Other layouts, such as the Congolese keyboard layout (based on Latin), make good use of them.

Gedit, pango and combining diacritics

This is gedit showing off pango and the DejaVu fonts (the default fonts in major distributions).

Line 3 is a bit of an extreme, showing a sandwich of combining diacritics.

Line 4 shows the base character a with the combining diacritics from the Unicode range 0x300 to 0x315.

Both lines 3 and 4 were produced easily with a modified keyboard layout, which is shown below.

Line 5 is just me being silly. You can have combining diacritics that enclose your base character.

$ cat  /usr/share/X11/xkb/symbols/combining
partial alphanumeric_keys alternate_group
xkb_symbols "combining" {

    name[Group1] = "Combining diacritics";

    key.type[Group1] = "FOUR_LEVEL";

    key <AD11> { [ NoSymbol, NoSymbol, 0x1000300, 0x1000301 ] }; // à   á
    key <AD12> { [ NoSymbol, NoSymbol, 0x1000302, 0x1000303 ] }; // â   ã

    key <AC10> { [ NoSymbol, NoSymbol, 0x1000304, 0x1000305 ] }; // ā   a̅
    key <AC11> { [ NoSymbol, NoSymbol, 0x1000306, 0x1000307 ] }; // ă   ȧ
    key <BKSL> { [ NoSymbol, NoSymbol, 0x1000308, 0x1000309 ] }; // ä    ả

    key <AB08> { [ NoSymbol, NoSymbol, 0x1000310, 0x1000311 ] }; // a̐     ȃ
    key <AB09> { [ NoSymbol, NoSymbol, 0x1000312, 0x1000313 ] }; // a̒     a̓
    key <AB10> { [ NoSymbol, NoSymbol, 0x1000314, 0x1000315 ] }; // a̔     a̕
};
$ diff -u /usr/share/X11/xkb/symbols/us.ORIGINAL /usr/share/X11/xkb/symbols/us
--- /usr/share/X11/xkb/symbols/us.ORIGINAL      2008-02-20 11:11:13.000000000 +0000
+++ /usr/share/X11/xkb/symbols/us       2008-02-20 13:02:07.000000000 +0000
@@ -492,3 +492,12 @@
     name[Group1]= "U.S. English - Macintosh";

+partial alphanumeric_keys modifier_keys
+xkb_symbols "combining_us" {
+    include "us"
+    include "combining"
+    key.type[Group1] = "FOUR_LEVEL";
+    name[Group1] = "U.S. English - Combining";
+};
$ diff -u /usr/share/X11/xkb/rules/xorg.xml.ORIGINAL /usr/share/X11/xkb/rules/xorg.xml
--- /usr/share/X11/xkb/rules/xorg.xml.ORIGINAL  2008-02-20 11:27:00.000000000 +0000
+++ /usr/share/X11/xkb/rules/xorg.xml   2008-02-20 11:27:48.000000000 +0000
@@ -3643,6 +3643,12 @@
             <description xml:lang="zh_TW">Macintosh</description>
+        <variant>
+          <configItem>
+            <name>combining_us</name>
+            <description>Combining</description>
+          </configItem>
+        </variant>
$ _

Then, you select this keyboard layout (U.S. English) and variant (Combining) in the Keyboard Indicator applet.
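The numeric values in the ‘combining’ layout follow a simple pattern: each is 0x1000000 plus the Unicode code point of the combining diacritic. A short sketch that generates such key lines is shown below; the function names are hypothetical and the line format simply mirrors the listing above.

```python
UNICODE_KEYSYM_BASE = 0x1000000  # offset seen in the layout above


def unicode_keysym(codepoint):
    """Return the numeric keysym for a Unicode code point."""
    return UNICODE_KEYSYM_BASE + codepoint


def combining_key_line(keycode, level3, level4):
    """Format a key whose AltGr and Shift+AltGr levels are diacritics."""
    return "key <%s> { [ NoSymbol, NoSymbol, 0x%x, 0x%x ] };" % (
        keycode, unicode_keysym(level3), unicode_keysym(level4))


print(combining_key_line("AD11", 0x300, 0x301))

# Preview of the base character a with each diacritic from 0x300 to 0x315.
print(" ".join("a" + chr(cp) for cp in range(0x300, 0x316)))
```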

Unlike dead keys, with combining diacritics you first type the base character (such as a) and then any combining diacritics.
Our sample layout variant puts the diacritics on the physical keys for [];’#,./. For example,

  • a + AltGr+[ : à
  • a + AltGr+Shift+[ : á
  • a + AltGr+[ + AltGr+’ : ằ

If your language has needs that can be solved with combining diacritics, this is how they are solved.

It is quite important to create keyboard layouts for all languages, and actually make good use of them.