Converting between XKB and XML

I have completed the stage that takes XKB (X.Org) keyboard layout files and converts them to XML documents that follow a Relax NG schema for keyboard layouts. These XML documents can then be converted back to keyboard layout files.

Here is an imaginary example of a keyboard layout file.

// Keyboard layout for the Zzurope country (code: zz).
// Yeah.

partial alphanumeric_keys alternate_group hidden
xkb_symbols "bare" {
   key <AE01> { [        1, exclam,      onesuperior,  exclamdown      ] };
};

partial alphanumeric_keys alternate_group
xkb_symbols "basic" {
   name[Group1] = "ZZurope";

   include "zz(bare)"

   key <AD04> { [        r, R,           ediaeresis,   Ediaeresis      ] };
   key <AC07> { [        j, J,           idiaeresis,   Idiaeresis      ] };
   key <AB02> { [        x, X,           oe,           OE              ] };
   key <AB04> { [        v, V,           registered,   registered      ] };
};

partial alphanumeric_keys alternate_group
xkb_symbols "extended" {
    include "zz(basic)"
    name[Group1] = "ZZurope Extended";
    key.type = "THREE_LEVEL"; // We use three levels.
    override key <AD01> {   type[Group1] = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC",
[ U1C9, U1C8], [  any,   U1C7 ]   }; // q
    override key <AD02> {   [ U1CC, U1CB, any,U1CA ],
type[Group1] = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC" }; // w
    key <BKSP> {
        type[Group1]="CTRL+ALT",
        symbols[Group1]= [ BackSpace,   Terminate_Server ]
    };
    key <BKSR> { virtualMods = AltGr, [ 1, 2 ] };
    modifier_map Control { Control_L };
    modifier_map Mod5   { <LVL3>, <MDSW> };
    key <BKST> { [1, 2,3, 4] };
};

When converted to an XML document, it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<layout layoutname="zz">
  <symbols>
    <mapoption>hidden</mapoption>
    <mapoption>xkb_symbols</mapoption>
    <mapname>bare</mapname>
    <mapmaterial>
      <tokenkey override="False">
        <keycodename>AE01</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>1</symbol>
            <symbol>exclam</symbol>
            <symbol>onesuperior</symbol>
            <symbol>exclamdown</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
    </mapmaterial>
  </symbols>
  <symbols>
    <mapoption>xkb_symbols</mapoption>
    <mapname>basic</mapname>
    <mapmaterial>
      <tokenname name="ZZurope"/>
      <tokeninclude>zz(bare)</tokeninclude>
      <tokenkey override="False">
        <keycodename>AD04</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>r</symbol>
            <symbol>R</symbol>
            <symbol>ediaeresis</symbol>
            <symbol>Ediaeresis</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>AC07</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>j</symbol>
            <symbol>J</symbol>
            <symbol>idiaeresis</symbol>
            <symbol>Idiaeresis</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>AB02</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>x</symbol>
            <symbol>X</symbol>
            <symbol>oe</symbol>
            <symbol>OE</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>AB04</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>v</symbol>
            <symbol>V</symbol>
            <symbol>registered</symbol>
            <symbol>registered</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
    </mapmaterial>
  </symbols>
  <symbols>
    <mapoption>xkb_symbols</mapoption>
    <mapname>extended</mapname>
    <mapmaterial>
      <tokenname name="ZZurope Extended"/>
      <tokeninclude>zz(basic)</tokeninclude>
      <tokentype>THREE_LEVEL</tokentype>
      <tokenmodifiermap state="Control">
        <keycode value="Control_L"/>
      </tokenmodifiermap>
      <tokenmodifiermap state="Mod5">
        <keycodex value="LVL3"/>
        <keycodex value="MDSW"/>
      </tokenmodifiermap>
      <tokenkey override="True">
        <keycodename>AD01</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>U1C9</symbol>
            <symbol>U1C8</symbol>
          </symbolsgroup>
          <symbolsgroup>
            <symbol>any</symbol>
            <symbol>U1C7</symbol>
          </symbolsgroup>
          <typegroup value="SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="True">
        <keycodename>AD02</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>U1CC</symbol>
            <symbol>U1CB</symbol>
            <symbol>any</symbol>
            <symbol>U1CA</symbol>
          </symbolsgroup>
          <typegroup value="SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>BKSP</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>BackSpace</symbol>
            <symbol>Terminate_Server</symbol>
          </symbolsgroup>
          <typegroup value="CTRL+ALT"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>BKSR</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>1</symbol>
            <symbol>2</symbol>
          </symbolsgroup>
          <tokenvirtualmodifiers value="AltGr"/>
        </keysymgroup>
      </tokenkey>
      <tokenkey override="False">
        <keycodename>BKST</keycodename>
        <keysymgroup>
          <symbolsgroup>
            <symbol>1</symbol>
            <symbol>2</symbol>
            <symbol>3</symbol>
            <symbol>4</symbol>
          </symbolsgroup>
        </keysymgroup>
      </tokenkey>
    </mapmaterial>
  </symbols>
</layout>
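
To make the mapping from a key statement to a tokenkey element more concrete, here is a minimal Python sketch. It is not the actual converter: it assumes only the simplest form of the statement (an optional override, one keycode, one bracketed list of symbols), and the function name key_line_to_xml and the regular expression are made up for this illustration. The element and attribute names are the ones shown in the XML document above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: handles only "[override] key <NAME> { [ sym, ... ] };".
# Statements with type[...], symbols[...] or virtualMods are NOT covered.
KEY_LINE = re.compile(
    r'^\s*(?P<override>override\s+)?key\s*<(?P<code>\w+)>\s*'
    r'\{\s*\[(?P<symbols>[^\]]*)\]\s*\}\s*;'
)

def key_line_to_xml(line):
    """Build a <tokenkey> element from one simple XKB key statement."""
    match = KEY_LINE.match(line)
    if match is None:
        raise ValueError("unsupported key statement: %r" % line)
    # The sample XML writes the flag as the strings "True" / "False".
    is_override = match.group("override") is not None
    tokenkey = ET.Element("tokenkey", override=str(is_override))
    ET.SubElement(tokenkey, "keycodename").text = match.group("code")
    keysymgroup = ET.SubElement(tokenkey, "keysymgroup")
    symbolsgroup = ET.SubElement(keysymgroup, "symbolsgroup")
    for name in match.group("symbols").split(","):
        ET.SubElement(symbolsgroup, "symbol").text = name.strip()
    return tokenkey

element = key_line_to_xml(
    "   key <AE01> { [        1, exclam,      onesuperior,  exclamdown      ] };"
)
print(ET.tostring(element, encoding="unicode"))
```

A real converter needs a proper grammar rather than a regular expression, since statements such as the AD01 and BKSP ones in the example cannot be matched this way.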

When we convert the XML document back to the XKB format, it looks like this:

hidden xkb_symbols "bare"
{
	key <AE01> { [ 1, exclam, onesuperior, exclamdown ] };
};

xkb_symbols "basic"
{
	name = "ZZurope";
	include "zz(bare)"
	key <AD04> { [ r, R, ediaeresis, Ediaeresis ] };
	key <AC07> { [ j, J, idiaeresis, Idiaeresis ] };
	key <AB02> { [ x, X, oe, OE ] };
	key <AB04> { [ v, V, registered, registered ] };
};

xkb_symbols "extended"
{
	name = "ZZurope Extended";
	include "zz(basic)"
	key.type = "THREE_LEVEL";
	modifier_map Control { Control_L };
	modifier_map Mod5 { <LVL3>, <MDSW> };
	override key <AD01> { [ U1C9, U1C8 ], [ any, U1C7 ], type = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"  };
	override key <AD02> { [ U1CC, U1CB, any, U1CA ], type = "SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"  };
	key <BKSP> { [ BackSpace, Terminate_Server ], type = "CTRL+ALT"  };
	key <BKSR> { [ 1, 2 ], virtualMods = AltGr  };
	key <BKST> { [ 1, 2, 3, 4 ] };
};
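
The reverse direction for a single key can be sketched in Python as well. This is an illustration under assumptions, not the converter itself: the function name tokenkey_to_xkb is invented, only the child elements that appear in the sample XML (symbolsgroup, typegroup, tokenvirtualmodifiers) are handled, and the spacing of the generated line only approximates the output shown above.

```python
import xml.etree.ElementTree as ET

# One <tokenkey> copied from the sample XML document (AD01).
SAMPLE = """
<tokenkey override="True">
  <keycodename>AD01</keycodename>
  <keysymgroup>
    <symbolsgroup><symbol>U1C9</symbol><symbol>U1C8</symbol></symbolsgroup>
    <symbolsgroup><symbol>any</symbol><symbol>U1C7</symbol></symbolsgroup>
    <typegroup value="SEPARATE_CAPS_AND_SHIFT_ALPHABETIC"/>
  </keysymgroup>
</tokenkey>
""".strip()

def tokenkey_to_xkb(tokenkey):
    """Serialise one <tokenkey> element as an XKB key statement."""
    parts = []
    for child in tokenkey.find("keysymgroup"):
        if child.tag == "symbolsgroup":
            symbols = ", ".join(s.text for s in child.findall("symbol"))
            parts.append("[ %s ]" % symbols)
        elif child.tag == "typegroup":
            # Group1 is implied, so the group index is not written out.
            parts.append('type = "%s"' % child.get("value"))
        elif child.tag == "tokenvirtualmodifiers":
            parts.append("virtualMods = %s" % child.get("value"))
    prefix = "override " if tokenkey.get("override") == "True" else ""
    keycode = tokenkey.findtext("keycodename")
    return "%skey <%s> { %s };" % (prefix, keycode, ", ".join(parts))

print(tokenkey_to_xkb(ET.fromstring(SAMPLE)))
```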

Some things are missing, such as partial, alphanumeric_keys and alternate_group. I discussed this with Sergey, and he said it should be OK for them to go away.

In addition, we simplify by keeping just Group1 (we do not specify it, as it is implied).

I performed the round-trip with all layout files, and all parsed and validated OK (there is some extra work with the level3 file remaining, though).

Some remaining issues include:

  • Figuring out how to use XLink to link to documents in the same folder (plus passing a parameter, the name of the variant), and how to represent that in the Relax NG schema.
  • Sorting the layout entries by keycode value.
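
For the sorting item, one possible approach is sketched below in Python. It rests on assumptions: "keycode value" is taken to mean plain string order of the keycodename text, all non-key entries (name, include, type, modifier map) are kept first in their original order, and the function name sort_keys_by_keycode is hypothetical. Whether reordering is always safe depends on how includes and overrides interact, so treat it as a starting point.

```python
import xml.etree.ElementTree as ET

def sort_keys_by_keycode(mapmaterial):
    """Reorder <tokenkey> children of a <mapmaterial> element in place."""
    children = list(mapmaterial)
    others = [c for c in children if c.tag != "tokenkey"]
    # Assumption: sort by the keycode name as a plain string (AB02 < AC07 < AD04).
    keys = sorted(
        (c for c in children if c.tag == "tokenkey"),
        key=lambda k: k.findtext("keycodename"),
    )
    for child in children:
        mapmaterial.remove(child)
    mapmaterial.extend(others + keys)

material = ET.fromstring(
    "<mapmaterial>"
    "<tokenkey override='False'><keycodename>AD04</keycodename></tokenkey>"
    "<tokeninclude>zz(bare)</tokeninclude>"
    "<tokenkey override='False'><keycodename>AB02</keycodename></tokenkey>"
    "<tokenkey override='False'><keycodename>AC07</keycodename></tokenkey>"
    "</mapmaterial>"
)
sort_keys_by_keycode(material)
print([child.findtext("keycodename") or child.tag for child in material])
```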

Permanent link to this article: https://blog.simos.info/converting-between-xkb-and-xml/

4 comments

    • Anonymous on June 21, 2008 at 07:08

    That is a wonderful testament to how shit XML is.

    • Nicolas Mailhot on June 21, 2008 at 08:26

    @Anonymous: you’ve obviously never tried to edit files in the old compact syntax, which is so brittle and error-inducing you always spend at least 30min figuring where a typo introduced breakage (that is if you do not give up as the average human does). It’s a pretty minefield.

    @Simos

    Nice to see this kind of progress!

    Now, I think you still need to work a lot on your xml grammar. You’re falling in the trap of every machine-generated file (XML or not) which is excessive verbosity. To be successful and get adoption you need to work harder at having concise and readable files.

    Some remarks :
    — you have a symbols element with mapoption, mapname, mapmaterial… inside. That sort of screams your symbol element should be named map (same for tokentype… inside mapmaterial. Take care to have consistent naming please)

    — XML is a structured language. You should not need to name the child of foo foooption. You can infer an option is a foooption by the fact its parent is foo (for example, mapoption inside symbols, keycodename inside tokenkey)

    – as a rule, when you can have only one bar child of foo, it’s more compact and human-friendly to have it as an attribute () than as a child element. Though opinions on attributes vary in the XML world, and some people recommend just supporting both and letting users choose the one most appropriate to them. But anyway, smart use of attributes should kill some of your XML complexity

    – a nice property of your XML layout is that each symbol is its own element. That means that unlike the legacy syntax, you can allow several symbol syntaxes. For example :
    й
    0439
    Cyrillic_shorti
    01000439
    This alone would make the files editable by normal beings

    — you don’t really need to use this syntax SEPARATE_CAPS_AND_SHIFT_ALPHABETIC when the start and end of the name is nicely delimited with “”

    – zz(basic) is really a two level element. Do you really want to keep another kind of tokenization inside your XML syntax?

    — it would probably simplify your files to have an override value at the mapmaterial level and only specify it in tokenkey when it’s different from the global one

    — you should ask for syntax advice on the xml-dev ML if you’ve not already done so

    • Nicolas Mailhot on June 21, 2008 at 08:31

    I meant
    – a nice property of your XML layout is that each symbol is its own element. That means that unlike the legacy syntax, you can allow several symbol syntaxes. For example :
    [unicodevalue]й[unicodevalue]
    [unicodepoint]0439[unicodepoint]
    [magicnamenooneknows]Cyrillic_shorti[magicnamenooneknows]
    [magicnonstandardvalue]01000439[magicnonstandardvalue]
    This alone would make the files editable by normal beings

  1. Many thanks for the comments Nicolas!
    I am looking into these.
