ANTLR grammar for XKB, and Relax NG schema (draft)

I completed the ANTLRv3 grammar for symbols/ configuration files of XKB. The grammar can parse and create the abstract syntax tree (AST) for all keyboard layouts in xkeyboard-config.

ANTLRv3 helps you create parsers for domain specific languages (DSL), an example of which is the configuration files in XKB.

Having the ANTLRv3 grammar for a configuration file allows to generate code in any of the supported target lagnuages (C, C++, Java, Python, C#, etc), so that you easily include a parser that reads those files. Essentially you avoid using custom parsers which can be difficult to maintain, or parsers that were generated with flex/bison.

On a similar note, here is the grammar to parse Compose files (such as en_US.UTF-8/Compose.pre). I am not going to be using in the project for now, but it was fun writing it. The Python target takes 18s to create the AST for the >5500 lines of the en_US.UTF-8 compose file, on a typical modern laptop.

I am also working on creating a RelaxNG schema for the XKB configuration files (those under symbols/). There is a draft available, which needs much more work.The Relax NG book by Eric van de Vlist is very useful here.

The immediate goal is to use the code generated by ANTLR to parse the XKB files and create XML files based on the Relax NG schema. I am using Python, and there are a few options; the libxml2 bindings for Python, and PyXML. The latter has more visible documentation, but I think that I should better be using the former.

Update: lxml appears to be the nice way to use libxml2 (instead of using directly libxml2).

Permanent link to this article: https://blog.simos.info/antlr-grammar-for-xkb-and-relax-ng-schema-draft/

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.