Tag : language

Python programming language lessons in Greek

Lessons on the Python programming language have started on the Ubuntu-gr forum.

The first lesson covers setting up the Python environment on your system, so you can easily take part and follow the next steps.

A Greek translation of a Python tutorial is also being done through the Ubuntu-gr forum; it will be used as the material for the lessons. The translation is almost complete.

See the Python lessons on the Ubuntu-gr forum.

Kostas Tsakaloglou mentioned these lessons a few days ago.

Workaround for bad fonts in Google Earth 5 (Linux)

Update Jan 2010: The following may not work anymore. Use with caution. See relevant discussions at http://forum.ubuntu-gr.org/viewtopic.php?f=5&t=15607 and especially http://kigka.blogspot.com/2010/11/google-6.html

Older post follows:

So you just installed Google Earth 5 and you can’t figure out what’s wrong with the fonts? If your language does not use the Latin script, you cannot see any text?

Here is the workaround. The basic info comes from this Google Earth forum post and the reply that suggests messing with the Qt libraries.

Google Earth 5 is based on the Qt library, and Google is using their own copies of the Qt libraries. This means that the customisation (including fonts) that you do with qtconfig-qt4 does not affect Google Earth. Here we use Ubuntu 8.10, and we simply installed the Qt libraries in order to use some Qt programs. You probably do not have qtconfig-qt4 installed, so you need to get it.

So, following the advice in the post above and replacing key Qt libraries from Google Earth with the ones provided by our distro solves (read: works around) the problem. Here comes the science:

If you have a 32-bit version of Ubuntu,

cd /opt/google-earth/
sudo mv libQtCore.so.4 libQtCore.so.4.bak
sudo mv libQtGui.so.4 libQtGui.so.4.bak
sudo mv libQtNetwork.so.4 libQtNetwork.so.4.bak
sudo mv libQtWebKit.so.4 libQtWebKit.so.4.bak
sudo ln -s /usr/lib/libQtCore.so.4.4.3  libQtCore.so.4
sudo ln -s /usr/lib/libQtGui.so.4.4.3  libQtGui.so.4
sudo ln -s /usr/lib/libQtNetwork.so.4.4.3  libQtNetwork.so.4
sudo ln -s /usr/lib/libQtWebKit.so.4.4.3  libQtWebKit.so.4

If you have the 64-bit version of Ubuntu, try

cd /opt/google-earth/

sudo getlibs googleearth-bin
sudo mv libQtCore.so.4 libQtCore.so.4.bak
sudo mv libQtGui.so.4 libQtGui.so.4.bak
sudo mv libQtNetwork.so.4 libQtNetwork.so.4.bak
sudo mv libQtWebKit.so.4 libQtWebKit.so.4.bak
sudo ln -s /usr/lib32/libQtCore.so.4.4.3  libQtCore.so.4
sudo ln -s /usr/lib32/libQtGui.so.4.4.3  libQtGui.so.4
sudo ln -s /usr/lib32/libQtNetwork.so.4.4.3  libQtNetwork.so.4
sudo ln -s /usr/lib32/libQtWebKit.so.4.4.3  libQtWebKit.so.4

This requires getlibs to be installed; when prompted, install the 32-bit versions of the packages as instructed.
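The two command lists above differ only in the system library directory (/usr/lib or /usr/lib32), so the steps can be summarised as a small dry-run planner. This is only a sketch: the function name plan_qt_swap and the tuple layout are made up for illustration, the Qt version 4.4.3 is the one from the commands above, and nothing is touched on disk.

```python
from pathlib import PurePosixPath

# The four bundled libraries that the commands above replace.
QT_LIBS = ["libQtCore", "libQtGui", "libQtNetwork", "libQtWebKit"]


def plan_qt_swap(system_lib_dir, qt_version="4.4.3"):
    """Return (action, source, target) steps; nothing is executed."""
    steps = []
    for lib in QT_LIBS:
        bundled = f"{lib}.so.4"
        # Keep the bundled copy so that the change can be undone.
        steps.append(("rename", bundled, bundled + ".bak"))
    for lib in QT_LIBS:
        system_copy = str(PurePosixPath(system_lib_dir) / f"{lib}.so.{qt_version}")
        steps.append(("symlink", system_copy, f"{lib}.so.4"))
    return steps


if __name__ == "__main__":
    # 64-bit case: the 32-bit system libraries live in /usr/lib32.
    for action, source, target in plan_qt_swap("/usr/lib32"):
        print(action, source, "->", target)
```

Printing the plan first makes it easy to check that the versioned files actually exist on your system before running the real mv and ln commands inside /opt/google-earth/.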

Now, with qtconfig-qt4 you can configure the font settings.

Playing with Git

Git is version control system (VCS) software that is used for source code management (SCM). Other examples of VCS software are CVS and SVN. What makes Git different is that it is a distributed VCS, that is, a DVCS.

Being a DVCS, when you use Git you create fully capable local repositories that can be used for offline work. When you get the files of a repository, you actually grab the full information (this makes the initial creation of local repositories out of a remote repository slower, and the repositories are bigger).

You can install git by installing the git package. You can test it by opening a terminal window, and running

git clone git://github.com/schacon/whygitisbetter.git

The files appear in a directory called whygitisbetter. In a subdirectory called .git/, git stores all the controlling information it requires to manage the local repository. When you enter the repository directory (whygitisbetter in our case), you can issue commands that will figure out what’s going on because of the info in .git/.
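To illustrate how a command run somewhere inside the working copy can locate that information, here is a minimal sketch that walks up the directory tree looking for a .git directory. The function name find_repo_root is hypothetical, and Git's own discovery has more rules than this (environment variables, .git files and so on); the sketch only shows the basic idea.

```python
import os


def find_repo_root(start):
    """Walk upwards from `start` until a directory containing .git is found."""
    current = os.path.abspath(start)
    while True:
        if os.path.isdir(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


if __name__ == "__main__":
    # Prints the repository root for the current directory, or None.
    print(find_repo_root(os.getcwd()))
```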

With git, we create local copies of repositories by cloning. If you have used CVS or SVN, this is somewhat equivalent to the checkout command. By cloning, you create a full local repository. When you checkout with CVS or SVN, you get only the latest snapshot of the source code.

What you downloaded above is the source code for the http://www.whygitisbetterthanx.com/ website. It describes the relative advantages of git compared to other VCS and DVCS systems.

Among the different sources of documentation for git, I think one of the easiest to read is the Git Community Book. It is concise and easy to follow, and it comes with screencasts (videos that show different tasks, with audio guidance).

You can create local repositories on your system. If you want to have a remote repository, you can create an account at GitHub, an attractive start-up that offers 100MB free space for your git repository. Therefore, you can host your pet project on github quite easily.

GitHub combines source code management with social networking, however strange that may sound. It comes with tools that allow you to maintain your own copies of repositories (for example, those of other GitHub users), and it helps with communication. For example, if I create my own copy of the whygitisbetter repository and add something nice to the book, I can send a pull request (with the click of a button) to the maintainer to grab my changes!

If you have already used another (non-distributed) SCM tool, it takes some time to get used to the new way of git. It is a good skill to have, and the effort should pay off quickly. There is an SVN to Git crash course available.

If you have never used an SCM, it is cool to go for git. There is nothing to unlearn, and you will get a new skill.

Git is used for the development of the Linux kernel, the Perl language, Ruby on Rails, and others.

Firefox 3 statistics, and the Greek language

Firefox 3 was released on the 17th June, 2008 and up to now, an impressive 22 million copies have been downloaded.

kkovash had a peek at the stats and produced a nice post with a diagram of the downloads of the localised versions of Firefox 3 (that is, excluding en-US).

Firefox 3 Downloads; part of EMEA region, focus on Greece

Downloads at [Release+3] days (20th June 2008)

Dark red signifies that there have been more than 100,000 downloads originating from the respective country. It is quite visible that most European countries managed to surpass the 100,000 threshold. Greece at that point was hovering at about 50,000 downloads. In the Balkan region, Turkey was the first country to grab the red badge.

It is interesting to see that Iran has been No 2 in the whole of Asia (No 1 has been Japan). Only now has China managed to reach second place, pushing Iran into third place. Taking into account the population gap and the political situation, Iran achieved an amazing feat.

In the first few days, a few countries only managed to jump fast over the 100K mark. It appears that these countries have strong social network communities, that urged friends to grab a copy of Firefox 3.

Firefox 3 downloads, showing Greece, with Red status

This is a recent screenshot (26th June 2008), at [Release+9] days. Greece achieved Red status the other day. In the Balkan region, Turkey, Romania and Bulgaria had reached 100,000 first.

In the EU region, it is notable that Ireland, at 76,000 downloads, is lagging behind.

Another observation is that the countries from Africa are lagging significantly from the rest of the world. Low broadband Internet penetration and limited number of Internet users is likely to be the reason.

How many downloads have there been for the Greek localisation of Firefox 3?

kkovash reveals that there have been about 60,000 downloads for the Greek localisation of Firefox 3. This would approximately mean that more than 60% of the downloads in Greece have been for the localised version. Great news.

ANTLR grammar for XKB, and Relax NG schema (draft)

I completed the ANTLRv3 grammar for symbols/ configuration files of XKB. The grammar can parse and create the abstract syntax tree (AST) for all keyboard layouts in xkeyboard-config.

ANTLRv3 helps you create parsers for domain specific languages (DSL), an example of which is the configuration files in XKB.

Having the ANTLRv3 grammar for a configuration file allows you to generate code in any of the supported target languages (C, C++, Java, Python, C#, etc), so that you can easily include a parser that reads those files. Essentially you avoid using custom parsers, which can be difficult to maintain, or parsers that were generated with flex/bison.

On a similar note, here is the grammar to parse Compose files (such as en_US.UTF-8/Compose.pre). I am not going to be using it in the project for now, but it was fun writing it. The Python target takes 18s to create the AST for the >5500 lines of the en_US.UTF-8 compose file, on a typical modern laptop.

I am also working on creating a Relax NG schema for the XKB configuration files (those under symbols/). There is a draft available, which needs much more work. The Relax NG book by Eric van der Vlist is very useful here.

The immediate goal is to use the code generated by ANTLR to parse the XKB files and create XML files based on the Relax NG schema. I am using Python, and there are a few options: the libxml2 bindings for Python, and PyXML. The latter has more visible documentation, but I think that I had better use the former.

Update: lxml appears to be the nice way to use libxml2 (instead of using directly libxml2).
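As a rough idea of what the XML generation step could look like, here is a minimal sketch using xml.etree.ElementTree from the Python standard library as a stand-in (lxml.etree offers a largely compatible API). The element and attribute names (layout, key, symbol, code, level) are invented for this example and are not taken from the draft Relax NG schema; the function name layout_to_xml is hypothetical as well.

```python
import xml.etree.ElementTree as ET


def layout_to_xml(name, keys):
    """Build an XML element for one layout.

    `keys` maps a keycode such as "AE03" to its list of keysyms,
    one keysym per level. All element names here are hypothetical.
    """
    layout = ET.Element("layout", {"name": name})
    for keycode, keysyms in keys.items():
        key = ET.SubElement(layout, "key", {"code": keycode})
        for level, keysym in enumerate(keysyms, start=1):
            symbol = ET.SubElement(key, "symbol", {"level": str(level)})
            symbol.text = keysym
    return layout


if __name__ == "__main__":
    element = layout_to_xml(
        "basic", {"AE03": ["3", "sterling", "threesuperior", "sterling"]}
    )
    print(ET.tostring(element, encoding="unicode"))
```

In the real project the dictionary would be filled while walking the AST produced by the ANTLR parser, and the output would be validated against the schema.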

Looking into the symbol files

In the previous post, we talked about the ANTLR grammar that parses the XKB layout files.

The grammar is available at http://code.google.com/p/keyboardlayouteditor/source/browse. I’d rather push to the freedesktop repository once the project is completed. For now it’s too easy for me, just doing svn commit -m something.

Below you can see the relevant layout files for each country (and in some cases, language), and how the grammar deals with them. The first column is the filename from the CVS XKB symbols subdirectory (to be moved imminently to Git). Last week’s discussion with Sergey helped me figure out issues with the symbol files, and simplify what information is needed and what can be eliminated. The second column has NOK (not OK) if something is wrong. The third column tries to explain what was wrong.

File       Status  Problem
gb         NOK     Non-UTF8
group      NOK     virtualMods= AltGr
hu         NOK     Non-UTF8
il         NOK     key.type="FOUR_LEVEL" (typically: key.type[something]=….)
in         NOK     key.type="FOUR_LEVEL" (typically: key.type[something]=….)
jp         NOK     key <BKSP> {
                       type="",   // empty?
                       symbols[Group1]= [ bracketright, braceright ]
keypad     NOK     overlay1=<KO7> }; // what's "overlay"?
level3     NOK     virtual_modifiers LAlt, AlGr; virtualMods= Lalt
nbsp       NOK     Non-UTF8
pc         NOK     key <AA00> { type="SOMETHING" } instead of { type[Group1]="SOMETHING" }
shift      NOK     actions [Group1] = [
srvr_ctrl  NOK     key <AA00> { type="SOMETHING" } instead of { type[Group1]="SOMETHING" }

The Non-UTF8 files are those that contain characters that are not valid UTF-8 (they are encoded in ISO-8859-1).
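A quick way to spot such files is to try decoding their bytes as UTF-8. The following is a small sketch; the function names are made up, and the list of paths to check is left to the caller.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(paths):
    """Return the paths whose contents are not valid UTF-8."""
    bad = []
    for path in paths:
        with open(path, "rb") as handle:
            if not is_valid_utf8(handle.read()):
                bad.append(path)
    return bad
```

A pound sign stored as ISO-8859-1 is the single byte 0xA3, which is not a valid UTF-8 sequence on its own, so a file containing it would be reported.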

Some layouts have key.type = "something" and others key.type[SomeGroup] = "something". Apparently, the format allows the group that the type acts upon to be inferred? That’s weird. Would it be better to state the group explicitly? Is it required that the group is not set?

Some files have virtualMods; I do not know what it is. Is it used?

Parsing XKB files with antlr

antlr (well, antlr3) is an amazing tool that replaces lex/flex, yacc/bison.

One would use antlr3 if they want to deal with Domain-Specific Languages (DSL), an example of which are the text configuration files.

In our case, we use antlr3 to parse some of the XKB configuration files, those found in /etc/X11/xkb/symbols/??.

Our aim is to be able to easily read and write those configuration files. Of course, once we have them read, we do all sorts of processing.

The stable version of antlr3 is 3.0.1, which happened to give lots of internal errors. It has not been very useful, so I tried a few times the latest beta version 3.1b, and eventually managed to get it to work. If I am not mistaken, 3.1 stable should be announced in a few days.

When using antlr, you have the choice of several target languages, such as Java, C, C++ and Python. I am using the Python target, and the latest version that is available from the antlr3 repository.

Here is the tree of the gb layout file,

tree = (SECTION (MAPTYPE (MAPOPTIONS partial default alphanumeric_keys xkb_symbols) (MAPNAME “basic”)) (MAPMATERIAL (TOKEN_INCLUDE “latin”) (TOKEN_NAME Group1 (VALUE “United Kingdom”)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior oneeighth)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior sterling)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter)) (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS apostrophe at dead_circumflex dead_caron)) (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS grave notsign bar bar)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde dead_grave dead_breve)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar brokenbar)) (TOKEN_INCLUDE “level3(ralt_switch_multikey)”))) (SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME “intl”)) (MAPMATERIAL (TOKEN_INCLUDE “latin”) (TOKEN_NAME Group1 (VALUE “United Kingdom – International (with dead keys)”)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 dead_diaeresis twosuperior onehalf)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior onethird)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign onequarter)) (TOKEN_KEY (KEYCODEX AE06) (KEYSYMS 6 dead_circumflex NoSymbol onesixth)) (TOKEN_KEY (KEYCODEX AC11) (KEYSYMS dead_acute at apostrophe bar)) (TOKEN_KEY (KEYCODEX TLDE) (KEYSYMS dead_grave notsign bar bar)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign dead_tilde bar bar)) (TOKEN_KEY (KEYCODEX LSGT) (KEYSYMS backslash bar bar bar)) (TOKEN_INCLUDE “level3(ralt_switch)”))) (SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME “dvorak”)) (MAPMATERIAL (TOKEN_INCLUDE “us(dvorak)”) (TOKEN_NAME Group1 (VALUE “United Kingdom – Dvorak”)) (TOKEN_KEY (KEYCODEX BKSL) (KEYSYMS numbersign asciitilde)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 quotedbl twosuperior NoSymbol)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling threesuperior NoSymbol)) (TOKEN_KEY (KEYCODEX AE04) (KEYSYMS 4 dollar EuroSign NoSymbol)) (TOKEN_KEY (KEYCODEX LSGT) 
(KEYSYMS backslash bar)) (TOKEN_KEY (KEYCODEX AD01) (KEYSYMS apostrophe at)))) (SECTION (MAPTYPE (MAPOPTIONS partial alphanumeric_keys xkb_symbols) (MAPNAME “mac”)) (MAPMATERIAL (TOKEN_INCLUDE “latin”) (TOKEN_NAME Group1 (VALUE “United Kingdom – Macintosh”)) (TOKEN_KEY (KEYCODEX AE02) (KEYSYMS 2 at EuroSign)) (TOKEN_KEY (KEYCODEX AE03) (KEYSYMS 3 sterling numbersign)) (TOKEN_INCLUDE “level3(ralt_switch)”)))

When traversing the tree, we can then pretty-print the layout at will:

partial default alphanumeric_keys xkb_symbols “basic” {
name[Group1] = “United Kingdom”;
include “latin”
include “level3(ralt_switch_multikey)”
key <AE02> = { [ 2 , quotedbl , twosuperior , oneeighth ] };
key <AE03> = { [ 3 , sterling , threesuperior , sterling ] };
key <AE04> = { [ 4 , dollar , EuroSign , onequarter ] };
key <AC11> = { [ apostrophe , at , dead_circumflex , dead_caron ] };
key <TLDE> = { [ grave , notsign , bar , bar ] };
key <BKSL> = { [ numbersign , asciitilde , dead_grave , dead_breve ] };
key <LSGT> = { [ backslash , bar , bar , brokenbar ] };
… snip …
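The traversal can be sketched as follows. To keep the example self-contained, the tree is modelled with nested Python tuples instead of the tree objects that the ANTLR Python runtime produces, so this is not the project's actual code; the node names follow the tree dump above, the sample data is the "mac" section from that dump, and the key line format mirrors the output shown above.

```python
# The "mac" section from the tree dump, written as nested tuples.
SAMPLE = (
    "SECTION",
    ("MAPTYPE",
     ("MAPOPTIONS", "partial", "alphanumeric_keys", "xkb_symbols"),
     ("MAPNAME", "mac")),
    ("MAPMATERIAL",
     ("TOKEN_INCLUDE", "latin"),
     ("TOKEN_NAME", "Group1", ("VALUE", "United Kingdom - Macintosh")),
     ("TOKEN_KEY", ("KEYCODEX", "AE02"), ("KEYSYMS", "2", "at", "EuroSign")),
     ("TOKEN_KEY", ("KEYCODEX", "AE03"), ("KEYSYMS", "3", "sterling", "numbersign")),
     ("TOKEN_INCLUDE", "level3(ralt_switch)")),
)


def pretty_print(section):
    """Render one SECTION node (a nested tuple) as layout text."""
    _, maptype, material = section
    _, options, mapname = maptype
    lines = ['%s "%s" {' % (" ".join(options[1:]), mapname[1])]
    for node in material[1:]:
        kind = node[0]
        if kind == "TOKEN_INCLUDE":
            lines.append('    include "%s"' % node[1])
        elif kind == "TOKEN_NAME":
            lines.append('    name[%s] = "%s";' % (node[1], node[2][1]))
        elif kind == "TOKEN_KEY":
            keycode, keysyms = node[1][1], node[2][1:]
            lines.append("    key <%s> = { [ %s ] };" % (keycode, " , ".join(keysyms)))
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(pretty_print(SAMPLE))
```

Because the printer only depends on the node names, the same walk could emit a different format (XML, for example) by swapping the three string templates.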

The code is currently hosted at code.google.com (keyboardlayouteditor) and I intend to move it shortly to FDO.

Take Back The Tech #2!

Last year we talked about Take Back The Tech, an initiative by the Association for Progressive Communications, Women’s Networking Support Programme (APC WNSP) to stop violence against women with the use of Information and Communication Technologies (ICT). It took place between the 25th November and the 10th December, and the same initiative runs this year during the same days. At the time of writing, the event is at Day 8 of 16.

Violence Against Women (VAW) can also be perpetrated through the use of ICT (such as being a victim of targeted spyware or malicious online intimidation). Therefore, a better use of ICT (Take Back The Tech!) would help mitigate online-related VAW and reclaim the control of technology.

You can start your own campaign and join the existing ones that are in place. In Europe there are existing campaigns in the UK and Skopje.

Here is the announcement for this year,

25 Nov to 10 Dec

ka-BLOG! Calling all bloggers to contaminate the blogosphere with
activism on VAW for 16 days.

ka-BLOG is a 16-day blog fest for the Take Back the Tech Campaign. It
is open to anyone and everyone – girls, boys, everyone beyond and more
— who want to share their thoughts on violence against women, and how
online communications can exacerbate or help eliminate VAW.

We welcome bloggers in different languages!

ka-BLOG with us 🙂

For more information, go http://www.takebackthetech.net, or email jac
AT apcwomen DOT org

[FYI. In Filipino slang, “ka-BLOG” would mean someone you blog with.]

Localisation issues in home directory folders (xdg-user-dirs)

In new distributions such as Ubuntu 7.10 there is now support for folder names of personal data in your local language. What this means is that ~/Desktop can now be called ~/Επιφάνεια εργασίας. You also get a few more default folders, including ~/Music, ~/Documents, ~/Pictures and so on.

This functionality of localised home folders has become available thanks to a new FreeDesktop standard, XDG-USER-DIRS. xdg-user-dirs can be localised, and the current localisations are available at xdg-user-dirs/po.

A potential issue arises when a user logs in with different locales: how does the system switch between the localised versions of the folder names? For GNOME there is a migration tool; as soon as you log in to your account with a different locale, the system asks whether you wish to switch the names from one language to the other. This is available through the xdg-user-dirs-gtk application.

Another issue affects users who use the command line quite often; switching between two keyboard languages (for those languages that use a script other than Latin) tends to become cumbersome, especially if you have not set up your shell for intelligent completion. In addition, when you connect remotely using SSH, you may not be able to type in the local language on the computer you connect from, which would make work very annoying.

Furthermore, there have been reports of KDE applications not working; if someone can file a bug report and post the link, it would be great. The impression I got was that some installations of KDE did not read the filesystem in UTF-8 but in a legacy 8-bit encoding. This requires further investigation.

Moreover, OpenOffice.org requires some integration work to follow the xdg-user-dirs standard; apparently it has its own option for the folder into which it saves newly created files. I believe this will be resolved in the near future.

Now, if we just installed Ubuntu 7.10 or Fedora 8, and we got, by default, localised subfolders in our home directory (which we may not prefer), what can we do to revert to non-localised folders?

The lazy way is to logout, choose an English locale as the default locale for the system and log in. You will be presented with the xdg-user-dirs-gtk migration tool (shown above) that will give you the option to switch to English folder names for those personal folders.

Clarification: this workaround (the logout and login thing) implies that you then log out again, set the language back to the localised one (e.g. Greek) and log in. This time, when the system asks to rename the personal folders, you simply answer no, and you end up with a localised desktop but personal folders in English. Mission really accomplished.

If you are of the tinkering type, the files to change manually are

$ cat ~/.config/user-dirs.locale




$ cat ~/.config/user-dirs.dirs

# This file is written by xdg-user-dirs-update
# If you want to change or add directories, just edit the line you're
# interested in. All local changes will be retained on the next run
# Format is XDG_xxx_DIR="$HOME/yyy", where yyy is a shell-escaped
# homedir-relative path, or XDG_xxx_DIR="/yyy", where /yyy is an
# absolute path. No other format is supported.
XDG_DESKTOP_DIR="$HOME/Επιφάνεια εργασίας"
XDG_DOWNLOAD_DIR="$HOME/Επιφάνεια εργασίας"
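Since the file comments describe the format precisely, reading it from a script is straightforward. Here is a minimal sketch of a reader; the function name parse_user_dirs is made up, and the sketch does not undo shell escaping inside the path, which a complete reader would have to handle.

```python
import re

# One assignment per line: XDG_xxx_DIR="$HOME/yyy" or XDG_xxx_DIR="/yyy".
LINE_RE = re.compile(r'^(XDG_[A-Z]+_DIR)="(.*)"$')


def parse_user_dirs(text, home):
    """Return a dict mapping XDG_xxx_DIR names to absolute paths."""
    dirs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # no other format is supported
        name, value = match.groups()
        if value.startswith("$HOME/"):
            value = home + value[len("$HOME"):]
        dirs[name] = value
    return dirs


if __name__ == "__main__":
    sample = 'XDG_DESKTOP_DIR="$HOME/Επιφάνεια εργασίας"\n'
    print(parse_user_dirs(sample, "/home/user"))
```

To revert a folder to its English name by hand, you would edit the corresponding line (for example to "$HOME/Desktop") and rename the directory itself to match.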

Personally I believe that having localised names appear under the home folder is good for the majority of users, as they will be able to match what is shown in Locations with the actual names on the filesystem.

There will be cases that software has to be updated and bugs fixed (such as in backup tools). As we proceed with more advanced internationalisation/localisation support in Linux, it is desirable to follow forward, and fix problematic software.

However, if enough popular support arises with clear arguments (I am referring to Greek-speaking users and a current discussion) for default folder names in English, we could follow the popular demand.

Also see the relevant blog post New Dirs in Gutsy: Documents, Music, Pictures, Blah, Blah by Moving to Freedom.

Cannot write Greek Polytonic in Linux

For up to date instructions for Greek and Greek Polytonic see How to type Greek, Greek Polytonic in Linux.

The following text is kept for historical purposes. Greek and Greek Polytonic now works in Linux, using the default Greek layout.

General Update: If you have Ubuntu 8.10, Fedora 10 or a similarly new distribution, then Greek Polytonic works out-of-the-box. Simply select the Greek Polytonic layout. For more information, see the recent Greek Polytonic post.

Update 3rd May 2008: If you have Ubuntu 8.04 (this probably applies to other recent Linux distributions as well), you simply need to add GTK_IM_MODULE=xim to /etc/environment. Start a Terminal (Applications/Accessories/Terminal) and type the commands (the first command makes a backup copy of the configuration file, and the second opens the configuration file with administrative privileges, so that you can edit and save):

$ gksudo cp /etc/environment /etc/environment.ORIGINAL
$ gksudo gedit /etc/environment

then append

GTK_IM_MODULE=xim

save, and restart your computer. It should work now. To test, try the standard Text Editor, found in Accessories.

In Ubuntu 8.10 (autumn 2008), it should work out of the box, just by enabling the Greek Polytonic layout.

Update 20th June 2008: If still some accents/breathings/aspirations do not work, then this is probably related to your system locale (whether it is Greek or not). It works better when it is Greek. If you are affected and you do not use the Greek locale, there is one more thing to do.

$ gksudo cp /usr/share/X11/locale/en_US.UTF-8/Compose /usr/share/X11/locale/en_US.UTF-8/Compose.ORIGINAL
$ gksudo cp /usr/share/X11/locale/el_GR.UTF-8/Compose /usr/share/X11/locale/en_US.UTF-8/Compose

The first command makes a backup copy of your original en_US Compose file (assuming you run an English locale; if in doubt, read /usr/share/X11/locale/locale.dir). The second command copies the Greek compose file over the English one. You then logout and login again.

End of updates

To write Greek Polytonic in Linux, a special file is used, which is called the compose file. There is a bit of complication here in the sense that the compose file depends on the current system locale.

To find out which compose file is active on your system, have a look at

/usr/share/X11/locale/compose.dir

Let’s assume your system locale is en_US.UTF-8 (Start Applications/Accessories/Terminal and type locale).

In the compose.dir file it says

en_US.UTF-8/Compose: en_US.UTF-8

Note that the locale is the second field. If you have a different system locale, match on the second field. Many people make a mistake here. Actually, I think it would be faster for the system to locate the entry if the compose.dir file were sorted by locale.
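The lookup can be sketched in a few lines of Python. The function name compose_file_for is made up, and the sketch assumes the two-field layout shown above (compose file followed by a colon, then the locale), with lines starting with # treated as comments.

```python
def compose_file_for(locale_name, compose_dir_text):
    """Return the compose file listed for `locale_name`, or None."""
    for line in compose_dir_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            continue
        path, locale_field = fields
        # The locale is the SECOND field; the first is the compose file.
        if locale_field == locale_name:
            return path.rstrip(":")
    return None


if __name__ == "__main__":
    with open("/usr/share/X11/locale/compose.dir") as handle:
        print(compose_file_for("en_US.UTF-8", handle.read()))
```

The returned path is relative to /usr/share/X11/locale/.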

Therefore, the compose file is

/usr/share/X11/locale/en_US.UTF-8/Compose

So, what’s the problem then?

Well, for the Greek locale (el_GR.UTF-8) we have a different compose file, a compose file in which Greek Polytonic actually works ;-).

Therefore, there are numerous workarounds here to get Greek Polytonic working.

For example,

  • If you speak modern Greek, you can install the Greek locale.
  • You can edit /usr/share/X11/locale/compose.dir so that for your locale, the compose file is the Greek one, /usr/share/X11/locale/el_GR.UTF-8/Compose.
  • You can edit the Greek compose file, take the Greek Polytonic section and update the Greek Polytonic section of en_US.UTF-8/Compose.
  • You can copy the Greek compose file to your home directory under the name .XCompose. I have not tested this one, and you may also be affected by this bug.

Of course the proper solution is to update en_US.UTF-8/Compose with the updated Greek Polytonic compose sequences. There is a tendency to add the compose sequences of all languages to en_US.UTF-8/Compose, and this actually is happening now. In this respect, it would make sense to rename en_US.UTF-8/Compose into something like general/Compose.

Important MO file optimisation for en_* locales, and partly others

During GUADEC, Tomas Frydrych gave a talk on exmap-console, a cut-down version of exmap that can work well on mobile devices.

During the presentation, Tomas showed how to use the tool to find the culprits in memory (ab)use on the GNOME desktop. One issue that came up was that MO files were taking up memory even though the desktop showed English. Why would the MO translation files loaded in memory be so big in size?

gtk20.mo                  : VM   61440  B, M   61440  B, S   61440  B
atk10.mo                  : VM    8192  B, M    8192  B, S    8192  B
libgnome-2.0.mo           : VM   28672  B, M   24576  B, S   24576  B
glib20.mo                 : VM   20480  B, M   16384  B, S   16384  B
gtk20-properties.mo       : VM     128 KB, M     116 KB, S     116 KB
launchpad-integration.mo  : VM    4096  B, M    4096  B, S    4096  B

A translation file looks like

msgid "File"
msgstr ""

When translated to Greek it is

msgid "File"
msgstr "Αρχείο"

In the English UK translation it would be

msgid "File"
msgstr "File"

This is actually not necessary, because if you leave those messages untranslated, the system will use the original messages that are embedded in the executable file.

However, for the purposes of the English UK, English Canadian, etc. teams, it makes sense to copy the same messages into the translated field, because it indicates that the message was examined by the translator. Any new messages would appear as untranslated and the same process would continue.

Now, the problem is that the gettext tools are not smart enough when they compile such translation files; they replicate without need those messages occupying space in the generated MO file.

Apart from the English variants, this issue is also present in other languages when the message looks like

msgid "GConf"
msgstr "GConf"

Here, it does not make much sense to translate the message into the locale language. However, the generated MO file now contains more than 10 bytes (5+5), plus some space for the index.

Therefore, what’s the solution for this issue?

One solution is to add to msgattrib the option to preprocess a PO file and remove those unneeded copies. Here is a patch,

--- src.ORIGINAL/msgattrib.c 2007-07-18 17:17:08.000000000 +0100
+++ src/msgattrib.c 2007-07-23 01:20:35.000000000 +0100
@@ -61,7 +61,8 @@
 REMOVE_FUZZY = 1 << 2,
+ REMOVE_COPIED = 1 << 6
 static int to_remove;

@@ -90,6 +91,7 @@
 { "help", no_argument, NULL, 'h' },
 { "ignore-file", required_argument, NULL, CHAR_MAX + 15 },
 { "indent", no_argument, NULL, 'i' },
+ { "no-copied", no_argument, NULL, CHAR_MAX + 19 },
 { "no-escape", no_argument, NULL, 'e' },
 { "no-fuzzy", no_argument, NULL, CHAR_MAX + 3 },
 { "no-location", no_argument, &line_comment, 0 },
@@ -314,6 +316,10 @@
 to_change |= REMOVE_PREV;

+ case CHAR_MAX + 19: /* --no-copied */
+   to_remove |= REMOVE_COPIED;
+   break;
@@ -436,6 +442,8 @@
 --no-obsolete remove obsolete #~ messages\n"));
 printf (_("\
 --only-obsolete keep obsolete #~ messages\n"));
+ printf (_("\
+ --no-copied remove copied messages\n"));
 printf ("\n");
 printf (_("\
 Attribute manipulation:\n"));
@@ -536,6 +544,21 @@
 : to_remove & REMOVE_NONOBSOLETE))
 return false;

+ if (to_remove & REMOVE_COPIED)
+   {
+     if (!strcmp(mp->msgid, mp->msgstr) && strlen(mp->msgstr)+1 >= mp->msgstr_len)
+       {
+         return false;
+       }
+     else if ( strlen(mp->msgstr)+1 < mp->msgstr_len )
+       {
+         if ( !strcmp(mp->msgstr + strlen(mp->msgstr)+1, mp->msgid_plural) )
+           {
+             return false;
+           }
+       }
+   }
 return true;

However, if we only change msgattrib, we would need to adapt the build system for all packages.

Apparently, it would make sense to change the default behaviour of msgfmt, the program that compiles PO files into MO files.
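The filtering idea itself is simple, and can be sketched in Python on an already parsed catalogue. This is only an illustration, not gettext code: entries are modelled as (msgid, msgstr) pairs, the function name strip_copied is made up, and plural forms (which the patch above also considers) are left out.

```python
def strip_copied(entries):
    """Drop entries whose translation is identical to the original.

    `entries` is a list of (msgid, msgstr) pairs. The header entry
    (empty msgid) and untranslated entries (empty msgstr) are kept.
    """
    kept = []
    for msgid, msgstr in entries:
        if msgid != "" and msgid == msgstr:
            continue  # copied message: the original text is used anyway
        kept.append((msgid, msgstr))
    return kept


if __name__ == "__main__":
    catalogue = [
        ("", "Content-Type: text/plain; charset=UTF-8\n"),
        ("File", "File"),
        ("Color", "Colour"),
        ("GConf", "GConf"),
    ]
    for msgid, msgstr in strip_copied(catalogue):
        print(repr(msgid), "->", repr(msgstr))
```

For an en_GB catalogue, only the entries that actually differ (such as Color/Colour) and the header would remain to be compiled into the MO file.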

An e-mail was sent to the email address for the development team of gettext regarding the issue. The development team does not appear to have a Bugzilla to record these issues. If you know of an alternative contact point, please notify me.

Update #1 (23Jul07): As an indication of the file size savings, the en_GB locale on Ubuntu in the installation CD occupies about 424KB where in practice it should have been 48KB.

A full installation of Ubuntu with some basic KDE packages (only for the basic libraries, i.e. KBabel – (ls k* | wc -l = 499)) occupies about 26MB of space just for the translation files. After optimising the MO files, the translation files occupy only 7MB. This is quite important because when someone installs, for example, the en_CA locale, all en_?? locales are added.

The reason why the reduction is not larger has to do with the message types that KDE uses. For example,

msgid ""
"_: Unknown State\n"
msgstr "Unknown"

I cannot see a portable way to code the gettext-tools so that they understand that the above message can be easily omitted. For the above reduction to 7MB, KDE applications (k*) occupy 3.6MB. The non-KDE applications include GNOME, XFCE and GNU traditional tools. The biggest culprits in KDE are kstars (386KB) and kgeography (345KB).

Update #2 (23Jul07): (Thanks Deniz for the comment below on gweather!) The po-locations translations (gnome-applets/gweather) of all languages are combined together to generate a big XML file that can be found at usr/share/gnome-applets/gweather/Locations.xml (~15MB).

This file is not kept in memory while the gweather applet is running.
However, the file is parsed when the user opens the properties dialog to change the location.
I would say that the main problem here is the file size (15.8MB) that can be easily reduced when stripping copied messages. This file is included in any Linux distribution, whatever the locale.

The po-locations directory currently occupies 107MB, and when copied messages are eliminated it occupies 78MB (a difference of 29MB). The generated XML file is in any case smaller (15.8MB without optimisation) because it does not repeat the msgid lines for each language.

I regenerated the Locations.xml file with the optimised PO files and the resulting file is 7.6MB. This is a good reduction in file space and also in packaging size.

Update #3 (25Jul07): Posted a patch for gettext-tools/msgattrib.c. Sent an e-mail to the kde-i18n-doc mailing list and got good response and a valid argument for the proposed changes. Specifically, there is a case when one gives custom values to the LANGUAGE variable. This happens when someone uses the LANGUAGE variable with a value such as “es:fr” which means show me messages in Spanish and if something is untranslated show me in French. If a message has msgid==msgstr for Spanish but not for French, then it would show in French if we go along with the proposed optimisation.
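The LANGUAGE objection is easy to reproduce with a toy model. The sketch below is a simplification of what gettext does: catalogues are plain dictionaries, the lookup function is made up, and the Spanish and French sample entries are chosen only for illustration.

```python
def lookup(msgid, language, catalogs):
    """Return the first translation found, walking the LANGUAGE list in order."""
    for lang in language.split(":"):
        translation = catalogs.get(lang, {}).get(msgid)
        if translation:
            return translation
    return msgid  # nothing found: fall back to the original message


# Spanish keeps a copied entry (msgid == msgstr) in the full catalogue.
full = {"es": {"Error": "Error"}, "fr": {"Error": "Erreur"}}
# After stripping copied messages, the Spanish entry is gone.
stripped = {"es": {}, "fr": {"Error": "Erreur"}}

if __name__ == "__main__":
    print(lookup("Error", "es:fr", full))
    print(lookup("Error", "es:fr", stripped))
```

With the full catalogues the Spanish entry wins; with the stripped ones the lookup falls through to French, which is exactly the behaviour change raised on the mailing list.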


(see http://www.guadec.org/schedule/warmup)

At the first presentation, Quim Gil talked about GNOME marketing: what has been done and what the goal of marketing is. He showed a focused mind on the important marketing tasks; it is easy to get carried away and not be effective, a mistake that happens in several projects.

The next session was by Tomas Frydrych (Open Hand – I have their sticker on my laptop!) on memory use in GNOME applications. Many people complain that XYZ is bloated. However, this does not convey what exactly happens; pretty useless. In addition, the common tools that show memory use do not show the proper picture because of memory management techniques. That is, due to shared libraries, the total memory occupied by an application appears very big. One tool examined was exmap. It uses a kernel module and shows the memory use of applications by reading from /proc. It takes a snapshot of memory use; it’s not real-time info. It comes with a GTK+ front-end (gexmap) that requires a big screen (oops, PDAs), so it is not suitable for internet tablets and other low-spec devices. Therefore, they came up with exmap-console, which addresses these shortcomings. It has a console interface based on the readline library.

Here are the rest of my notes. Hope they make sense to you.

. exmap --interactive
. ?: help
. Heap: quite useful (dynamic allocation)
. Mapped:
. Sole use: memory that app is using on its own (rss?)
. “sort vm”
. “print” or “p”
. “add nautilus”
. “clear”
. “detail file” (what executables/libs loaded and how much consume)
. “detail none”

Sole use
. valgrind, to analyse Sole Use memory?
. “detail ????”

Lots of small libraries: overhead

Looking ahead
. Pagemap: by Matt Mackall
. http://projects.o-hand.com/exmap-console/

. Sole use: ~18MB ;-(
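The split between shared memory and “sole use” memory in the notes above can be approximated without exmap. The sketch below is a different technique from the one presented: exmap relies on its own kernel module, while this simply sums the per-mapping fields of the text format found in /proc/PID/smaps on Linux. The function name summarise_smaps and the sample data are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "Rss:                  40 kB".
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarise_smaps(text):
    """Sum resident, shared and private memory (in kB) over all mappings."""
    totals = Counter()
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return {
        "rss_kb": totals["Rss"],
        "shared_kb": totals["Shared_Clean"] + totals["Shared_Dirty"],
        # Pages mapped by this process only: roughly the "sole use" figure.
        "private_kb": totals["Private_Clean"] + totals["Private_Dirty"],
    }


# Hypothetical excerpt with two mappings: a shared library and the heap.
SAMPLE_SMAPS = """\
b7e00000-b7f40000 r-xp 00000000 08:01 1234 /lib/libc.so.6
Size:               1280 kB
Rss:                  40 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
0804f000-08070000 rw-p 00000000 00:00 0 [heap]
Size:                132 kB
Rss:                 100 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       100 kB
"""

if __name__ == "__main__":
    print(summarise_smaps(SAMPLE_SMAPS))
```

On a Linux machine the same function can be fed the contents of /proc/self/smaps (or the smaps file of another process you are allowed to read) instead of the sample.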

Tomas was apparently running Ubuntu with the English UK locale. The English UK translation team is doing an amazing job at the translation stats. Actually, most messages are copied; however, with a script one can pick up words such as organization and change them to organisation. The problem here is that, for example, the GAIM mo file is 215KB (?), whereas for the British English translation the actual changes should be less than 2-3KB. Messages that are missing from a translation mean that the original US English messages will be used. I’ll have to find out how to use msgfilter to make messages untranslated if msgid == msgstr. Where is Danilo?
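A rough Python sketch of such a script follows. It is not the English UK team's actual tooling; the word table, the function names (to_british, translate) and the choice of returning an empty msgstr for unchanged messages are assumptions for illustration.

```python
import re

# Hypothetical, deliberately tiny US -> UK spelling table.
US_TO_UK = {
    "organization": "organisation",
    "color": "colour",
    "center": "centre",
}

# Whole-word, case-insensitive match on any word in the table.
PATTERN = re.compile(r"\b(" + "|".join(US_TO_UK) + r")\b", re.IGNORECASE)


def to_british(msgid):
    """Replace known US spellings, keeping an initial capital letter."""
    def swap(match):
        word = match.group(0)
        replacement = US_TO_UK[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement
    return PATTERN.sub(swap, msgid)


def translate(msgid):
    """Return the en_GB msgstr, or "" (untranslated) if nothing changed."""
    converted = to_british(msgid)
    return converted if converted != msgid else ""


if __name__ == "__main__":
    for message in ("Choose a color", "Organization name", "Open file"):
        print(repr(message), "->", repr(translate(message)))
```

Only messages that really differ get a msgstr, so the resulting catalogue contains just the British spellings and everything else falls back to the original US English text. Plurals and all-caps words are not handled in this sketch.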

After lunch time (did not go for lunch), I went to the Accerciser session. Pretty cool tool, something I have been looking for. Accerciser uses the accessibility framework of GNOME in order to inspect the windows of running applications and look into their properties. A good use is to check whether elements such as text boxes come with description labels; these are important for accessibility purposes, because a person who uses a screen reader depends on software to read out (text to speech) the contents of windows.

The next session was GNOME accessibility for blind people. Jan Buchal gave an excellent presentation.

My notes,

. is from the Czech Republic, is blind himself. has been using computers for 20+ years

. from user perspective
. users, regular and irregular 😉
. software
. firefox 3.0beta – ok for accessibility other versions no
. gaim messenger ok
. openoffice.org ok but did not try
. orca screenreader ^^^ works ok.
. generally ready for prime time
. ubuntu guy for accessibility was there
. made joke about not having/needing display slides ;-]
. synthesizer: festival, espeak, etc – can choose
. availability of voices
. javascript: not good for accessibility
. links/w3m: just fine!
. firefox3 makes accessibility now possible.
. web designer education, things like title=””, alt=”” for images.
. OOo, not installed but should work, ooo-gnome
. “Brailcom” company name
. “speech dispatcher”
. logical events
. have short sound event instead of “button”, “input form”
. another special sound for emacs prompt, etc.
. uses emacs
. have all events spoken, such as application crashing.
. problems of accessibility
. not money main factor, but still exists.
. standard developers do not use accessibility functions
. “accessor” talk, can help
. small developer group on accessibility, may not cooperate well
. non-regular users (such as blind musician)
. musicians
. project “singing computer”
. gtk, did not have good infrastructure
. used lilypond (music typesetter, good but not simple to use)
. singing mode in festival
. use emacs with special mode to write music scores (?)
. write music score and have the computer sing it (this is not “caruso”)
. gnome interface for lilypond would be interesting
. chemistry for blind
. gtk+
. considering it
. must also work, unfortunately, on windows
. gtk+ for windows, not so good for accessibility
. conclusion: free accessibility
. need users so that applications can be improved
. have festival synthesizer, not perfect but usable
. many languages, hindi, finnish, afrikaans
. Edinburgh project, to reimplement festival better
. proprietary software is a disadvantage
. q: how do you learn to use new software?
. a: has been a computer user for 20+ years, is not good candidate to say
. a: if you are dedicated, you can bypass hurdles, old lady emacs/festival/lilypond
. brrlcom, not for end-users(?)
. developer problem?
. generally there is lack of documentation; easy to teach what a developer needs to know
. so that the application is accessible
. HIG Human Interface Guidelines, accessible to the developers
. “speakup” project
. Willy, from Sun Microsystems, working on accessibility for 20+ years, Lead of Orca.
. developers: feel accessibility is a hindrance to development
. in practice the gap is not huge
. get tools (glade) and gtk+ to come with accessibility on by default
. accessibility
. is not only for people with disabilities
. can do amazing things like 3d interfaces something
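One of the notes above mentions educating web designers about things like alt=”” for images. As a small illustration of how such a check could be automated, here is a sketch using Python's standard html.parser module. It was not shown in the talk; the class and function names (MissingAltFinder, find_missing_alt) are made up, and it only reports img tags that have no alt attribute at all.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that lacks an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An empty alt="" is accepted here; only a missing one is reported.
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))


def find_missing_alt(html):
    """Return the list of image sources without an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    page = '<p><img src="a.png" alt="Logo"><img src="b.png"></p>'
    print(find_missing_alt(page))
```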

These summaries are a good example of the rule that, during a presentation, participants tend to remember only about 8% of the material. In some cases, even less is recollected.