Tag : proj

Μετατροπή συντεταγμένων μεταξύ WGS 84 (π.χ. GPS) και ΕΓΣΑ87 (GGRS87, GR87, ελληνικό σύστημα)

Οι συντεταγμένες που γνωρίζουμε από τους χάρτες και τα συστήματα GPS ακολουθούν το πρότυπο WGS 84.

Στην Ελλάδα σε κάποιες υπηρεσίες γίνεται χρήση του ΕΓΣΑ87 (GGRS87).

Πως κάνουμε τη μετατροπή μεταξύ WGS 84 και ΕΓΣΑ87 (GGRS87, GR87);

  1. Επιβεβαιώνουμε ότι έχουμε εγκαταστήσει το πακέτο proj (sudo apt-get install proj).
  2. Για το proj, ο κωδικός για WGS84 είναι epsg:4326 και για ΕΓΣΑ87 είναι epsg:2100, οπότε
    1. cs2cs +init=epsg:4326 +to +init=epsg:2100
    2. cs2cs +init=epsg:2100 +to +init=epsg:4326

Για παράδειγμα, για μετατροπή των συντεταγμένων WGS 84 για την πλατεία Ομονοίας σε ΕΓΣΑ87,

$ echo "23.72826 37.98414" | cs2cs +init=epsg:4326 +to +init=epsg:2100
475987.12    4203802.23 -31.28
$ _

Στα έντυπα οι παραπάνω συντεταγμένες γράφονται τυπικά ως

 004 75 987Α
 042 03 802Β

Υπάρχουν Python bindings για το πακέτο proj, με το πακέτο python-pyproj. Είναι εφικτό να γράψει κάποιος μια εφαρμογή γραφικού περιβάλλοντος σε Python που να επιτρέπει την εύκολη μετατροπή συντεταγμένων μεταξύ των δύο συστημάτων. Από αναζήτησή στο Διαδίκτυο διαπίστωσα ότι δεν υπάρχει λογισμικό γραφικού περιβάλλοντος που να μην είναι shareware ή trialware για τη μετατροπή αυτή.

Πηγή το άρθρο του Αντώνη Χριστοφίδη «Introduction to geographical co-ordinate systems» και επικοινωνία με τον Στέφανο Κοζάνη (βιβλιοθήκη icoordtrans, public domain).

Playing with Git

Git is a version control system (VCS) software that is used for source code management (SCM). There are several examples of VCS software, such as CVS and SVN. What makes Git different is that it is a distributed VCS, that is, a DVCS.

Being a DVCS, when you use Git you create fully capable local repositories that can be used for offline work. When you get the files of a repository, you actually grab the full information (this makes the initial creation of local repositories out of a remote repository slower, and the repositories are bigger).

You can install git by installing the git package. You can test it by opening a terminal window, and running

git clone git://github.com/schacon/whygitisbetter.git

The files appear in a directory called whygitisbetter. In a subdirectory called .git/,git stores all the controlling information it requires to manage the local repository. When you enter the repository directory (whygitisbetter in our case), you can issue commands that will figure out what’s going on because of the info in .git/.

With git, we create local copies of repositories by cloning. If you have used CVS or SVN, this is somewhat equivalent to the checkout command. By cloning, you create a full local repository. When you checkout with CVS or SVN, you get the latest snapshot only of the source code.

What you downloaded above is the source code for the http://www.whygitisbetterthanx.com/ website. It describes the relative advantages of git compared to other VCS and DVCS systems.

Among the different sources of documentation for git, I think one of the easiest to read is the Git Community Book. It is consise and easy to follow, and it comes with video casting (videos that show different tasks, with audio guidance).

You can create local repositories on your system. If you want to have a remote repository, you can create an account at GitHub, an attractive start-up that offers 100MB free space for your git repository. Therefore, you can host your pet project on github quite easily.

GitHub combines source code management with social networking, no matter how strange that may look like. It comes with tools that allows to maintain your own copies of repositories (for example, from other github users), and helps with the communication. For example, if I create my own copy of the whygitisbetter repository and add something nice to the book, I can send a pull request (with the click of a button) to the maintainer to grab my changes!

If you have already used another SCM tool (non-distributed), it takes some time to get used to the new way of git. It is a good skill to have, and the effort should pay off quickly. There is a SVN to Git crash course available.

If you have never used an SCM, it is cool to go for git. There is nothing to unlearn, and you will get a new skill.

Git is used for the developement of the Linux kernel, the Perl language, Ruby On Rails, and others.

What to see in OpenOffice.org 3.1 Writer;

At the User Experience mailing list at OpenOffice.org there is a thread to discuss & plan what new things should make it to OpenOffice.org 3.1. Here is the first email,

From: Christian Jansen

Date: Wed, Jun 25, 2008 at 8:35 AM

Hi,
OpenOffice.org 3.1 planning will start soon, thus I’d like to collect some ideas (from an UX-perspective) for improving OpenOffice.org Writer.

What drives you nuts?

What works pretty well?

My most wanted feature is:

Thanks for your feedback!

Regards,
Christian [Jansen]

Source: http://ux.openoffice.org/servlets/ReadMsg?list=discuss&msgNo=1890

Feedback started pouring in. Make your way to the user-experience discuss@ux.openoffice.org mailing list to add your views. When you log in to OpenOffice.org, you click to subscribe to the mailing list. That is, you need to make an account first.

ANTLR grammar for XKB, and Relax NG schema (draft)

I completed the ANTLRv3 grammar for symbols/ configuration files of XKB. The grammar can parse and create the abstract syntax tree (AST) for all keyboard layouts in xkeyboard-config.

ANTLRv3 helps you create parsers for domain specific languages (DSL), an example of which is the configuration files in XKB.

Having the ANTLRv3 grammar for a configuration file allows to generate code in any of the supported target lagnuages (C, C++, Java, Python, C#, etc), so that you easily include a parser that reads those files. Essentially you avoid using custom parsers which can be difficult to maintain, or parsers that were generated with flex/bison.

On a similar note, here is the grammar to parse Compose files (such as en_US.UTF-8/Compose.pre). I am not going to be using in the project for now, but it was fun writing it. The Python target takes 18s to create the AST for the >5500 lines of the en_US.UTF-8 compose file, on a typical modern laptop.

I am also working on creating a RelaxNG schema for the XKB configuration files (those under symbols/). There is a draft available, which needs much more work.The Relax NG book by Eric van de Vlist is very useful here.

The immediate goal is to use the code generated by ANTLR to parse the XKB files and create XML files based on the Relax NG schema. I am using Python, and there are a few options; the libxml2 bindings for Python, and PyXML. The latter has more visible documentation, but I think that I should better be using the former.

Update: lxml appears to be the nice way to use libxml2 (instead of using directly libxml2).

Looking into the symbol files

In the previous post, we talked about the ANTLR grammar that parses the XKB layout files.

The grammar is available at http://code.google.com/p/keyboardlayouteditor/source/browse. I’ll rather push to the freedesktop repository once the project is completed. Now it’s too easy for me, just doing svn commit -m something.

Below you can see the relevant layout files for each country (and in some cases, language), and how the grammar deals with them. First column is filenames from the CVS XKB symbols subdirectory (to be moved eminently to GIT). Last’s week discussion with Sergey helped me figure out issues with the symbol files, simplify what information is needed, and what can be eliminated. Second column has Not OK if something is wrong. Third column tries to explain what was wrong.

ad
af
al
altwin
am
ara
az
ba
bd
be
bg
br
braille
bt
by
ca
capslock
cd
ch
cn
compose
ctrl
cz
de
dk
ee
epo
es
et
eurosign
fi
fo
fr
gb NOK Non-UTF8
ge
gh
gn
gr
group NOK virtualMods= AltGr
hr
hu NOK Non-UTF8
ie
il NOK key.type=”FOUR_LEVEL” (typically: key.type[something]=….)
in NOK key.type=”FOUR_LEVEL” (typically: key.type[something]=….)
inet
iq
ir
is
it
jp NOK key <BKSP> {
type=””,   // empty?
symbols[Group1]= [ bracketright, braceright ]
};
keypad NOK overlay1=<KO7> }; // what’s “overlay”?
kg
kh
kpdl
kr
kz
la
latam
latin
level3 NOK virtual_modifiers LAlt, AlGr; virtualMods= Lalt
level5
lk
lt
lv
ma
mao
me
mk
mm
mn
mt
mv
nbsp NOK Non-UTF8
ng
nl
no
np
olpc
pc NOK key <AA00> { type=”SOMETHING” } instead of { type[Group1]=”SOMETHING” }
pk
pl
pt
ro
rs
ru
se
shift NOK actions [Group1] = [
si
sk
srvr_ctrl NOK key <AA00> { type=”SOMETHING” } instead of { type[Group1]=”SOMETHING” }
sy
th
tj
tr
ua

Non-UTF-8 are the files that have characters that are not UTF-8 (are iso-8859-1).

Some layouts have key.type = “something” and others key.type[SomeGroup] = “something”. Apparently, the format allows to infer which is the group that the type acts upon? That’s weird. Would it be better to put the group information? Is it required that the group is not set?

Some files have virtualMods, which I do not know what it is. Is it used?

Keyboard Layout Editor GSOC project

I got accepted for a GSOC project with the X.Org Foundation. My mentor is Sergey Udaltsov and I look forward working with him.

The project is about creating a Keyboard Layout Editor, that can be used to edit XKB files with a nice GUI.

I will be blogging about these from here (fdo category at this blog).

FOSDEM ’08, summary and comments

I attended FOSDEM ’08 which took place on the 23rd and 24th of February in Brussels.

Compared to other events, FOSDEM is a big event with over 4000 (?) participants and over 200 lectures (from lightning talks to keynotes). It occupied three buildings at a local university. Many sessions were taking place at the same time and you had to switch from one room to another. What follows is what I remember from the talks. Remember, people recollect <8% of the material they hear in a talk.

The first keynote was by Robin Rowe and Gabrielle Pantera, on using Linux in the motion picture industry. They showed a huge list of movies that were created using Linux farms. The first big item in the list was the movie Titanic (1997). The list stopped at around 2005 and the reason was that since then any significant movie that employs digital editing or 3D animation is created on Linux systems. They showed trailers from popular movies and explained how technology advanced to create realistic scenes. Part of being realistic, a generated scene may need to be blurred so that it does not look too crisp.

Next, Robert Watson gave a keynote on FreeBSD and the development community. He explained lots of things from the community that someone who is not using the distribution does not know about. FreeBSD apparently has a close-knit community, with people having specific roles. To become a developer, you go through a structured mentoring process which is great. I did not see such structured approach described in other open-source projects.

Pieter Hintjens, the former president of the FFII, talked about software patents. Software patents are bad because they describe ideas and not some concrete invention. This has been the view so that the target of the FFII effort fits on software patents. However, Pieter thinks that patents in general are bad, and it would be good to push this idea.

CMake is a build system, similar to what one gets with automake/autoconf/makefile. I have not seen this project before, and from what I saw, they look quite ambitious. Apparently it is very easy to get your compilation results on the web when you use CMake. In order to make their project more visible, they should make effort on migration of existing projects to using CMake. I did not see yet a major open-source package being developed with CMake, apart from CMake itself.

Richard Hughes talked about PackageKit, a layer that removes the complexity of packaging systems. You have GNOME and your distribution is either Debian, Ubuntu, Fedora or something else. PackageKit allows to have a common interface, and simplifies the workflow of managing the installation of packages and the updates.

In the Virtualisation tracks, two talks were really amazing. Xen and VirtualBox. Virtualisation is hot property and both companies were bought recently by Citrix and Sun Microsystems respectively. Xen is a Type 1 (native, bare metal) hypervisor while VirtualBox is a Type 2 (hosted) hypervisor. You would typically use Xen if you want to supply different services on a fast server. VirtualBox is amazingly good when you want to have a desktop running on your computer.

Ian Pratt (Xen) explained well the advantages of using a hypervisor, going into many details. For example, if you have a service that is single-threaded, then it makes sense to use Xen and install it on a dual-core system. Then, you can install some other services on the same system, increasing the utilisation of your investment.

Achim Hasenmueller gave an amazing talk. He started with a joke; I have recently been demoted. From CEO to head of virtualisation department (name?) at Sun Microsystems. He walked through the audience on the steps of his company. The first virtualisation product of his company was sold to Connectix, which then was sold to Microsoft as VirtualPC. Around 2005, he started a new company, Innotek and the product VirtualBox. The first customers were government agencies in Germany and only recently (2007) they started selling to end-users.

Virtualisation is quite complex, and it becomes more complex if your offering is cross platform. They manage the complexity by making VirtualBox modular.

VirtualBox comes in two versions; an open-source version and a binary edition. The difference is that with the binary edition you get USB support and you can use RDP to access the host. If you installed VirtualBox from the repository of your distribution, there is no USB support. He did not commit whether the USB/RDP support would make it to the open-source version, though it might happen since Sun Microsystems bought the company. I think that if enough people request it, then it might happen.

VirtualBox uses QT 3.3 as the cross platform toolkit, and there is a plan to migrate to QT 4.0. GTK+ was considered, though it was not chosen because it does not provide yet good support in Win32 (applications do not look very native on Windows). wxWidgets were considered as well, but also rejected. Apparently, moving from QT 3.3 to QT 4.0 is a lot of effort.

Zeeshan Ali demonstrated GUPnP, a library that allows applications to use the UPnP (Universal Plug n Play) protocol. This protocol is used when your computer tells your ADSL model to open a port so that an external computer can communicate directly with you (bypassing firewall/NAT). UPnP can also be used to access the content of your media station. The gupnp library comes with two interesting tools; gupnp-universal-cp and gupnp-network-light. The first is a browser of UPnP devices; it can show you what devices are available, what functionality they export, and you can control said devices. For example, you can use GUPnP to open a port on your router; when someone connects from the Internet to port 22 on your modem, he is redirected to your server, at port 22.

You can also use the same tool to figure out what port mapping took place already on your modem.

The demo with the network light is that you run the browser on one computer and the network light on another, both on the local LAN (this thing works only on the local LAN). Then, you can use the browser to switch on/off the light using the UPnP protocol.

Dimitris Glezos gave a talk on transifex, the translation management framework that is currently used in Fedora. Translating software is a tedious task, and currently translators spent time on management tasks that have little to do with translation. We see several people dropping from translations due to this. Transifex is an evolving platform to make the work of the translator easier.

Dimitris talked about a command-line version of transifex coming out soon. Apparently, you can use this tool to grab the Greek translation of package gedit, branch HEAD. Do the translation and upload back the file.

What I would like to see here is a tool that you can instruct it to grab all PO files from a collection of projects (such as GNOME 2.22, UI Translations), and then you translate with your scripts/tools/etc. Then, you can use transifex to upload all those files using your SVN account.

The workflow would be something like

$ tfx --project=gnome-2.22 --collection=gnome-desktop --action=get
Reading from http://svn.gnome.org/svn/damned-lies/trunk/releases.xml.in... done.
Getting alacarte... done.
Getting bug-buddy... done.
...
Completed in 4:11s.
$ _

Now we translate any of the files we downloaded, and we push back upstream (of course, only those files that were changed).

$ tfx --project=gnome-2.22 --collection=gnome-desktop --user=simos --action=send
 Reading local files...
Found 6 changed files.
Uploading alacarte... done.
...
Completed uploading translation files to gnome-2.22.
$ _

Berend Cornelius talked about creating OpenOffice.org Wizards. You get such wizards when you click on File/Wizards…, and you can use them to fill in entries in a template document (such as your name, address, etc in a letter), or use to install the spellchecker files. Actually, one of the most common uses is to get those spellchecker files installed.

A wizard is actually an OpenOffice.org extension; once you write it and install it (Tools/Extensions…), you can have it appear as a button on a toolbar or a menu item among other menus.

You write wizards in C++, and one would normally work on an existing wizard as base for new ones.

When people type in a word-processor, they typically abuse it (that’s my statement, not Berend’s) by omitting the use of styles and formatting. This makes documents difficult to maintain. Having a wizard teach a new user how to write a structured document would be a good idea.

Perry Ismangil talked about pjsip, the portable open-source SIP and media stack. This means that you can have Internet telephony on different devices. Considering that Internet Telephony is a commodity, this is very cool. He demonstrated pjsip running two small devices, a Nintendo DS and an iPhone. Apparently pjsip can go on your OpenWRT router as well, giving you many more exciting opportunities.

Clutter is a library to create fast animations and other effects on the GNOME desktop. It uses hardware acceleration to make up for the speed. You don’t need to learn OpenGL stuff; Clutter is there to provide the glue.

Gutsy has Clutter 0.4.0 in the repositories and the latest version is 0.6.0. To try out, you need at least the clutter tarball from the Clutter website. To start programming for your desktop, you need to try some of the bindings packages.

I had the chance to spend time with the DejaVu guys (Hi Denis, Ben!). Also met up with Alexios, Dimitris x2, Serafeim, Markos and others from the Greek mission.

Overall, FOSDEM is a cool event. In two days there is so much material and interesting talks. It’s a recommended technical event.

The Google Highly Open Participation Contest

One more initiative by Google to reach out to the community and promote free and open-source software is the The Google Highly Open Participation Contest 2007/2008.

The purpose of the competition is to enable young students older than 13 years old but have not entered yet the tertiary education, to participate in open-source development.

To get started, read the Official Contest Rules and the Frequently Asked Questions (FAQ) pages.

There are ten projects to choose to work from, one of which is GNOME, the desktop environment found in Linux distributions such as Ubuntu Linux and Fedora.

The current list of items to work on for GNOME include several documentation and translation tasks. If there is interest to work on the Greek localisation, leave a comment at this post. The direction I propose is to help with translating the documentation of GNOME applications.

Open-source software progresses by having more people contributing. This effort by Google and also previous efforts (Google Summer of Code) help tremendously towards the wider participation.

StixFonts, finally available (beta)!

The STIX Fonts project (website) has been developing for over 10 years a font suitable to be used in academic publications. It boasts support from Elsevier, IEEE and other academic publishers or associations.

A few days ago, they published a beta version of the font in an effort to get public feedback. The beta period runs until the 15th December.

STIX Fonts Beta showing Greek (Regular), from STIX Fonts Beta

STIX Fonts Beta currently support modern Greek. An effort to get support for Greek Polytonic did not work out well a few years back.

STIX Fonts Beta showing Greek (Italic), from STIX Fonts Beta

The main benefit of STIX Fonts is the support for mathematical and other technical symbols. This helps when writing academic publications and other technical documents.

STIX Fonts Beta showing Greek (Bold), from STIX Fonts Beta

STIX Fonts have extensive support of mathematical symbols, symbols that exist in Unicode Plane-1.

STIX Fonts Beta showing Greek (Bold Italic), from STIX Fonts Beta

If there is any modification that we would like to have in STIX fonts, we should do now. Once they are released, they will be widely distributed. Currently, Fedora has packaged STIX Fonts and made them available already.

One-line hardware support (USB Wireless Adapter)

I got recently a USB Wireless Adaptor, produced by Aztech. It was a good buy for several reasons:

  • It advertised Linux support
  • It was affordable
  • It had good quality casing; you can step on it and it won’t break
  • It had the Penguin on the box and was really really cheap

When I plugged it in on my Linux system, it did not work out of the box. The kernel acknowledged that a USB device was inserted (two lines in /var/log/messages) but no driver claimed the device.

With the package came a CD which had drivers for several operating systems, including Linux. Apparently one would need to install the specific driver. I think the driver was available in both source code and as a binary package (for some kernel version).

The kernel module on the CD was called zd1211, so I checked whether my kernel had such a module installed. To my surprise, there was such a kernel module, called zd1211rw. I hope you have better chance with the URL because now the website appears to be down (Error 500).

Therefore, what was wrong with my zd1211rw kernel module? Reading the documentation of project website, I figured out that you have to report the ID (called the USB ID) of your adapter  so that it is included in the kernel module, and when you plug in your device, it will be automatically detected.

You can find the USB ID by running the command lsusb. Then, it is a one-line patch for the zd1211rw driver to add support for the device,

— zd1211rw.linux2.6.20/zd_usb.c      2007-09-25 14:48:06.000000000
+0300
+++ zd1211rw/zd_usb.c    2007-09-28 11:35:51.000000000 +0300
@@ -64,6 +64,7 @@
{ USB_DEVICE(0x13b1, 0x0024), .driver_info = DEVICE_ZD1211B },
{ USB_DEVICE(0x0586, 0x340f), .driver_info = DEVICE_ZD1211B },
{ USB_DEVICE(0x0baf, 0x0121), .driver_info = DEVICE_ZD1211B },
+       { USB_DEVICE(0x0cde, 0x001a), .driver_info = DEVICE_ZD1211B },
/* “Driverless” devices that need ejecting */
{ USB_DEVICE(0x0ace, 0x2011), .driver_info = DEVICE_INSTALLER },
{ USB_DEVICE(0x0ace, 0x20ff), .driver_info = DEVICE_INSTALLER },

What Aztech should have done is to submit the USB ID to the developers of the zd1211rw driver. In this way, any Linux distribution that comes out with the updated kernel will have support for the device.

It is very important to get the manufacturers to change mentality. From offering a CD with “drivers”, for free and open-source software they should also work upstream with the device driver developers of the Linux kernel. The effort is small and the customer benefits huge.

Greek OLPC localisation status

The Greek OLPC localisation effort is ongoing and here is a report of the current status.

For discussions, reading discussion archives and commenting, please see the Greek OLPC Discussion Group.

We are localising two components, the UI (User Interface) and applications of the OLPC, and the main website at http://www.laptop.org/

The UI is currently being translated at the OLPC Wiki, at OLPC_Greece/Translation. At this page you can see the currently available packages, what is pending and which is the page that you also can help translate.

At this stage we need people with skills in music terminology to help out with the localisation of TamTam. In addition, there are more translations that need review and comments before they are sent upstream.

Moreover, if you find a typo and a better suggestion for a term in the submitted translations, feel free to tell us at the Greek OLPC Discussion Group.

The other project we are working on is the localisation of the Greek version of www.laptop.org. The pages are not 100% translated yet, so if you want to finish the difficult parts, see the Web translation page of laptop.org.

The translators that helped up to now have done an amazing job.

Important MO file optimisation for en_* locales, and partly others

During GUADEC, Tomas Frydrych gave a talk on exmap-console, a cut-down version of exmap that can work well on mobile devices.

During the presentation, Tomas showed how to use the tool to find the culprits in memory (ab)use on the GNOME desktop. One issue that came up was that the MO files taking up space though the desktop showed English. Why would the MO translation files loaded in memory be so big in size?

gtk20.mo                             : VM   61440  B, M   61440  B, S   61440  B

atk10.mo                      	     : VM    8192  B, M    8192  B, S    8192  B

libgnome-2.0.mo			: VM   28672  B, M   24576  B, S   24576  B

glib20.mo			     : VM   20480  B, M   16384  B, S   16384  B

gtk20-properties.mo           : VM     128 KB, M     116 KB, S     116 KB

launchpad-integration.mo  : VM    4096  B, M    4096  B, S    4096  B

A translation file looks like

msgid “File”

msgstr “”

When translated to Greek it is

msgid “File”

msgstr “Αρχείο”

In the English UK translation it would be

msgid “File”

msgstr “File”

This actually is not necessary because if you leave those messags untranslated, the system will use the original messages that are embedded in the executable file.

However, for the purposes of the English UK, English Canadian, etc teams, it makes sense to copy the same messages in the translated field because it would be an indication that the message was examined by the translation. Any new messages would appear as untranslated and the same process would continue.

Now, the problem is that the gettext tools are not smart enough when they compile such translation files; they replicate without need those messages occupying space in the generated MO file.

Apart from the English variants, this issue is also present in other languages when the message looks like

msgid “GConf”

msgstr “GConf”

Here, it does not make much sense to translate the message in the locale language. However, the generated MO file contains now more than 10 bytes (5+5) , plus some space for the index.

Therefore, what’s the solution for this issue?

One solution is to add to msgattrib the option to preprocess a PO file and remove those unneeded copies. Here is a patch,

— src.ORIGINAL/msgattrib.c 2007-07-18 17:17:08.000000000 +0100
+++ src/msgattrib.c 2007-07-23 01:20:35.000000000 +0100
@@ -61,7 +61,8 @@
REMOVE_FUZZY = 1 << 2,
REMOVE_NONFUZZY = 1 << 3,
REMOVE_OBSOLETE = 1 << 4,
– REMOVE_NONOBSOLETE = 1 << 5
+ REMOVE_NONOBSOLETE = 1 << 5,
+ REMOVE_COPIED = 1 << 6
};
static int to_remove;

@@ -90,6 +91,7 @@
{ “help”, no_argument, NULL, ‘h’ },
{ “ignore-file”, required_argument, NULL, CHAR_MAX + 15 },
{ “indent”, no_argument, NULL, ‘i’ },
+ { “no-copied”, no_argument, NULL, CHAR_MAX + 19 },
{ “no-escape”, no_argument, NULL, ‘e’ },
{ “no-fuzzy”, no_argument, NULL, CHAR_MAX + 3 },
{ “no-location”, no_argument, &line_comment, 0 },
@@ -314,6 +316,10 @@
to_change |= REMOVE_PREV;
break;

+ case CHAR_MAX + 19: /* –no-copied */
+ to_remove |= REMOVE_COPIED;
+ break;
+
default:
usage (EXIT_FAILURE);
/* NOTREACHED */
@@ -436,6 +442,8 @@
–no-obsolete remove obsolete #~ messages\n”));
printf (_(“\
–only-obsolete keep obsolete #~ messages\n”));
+ printf (_(“\
+ –no-copied remove copied messages\n”));
printf (“\n”);
printf (_(“\
Attribute manipulation:\n”));
@@ -536,6 +544,21 @@
: to_remove & REMOVE_NONOBSOLETE))
return false;

+ if (to_remove & REMOVE_COPIED)
+ {
+ if (!strcmp(mp->msgid, mp->msgstr) && strlen(mp->msgstr)+1 >= mp->msgstr_len)
+ {
+ return false;
+ }
+ else if ( strlen(mp->msgstr)+1 < mp->msgstr_len )
+ {
+ if ( !strcmp(mp->msgstr + strlen(mp->msgstr)+1, mp->msgid_plural) )
+ {
+ return false;
+ }
+ }
+ }
+
return true;
}
However, if we only change msgattrib, we would need to adapt the build system for all packages.

Apparently, it would make sense to change the default behaviour of msgfmt, the program that compiles PO files into MO files.

An e-mail was sent to the email address for the development team of gettext regarding the issue. The development team does not appear to have a Bugzilla to record these issues. If you know of an alternative contact point, please notify me.

Update #1 (23Jul07): As an indication of the file size savings, the en_GB locale on Ubuntu in the installation CD occupies about 424KB where in practice it should have been 48KB.

A full installation of Ubuntu with some basic KDE packages (only for the basic libraries, i.e. KBabel – (ls k* | wc -l = 499)) occupies about 26MB of space just for the translation files. When optimising in the MO files, the translation files occupy only 7MB. This is quite important because when someone installs for example the en_CA locale, all en_?? locales are added.

The reason why the reduction is more has to do with the message types that KDE uses. For example,

msgid “”
“_: Unknown State\n”
“Unknown”
msgstr “Unknown”

I cannot see a portable way to code the gettext-tools so that they understand that the above message can be easily omitted. For the above reduction to 7MB, KDE applications (k*) occupy 3.6MB. The non-KDE applications include GNOME, XFCE and GNU traditional tools. The biggest culprits in KDE are kstars (386KB) and kgeography (345KB).

Update #2 (23Jul07): (Thanks Deniz for the comment below on gweather!) The po-locations translations (gnome-applets/gweather) of all languages are combined together to generate a big XML file that can be found at usr/share/gnome-applets/gweather/Locations.xml (~15MB).

This file is not kept in memory while the gweather applet is running.
However, the file is parsed when the user opens the properties dialog to change the location.
I would say that the main problem here is the file size (15.8MB) that can be easily reduced when stripping copied messages. This file is included in any Linux distribution, whatever the locale.

The po-locations directory currently occupies 107MB and when copied messages are eliminated it occupies 78MB (a difference of 30MB). The generated XML file is in any case smaller (15.8MB without optimisation) because it does not include repeatedly the msgid lines for each language.

I regenerated the Locations.xml file with the optimised PO files and the resulting file is 7.6MB. This is a good reduction in file space and also in packaging size.

Update #3 (25Jul07): Posted a patch for gettext-tools/msgattrib.c. Sent an e-mail to the kde-i18n-doc mailing list and got good response and a valid argument for the proposed changes. Specifically, there is a case when one gives custom values to the LANGUAGE variable. This happens when someone uses the LANGUAGE variable with a value such as “es:fr” which means show me messages in Spanish and if something is untranslated show me in French. If a message has msgid==msgstr for Spanish but not for French, then it would show in French if we go along with the proposed optimisation.