<?xml version=”1.0″ encoding=”utf-8″?>
<!DOCTYPE article PUBLIC “-//OASIS//DTD DocBook XML V4.2//EN”
This is an issue that I would appreciate if someone could help in solving.
The above document (mytestfile.xml) is a DocBook XML document with text in many scripts (latin, cyrillic and greek). Normally it was difficult to convert to PDF, until recently.
Now, one can run
dblatex --backend=xetex --verbose mytestfile.xml
(requires to install the dblatex package and any dependencies) and it creates mytestfile.pdf. If you have a fresh installation of Ubuntu 8.10 and you go through the process of installing these packages, please make a list of them, to use as advice for new users.
Since we use XeTeX as a backend, we can work with Unicode text directly, which is the proper thing to do. Above you can see that all characters are shown (except a few obscure ones that are not found in DejaVu Sans and are shown as boxes). You can see Latin (+Extended), Cyrillic (+Extended), Greek (+Extended) in the same document.
The issue arises when we change the lang modifier in the document above, from en to el. Here you see Τιτλε, which in fact is Title but with the characters replaced with their Greek equivalent. This is a sign for non-Unicode, 8-bit encoding conversion issue. In addition, some of the rest of the characters are shown, and apparently a strange conversion took place.
What we need to do is figure out is how to fix xetex when ‘lang=el’. There is some work to get Greek XeTeX support upstream, and there are instructions on how to add local Greek XeTeX support in your distribution.
What we need is instructions on how to fix the Greek XeTeX support in Ubuntu 8.10, and test that dblatex can generate documents correctly when lang=el.
For your testing, here are the files mytestfile-en.pdf, mytestfile-el.pdf, mytestfile-en.xml, mytestfile-el.xml.
Eγω Oasis εψαχνα βρε μαγκες. You know… http://www.discoogle.com/wiki/Oasis