[Koha-devel] any right to left specialists around ?
Gaetan Boisson
gaetan.boisson at biblibre.com
Thu Mar 28 19:25:06 CET 2013
Dear all,
This is a pretty long call for help, but please read on if you know anything
about handling right-to-left scripts, or if you are simply interested in them.
I am working on a project in Iraqi Kurdistan on behalf of BibLibre, and I
have been reading quite a lot about the problems typically encountered with
text that flows from right to left, especially when text flowing in
opposite directions is mixed in the same display.
To sum it up briefly, and as far as I understand it, displaying the data
should be fine whenever there is only one run of text in an element.
Things get tricky when a run of text ends with a neutral character (a
character that has no inherent reading direction, such as a punctuation
mark) or when different directional runs follow one another. Imagine the
Arabic title of a book called "Learn HTML4 in 24 lessons", where "HTML4"
and "24" are written with Latin characters. (Reading about all this, I
also learnt that Arabic is written from right to left, except for
numbers, which are written from left to right.) In those cases, the
"bidi algorithm" (the part of the software in charge of displaying
bidirectional text properly) finds itself in an ambiguous situation and
can make the wrong choices. This results in things like the following
(what you see here depends on the software you use; it does not display
well in Thunderbird 17.0.4 on Ubuntu):
تعلم HTML4 فى 24 درسا
The words are the right ones, but their order is scrambled. To an Arabic
reader, what we have here makes no sense; it reads like "in 24 lessons
HTML4 Learn".
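As a small illustration (a sketch, not anything from Koha), Python's standard `unicodedata` module reports the bidirectional class that the Unicode Bidirectional Algorithm gives each character: Arabic letters are AL, Latin letters L (both strong), European digits EN (weak), spaces WS and a question mark ON (neutral). The title is spelled with `\u` escapes, an assumption of this example, so that the code itself is not reordered on screen.

```python
import unicodedata

# "Learn HTML4 in 24 lessons" in Arabic, spelled with escapes so the
# source code itself is not affected by bidi reordering.
TITLE = "\u062a\u0639\u0644\u0645 HTML4 \u0641\u0649 24 \u062f\u0631\u0633\u0627"


def bidi_classes(text):
    """Return (character, bidi class) pairs for every character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    for ch, cls in bidi_classes(TITLE + "?"):
        # AL = Arabic letter, L = left-to-right letter, EN = European number,
        # WS = whitespace, ON = other neutral (here the "?")
        print(f"U+{ord(ch):04X} {cls}")
```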
What happens here is that we have 3 directional runs (not 5: فى 24 درسا
is a single right-to-left run; the numbers in it are read from left to
right, as they should be, but they still belong to the same run. Yes, my
mind is crying too ;) ):
تعلم
HTML4
فى 24 درسا
This is the order in which they are displayed in my Thunderbird, and they
should be displayed in the reverse order. (I have saved a record with this
title in Koha, and the display *is* messed up in an LTR interface; it is
fine if the interface is in Arabic.) It seems Thunderbird gets this wrong
because it orders the runs according to its own context, which is left to
right. But if you copy and paste the full line into a simpler text editor
such as Gedit, you will get the right order, unless you start typing
Latin characters at the start of the line (which is on the right side).
Then the order is messed up again, and the line is aligned on the left.
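The behaviour described above can be reproduced with a deliberately simplified model of the bidi algorithm. This is a sketch, not the full UAX #9 rules: explicit embeddings, the AN typing of digits after Arabic letters, and mirroring are all ignored, and every function name is made up for the example. The paragraph direction is taken from the first strong character, roughly what rules P2-P3 describe and what editors like Gedit appear to do. Digits join the preceding strong run, and neutrals join their neighbours when both sides agree, otherwise falling back to the paragraph direction. In a right-to-left paragraph the runs are laid out right to left, so their on-screen order is reversed.

```python
import unicodedata

# "Learn HTML4 in 24 lessons" in Arabic, spelled with escapes.
TITLE = "\u062a\u0639\u0644\u0645 HTML4 \u0641\u0649 24 \u062f\u0631\u0633\u0627"


def strong(ch):
    """Return 'L' or 'R' for strongly typed characters, None otherwise."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None


def paragraph_direction(text, default="L"):
    """Direction of the first strong character (roughly UAX #9, P2-P3)."""
    for ch in text:
        direction = strong(ch)
        if direction:
            return direction
    return default


def directional_runs(text, base):
    """Split text into (direction, substring) runs, simplified model."""
    types = [strong(ch) for ch in text]
    # Digits take the direction of the preceding strong character.
    last = base
    for i, ch in enumerate(text):
        if types[i]:
            last = types[i]
        elif unicodedata.bidirectional(ch) == "EN":
            types[i] = last
    # Neutrals follow their neighbours if both agree, else the base direction.
    resolved = []
    for i, t in enumerate(types):
        if t is None:
            before = next((x for x in reversed(types[:i]) if x), base)
            after = next((x for x in types[i + 1:] if x), base)
            t = before if before == after else base
        resolved.append(t)
    # Group consecutive characters sharing a direction.
    runs = []
    for ch, direction in zip(text, resolved):
        if runs and runs[-1][0] == direction:
            runs[-1] = (direction, runs[-1][1] + ch)
        else:
            runs.append((direction, ch))
    return runs


def left_to_right_run_order(runs, base):
    """Order in which runs appear on screen, from left to right.

    In an RTL paragraph the runs are laid out right to left, so the
    sequence is reversed (letters inside each R run are also displayed
    right to left, which this sketch does not model)."""
    return list(reversed(runs)) if base == "R" else list(runs)


if __name__ == "__main__":
    for base in ("L", "R"):
        runs = directional_runs(TITLE, base)
        print(base, [s.strip() for _, s in left_to_right_run_order(runs, base)])
    print(paragraph_direction(TITLE), paragraph_direction("Koha " + TITLE))
```

With base "L" (an LTR interface, or Thunderbird) the sketch puts the Arabic word for "Learn" on the far left, which is the scrambled order shown above; with base "R" the three runs come out in the reverse order, and prefixing Latin text flips the detected paragraph direction, much like typing at the start of the line in Gedit.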
Now, when we migrate the data for this project, I would like to take every
possible measure to make sure things remain readable in all contexts:
whether the interface is displayed in a left-to-right or a right-to-left
language should not matter.
There are invisible Unicode characters that can be used to say "this
whole stretch is left to right" (or the opposite), and there are also
invisible characters that are "strongly typed" RTL or LTR, which can be
inserted at the right place to clarify the context and fix things.
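For reference, these are the characters in question as defined by the Unicode Bidirectional Algorithm (UAX #9), listed here with Python's `unicodedata` module. The embeddings (LRE or RLE, closed by PDF) say "this stretch is LTR/RTL"; LRM and RLM are the invisible strongly typed marks. The directional isolates (LRI, RLI, FSI, closed by PDI) were only added in Unicode 6.3 and may not be supported by older software, so treat them as optional.

```python
import unicodedata

# Explicit embeddings: "this whole stretch is LTR (or RTL)", closed by PDF.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
# Directional isolates (Unicode 6.3 and later), closed by PDI.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
# Invisible but strongly typed marks, used only to clarify the context.
LRM, RLM = "\u200e", "\u200f"

CONTROLS = (LRE, RLE, PDF, LRI, RLI, FSI, PDI, LRM, RLM)

if __name__ == "__main__":
    for ch in CONTROLS:
        # code point, bidi class and official character name
        print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):4}  {unicodedata.name(ch)}")
```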
What I am tempted to do is to enclose every string in those characters
during the migration, according to the language used (information I can
find elsewhere in my data), or just to add the "clarifying" character at
the end (or the beginning?) of the string. I guess it would then be a good
idea to do this in all cases, that is, for both RTL and LTR languages in
the data, not just for RTL ones. (Even though English books will probably
have far fewer mixed scripts in their metadata than their Arabic
counterparts, equal treatment seems justified.) Is that the right way to
proceed?
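The two options above could look roughly like this in a migration script. This is a hypothetical sketch: the function names, the set of right-to-left languages, and the assumption that the data uses MARC-style three-letter codes ("ara", "eng", ...) are all illustrative. Kurdish ("kur") can be written in Arabic or Latin script, so the language code alone may not settle the direction.

```python
# Hypothetical list of right-to-left languages, as MARC-style codes.
# Kurdish ("kur") is assumed to be in Arabic script here, which is usually
# the case in Iraqi Kurdistan but not for Kurdish written in Latin script.
RTL_LANGUAGES = {"ara", "per", "heb", "urd", "kur"}

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # explicit embeddings
LRM, RLM = "\u200e", "\u200f"  # invisible strongly typed marks


def is_rtl(lang_code):
    """True if the (assumed MARC-style) language code is in RTL_LANGUAGES."""
    return lang_code.strip().lower() in RTL_LANGUAGES


def embed(text, lang_code):
    """Option 1: enclose the whole string in an LTR or RTL embedding."""
    if text.startswith((LRE, RLE)) and text.endswith(PDF):
        return text  # already wrapped: keeps the migration idempotent
    opener = RLE if is_rtl(lang_code) else LRE
    return opener + text + PDF


def add_mark(text, lang_code, at_start=False):
    """Option 2: add a single clarifying mark at the end (or start)."""
    mark = RLM if is_rtl(lang_code) else LRM
    return mark + text if at_start else text + mark
```

As for end versus beginning: a mark at the start also decides the paragraph direction wherever software guesses it from the first strong character, while a mark at the end keeps trailing punctuation attached to the preceding right-to-left text.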
One last thing: when cataloguing new documents, I guess the librarians
should pay attention to this too. Should we then train them to use these
characters at the appropriate places?
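Alongside training, cataloguers could be helped by flagging the fields most likely to need attention. Here is a minimal sketch with a made-up heuristic: flag strings that mix strong LTR and RTL characters, and RTL strings that end in punctuation-like characters (bidi classes ON, CS, ES, ET). A string ending with an RLM is not flagged, since the mark resolves the trailing punctuation; the check does not try to judge whether marks elsewhere in a mixed string are sufficient.

```python
import unicodedata

# Punctuation-like classes with no strong direction of their own:
# ON = other neutral ("?", "!"), CS = common separator (",", ".", ":"),
# ES = European separator ("+", "-"), ET = European terminator ("%", "$").
PUNCTUATION_CLASSES = {"ON", "CS", "ES", "ET"}


def strong_directions(text):
    """Return the set of strong directions ('L', 'R') found in text."""
    found = set()
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            found.add("L")
        elif cls in ("R", "AL"):
            found.add("R")
    return found


def needs_review(text):
    """Hypothetical heuristic: True if a cataloguer should check this string."""
    directions = strong_directions(text)
    if directions == {"L", "R"}:
        return True  # LTR and RTL runs mixed in one field
    stripped = text.rstrip()
    if "R" in directions and stripped:
        # RTL text ending in punctuation that may end up on the wrong side
        return unicodedata.bidirectional(stripped[-1]) in PUNCTUATION_CLASSES
    return False
```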
If you have experience with this and can give some advice on the
solutions usually put in place, I would be extremely grateful if you could
share it. I intend to take a good look at this and put some effort into
fixing a few things along the way. ;)
Thanks,
If you read all this just out of interest, go to the W3C pages on bidi
text; they are extremely informative and well written:
http://www.w3.org/International/tutorials/bidi-xhtml/
--
Gaetan Boisson
Library project manager
BibLibre
06 52 42 51 29
108 avenue Breteuil 13006 Marseille
gaetan.boisson at biblibre.com