[Koha-devel] any right to left specialists around ?

Gaetan Boisson gaetan.boisson at biblibre.com
Thu Mar 28 19:25:06 CET 2013


  Dear all,

A pretty long call for help, but please read on if you know anything
about handling right to left scripts, or if you're just interested in it.

I am working on a project in Iraqi Kurdistan on behalf of BibLibre and
have been reading quite a lot about the problems typically encountered
with text that flows from right to left, especially when text flowing in
opposite directions is mixed in the same display.

  To sum it up briefly, and as far as I understand, displaying the data
should be fine whenever there is only one run of text in an element.
Things get tricky when your run of text ends with a neutral character (a
character that has no inherent direction in which it should be read,
such as a punctuation mark) or when you have different directional runs
one after another. Imagine the Arabic title of a book called "Learn
HTML4 in 24 lessons", where "HTML4" and "24" would be written in the
Latin alphabet, for instance. (Also, reading about all this, I learnt
that Arabic is written from right to left, except for numbers, which are
written from left to right.) In those cases, the "bidi algorithm" (the
thing in charge of displaying bidirectional text properly) finds itself
in an unclear situation and can make the wrong choices. This results in
things like the following (what you see below seems to depend on what
you use; I can tell you it is not displaying well in Thunderbird 17.0.4
on Ubuntu):

تعلم HTML4 فى 24 درسا

The words are the right ones, but their order is messed up. To an
Arabic-reading person what we have here doesn't make sense: it's like
"in 24 lessons HTML4 Learn".

What happens here is that we have 3 directional runs (not 5: فى 24 درسا
is just one run from right to left, with the numbers being read from
left to right as they should, but it's still the same run. Yes, my mind
is crying too ;) ):

تعلم
HTML4
فى 24 درسا

This is the order in which they display in my Thunderbird, and they
should display in the reverse order. (I have saved a record with this
title in Koha and the display *is* messed up in an LTR interface; it's
fine if the interface is in Arabic.) It seems Thunderbird is messing
things up because it orders the runs according to its context, which is
left to right. But if you copy and paste the full line into a simpler
text editor such as Gedit, you will get the right order, unless you
start typing things in the Latin alphabet at the start of the line
(which will be on the right side). Then you will have a messed up order,
aligned on the left.
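
To make the idea of "directional runs" concrete, here is a minimal
Python sketch that groups the example title into runs. It is a
deliberate simplification and not the real bidi algorithm: it only looks
at the Unicode bidi class of each character, and it assumes that digits
and neutral characters (spaces, punctuation) simply stay in the run of
the preceding strong character. The function names are made up for the
illustration.

```python
import unicodedata

def strong_direction(char):
    """Return 'rtl' or 'ltr' for strongly typed characters, else None."""
    bidi_class = unicodedata.bidirectional(char)
    if bidi_class in ("R", "AL"):   # Hebrew-like and Arabic letters
        return "rtl"
    if bidi_class == "L":           # Latin letters and other LTR scripts
        return "ltr"
    return None                     # digits, spaces, punctuation, ...

def directional_runs(text):
    """Split text into (direction, run) pairs.

    Simplifying assumption: weak and neutral characters join the run of
    the strong character that precedes them.
    """
    runs = []
    for char in text:
        direction = strong_direction(char)
        if not runs:
            runs.append([direction, char])
        elif direction is None or direction == runs[-1][0]:
            runs[-1][1] += char
        elif runs[-1][0] is None:
            # Leading neutrals adopt the first strong direction seen.
            runs[-1][0] = direction
            runs[-1][1] += char
        else:
            runs.append([direction, char])
    return [(direction, run.strip()) for direction, run in runs]

title = "تعلم HTML4 فى 24 درسا"
runs = directional_runs(title)
for direction, run in runs:
    print(direction, repr(run))
```

Under these assumptions the title splits into three runs (rtl, ltr,
rtl), with "24" staying inside the last right to left run.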

Now, when we migrate the data for this project, I would like to take
all measures to make sure things will be understandable in all possible
contexts. That is, whether the interface is displaying in a left to right
or right to left language should not matter.

There are invisible Unicode characters that can be used to say "this
whole stretch is left to right" (or the opposite), and also some
characters which cannot be seen but which are "strongly typed" RTL or
LTR and can be inserted at the right place to clarify the context and
fix things.
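
For reference, the characters in question can be inspected with
Python's standard unicodedata module. The sketch below lists the two
invisible strongly typed marks (LRM, RLM) and the embedding controls
(LRE, RLE, closed by PDF); the dictionary and the helper name are only
there for the illustration.

```python
import unicodedata

# Invisible Unicode characters that influence the bidi algorithm.
CONTROLS = {
    "LRM": "\u200e",  # strongly typed left to right, zero width
    "RLM": "\u200f",  # strongly typed right to left, zero width
    "LRE": "\u202a",  # start of a left to right embedding
    "RLE": "\u202b",  # start of a right to left embedding
    "PDF": "\u202c",  # end of the most recent embedding
}

def describe(char):
    """Return the code point, official name and bidi class of a character."""
    return "U+%04X %s (bidi class %s)" % (
        ord(char),
        unicodedata.name(char),
        unicodedata.bidirectional(char),
    )

for abbreviation, char in CONTROLS.items():
    print(abbreviation, describe(char))
```

The marks report the strong classes L and R, which is why inserting one
next to a neutral character can settle its direction.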

What I am tempted to do is to enclose all strings in those characters
during the migration, according to the language used (information I
can find elsewhere in my data). Or just add the "clarifying" character
at the end (or beginning?) of the string. I guess it would be a good
idea then to do this in all cases, that is, not just for RTL languages in
the data, but for both. (Even though chances are English books will have
far fewer mixed scripts in their metadata than their Arabic
counterparts. This just seems like justified equal treatment.) Is that
the right way to proceed?
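
The enclosing step could look roughly like the Python sketch below. It
is only a sketch under stated assumptions, not a recommendation: the
function name and the set of language codes treated as right to left
are hypothetical, and the language code is assumed to come from
elsewhere in the record.

```python
# Unicode directional formatting characters (invisible when rendered).
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Hypothetical set of language codes handled as right to left.
RTL_LANGUAGES = {"ar", "ckb", "fa", "he", "ur"}

def wrap_with_direction(text, language_code):
    """Enclose text in an embedding that matches the record's language.

    Empty strings and strings that already start with an embedding
    character are returned unchanged, so running the migration twice
    does not nest the wrappers.
    """
    if not text or text[0] in (LRE, RLE):
        return text
    opener = RLE if language_code in RTL_LANGUAGES else LRE
    return opener + text + PDF

arabic_title = wrap_with_direction("تعلم HTML4 فى 24 درسا", "ar")
english_title = wrap_with_direction("Learn HTML4 in 24 lessons", "en")
print(repr(arabic_title[0]), repr(arabic_title[-1]))
print(repr(english_title[0]), repr(english_title[-1]))
```

Keep in mind that the added characters become part of the stored
string: its length changes, and an exact comparison against the
unwrapped value no longer matches.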

One last thing is that when cataloguing new documents, I guess the
librarians should pay attention to this. Should we then train them to
use these characters at the appropriate places?

If you have experience with this and can provide some advice as to the
solutions usually put in place, I'd be extremely grateful if you could
share it. I intend to take a good look at this and put some effort into
fixing a few things on the way. ;)

Thanks,

If you read all this just out of interest, go to the W3C pages on bidi
text; they are extremely informative and well written:
http://www.w3.org/International/tutorials/bidi-xhtml/

-- 
Gaetan Boisson
Chef de projet bibliothécaire
BibLibre
06 52 42 51 29
108 avenue Breteuil 13006 Marseille
gaetan.boisson at biblibre.com


