[Koha-devel] Sharing experience with utf-8

Tue Aug 23 16:26:09 CEST 2005

Here is what we had to do to run Koha in utf-8, in the hope that it helps
with some of your discussions:

1-       We have been using Koha since 2.2.0 and are now at 2.2.2b.

2-       We use English for the intranet and English and Turkish for the
OPAC.

3-       The platform is Windows

4-       We changed the character set of the database to utf-8 with the
iso-xxxx data still in it. This is no problem for MySQL, as you are moving
up the ladder, and there was no need to reload the data (10 min).

5-       We changed every charset=iso-xxxx in the templates to read utf-8
and saved the files as utf-8 in a simple text editor (15 min).
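
For anyone who would rather script step 5 than use a text editor, here is a
minimal Perl sketch. It is not part of Koha. The helper name
template_to_utf8 is made up for illustration, and the legacy charset is
passed in as a parameter because it differs per installation (the
iso-8859-1 in the example is only an assumption).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of Koha: re-encode one template's bytes
# from a legacy charset to utf-8 and update its charset declaration.
sub template_to_utf8 {
    my ($bytes, $from_charset) = @_;
    my $text = decode($from_charset, $bytes);    # legacy bytes -> characters
    # Point the charset declaration at utf-8 (case-insensitive match).
    $text =~ s/charset=\Q$from_charset\E/charset=utf-8/gi;
    return encode('UTF-8', $text);               # characters -> utf-8 bytes
}

# Example input: an assumed iso-8859-1 template fragment with one accented byte.
my $old = qq{<meta http-equiv="Content-Type" }
        . qq{content="text/html; charset=iso-8859-1">\n<p>caf\xe9</p>\n};
my $new = template_to_utf8($old, 'iso-8859-1');
print $new;
```

In practice you would read each template file in binary mode, pass its
contents through the helper and write the result back, after taking a
backup.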

6-       Character decoding in biblio for MARC21 is very ambiguous for us
because it is not clear which character encoding it is converting from.
All the MARC records we bulkimport are MARC-8, iso2709, ANSEL or whatever
you want to call them. So we simply wrote a one-to-one character mapping
from MARC-8 to utf-8 for our Turkish accented characters. Here it is:

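      # Additional Turkish characters (MARC-8 byte sequences to utf-8)
      s/\xc2\xb8/ı/gm;          # B8 already re-encoded as C2 B8; must run
                                # before the bare \xB8 rule below
      s/(\xf0)s/ş/gm;           # F0 = cedilla
      s/(\xf0)S/Ş/gm;
      s/(\xf0)c/ç/gm;
      s/(\xf0)C/Ç/gm;
      s/\xe7\x49/İ/gm;          # E7 = dot above, \x49 = I
      s/(\xe6)G/Ğ/gm;           # E6 = breve
      s/(\xe6)g/ğ/gm;
      s/\xB8/ı/gm;              # B8 = lowercase Turkish (dotless) i
      s/\xB9/£/gm;              # B9 = British pound
      s/(\xe8|\xc8)o/ö/gm;      # E8 = umlaut
      s/(\xe8|\xc8)O/Ö/gm;
      s/(\xe8|\xc8)u/ü/gm;
      s/(\xe8|\xc8)U/Ü/gm;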

All the character codes come directly from LC's website about MARC21. Since
we typed the actual characters rather than their codes in the replacements,
we saved biblio.pm as utf8 to save time (half a day, including research).
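
For reference, the same mapping can be written so that it does not depend on
how the source file is saved. The sketch below is not the code in biblio.pm.
It spells each target character as a Unicode code point and encodes it to
utf-8 explicitly. The sub name marc8_turkish_to_utf8 and the table name are
made up for illustration. Bytes outside this Turkish table pass through
unchanged.

```perl
use strict;
use warnings;
use Encode qw(encode);

# MARC-8 byte pattern => Unicode character (in MARC-8 the diacritic byte
# comes before the base letter). The C2 B8 rule has to come before B8.
my @TURKISH_MAP = (
    [ qr/\xc2\xb8/     => "\x{0131}" ],  # dotless i, already-re-encoded form
    [ qr/\xf0s/        => "\x{015F}" ],  # s with cedilla
    [ qr/\xf0S/        => "\x{015E}" ],  # S with cedilla
    [ qr/\xf0c/        => "\x{00E7}" ],  # c with cedilla
    [ qr/\xf0C/        => "\x{00C7}" ],  # C with cedilla
    [ qr/\xe7I/        => "\x{0130}" ],  # I with dot above
    [ qr/\xe6G/        => "\x{011E}" ],  # G with breve
    [ qr/\xe6g/        => "\x{011F}" ],  # g with breve
    [ qr/\xb8/         => "\x{0131}" ],  # dotless i
    [ qr/\xb9/         => "\x{00A3}" ],  # pound sign
    [ qr/[\xe8\xc8]o/  => "\x{00F6}" ],  # o with umlaut
    [ qr/[\xe8\xc8]O/  => "\x{00D6}" ],  # O with umlaut
    [ qr/[\xe8\xc8]u/  => "\x{00FC}" ],  # u with umlaut
    [ qr/[\xe8\xc8]U/  => "\x{00DC}" ],  # U with umlaut
);

# Hypothetical helper: takes a MARC-8 byte string, returns utf-8 bytes
# for the sequences in the table above.
sub marc8_turkish_to_utf8 {
    my ($bytes) = @_;
    for my $rule (@TURKISH_MAP) {
        my ($pattern, $char) = @$rule;
        my $utf8 = encode('UTF-8', $char);   # character -> utf-8 bytes
        $bytes =~ s/$pattern/$utf8/g;
    }
    return $bytes;
}

my $word = marc8_turkish_to_utf8("T\xe8urk\xf0ce");   # MARC-8 for "Türkçe"
my $city = marc8_turkish_to_utf8("\xe7Istanbul");     # MARC-8 for "İstanbul"
print "$word\n$city\n";
```

Keeping the rules in an ordered list makes the ordering requirement between
the C2 B8 rule and the bare B8 rule explicit.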

7-       We have a fully working Koha in utf8 supporting all characters, and
we repeat the same steps every time we get an update.

8-       Translating the OPAC files through .po files does not work for us.
As we see it, the po translator is simply a search-and-replace text engine.
It converts the string 'English English <somevariable> English.' to
'Turkish Turkish <somevariable> Turkish', which is useless because it should
be 'Turkish <somevariable> Turkish Turkish'.
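
To make the word-order problem concrete, here is a small Perl sketch. It
only models the behaviour described above and is not the actual Koha
translation tool, whose internals we have not studied. The sub names and
both lookup tables are made up for illustration.

```perl
use strict;
use warnings;

# Fragment-wise replacement: each run of text is swapped independently,
# so the variable stays where the English sentence put it.
my %fragment_po = (
    'English English ' => 'Turkish Turkish ',
    ' English.'        => ' Turkish',
);

sub translate_fragments {
    my ($before, $var, $after) = @_;
    return $fragment_po{$before} . $var . $fragment_po{$after};
}

# Whole-sentence replacement: the variable is a %s placeholder inside the
# message, so the translation is free to move it.
my %sentence_po = (
    'English English %s English.' => 'Turkish %s Turkish Turkish',
);

sub translate_sentence {
    my ($msgid, $var) = @_;
    return sprintf($sentence_po{$msgid}, $var);
}

my $by_fragment = translate_fragments('English English ', '<somevariable>',
                                      ' English.');
my $by_sentence = translate_sentence('English English %s English.',
                                     '<somevariable>');
print "$by_fragment\n";   # variable stuck in the English position
print "$by_sentence\n";   # variable moved to where Turkish needs it
```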

9-       So we sat down and translated the OPAC templates into proper
Turkish. It is now easier for our people to follow the changes in CVS and
apply them to the templates than to redo a complete translation every time.

10-   So far, the whole update takes one person less than half a day.

11-   We as Windows people do not have much experience with this po editor.
But as far as I know it supports utf8, so what is the hassle about these
translations? As far as we understand it, the official language of Koha is
English, and if someone is translating it into another language it is their
responsibility to find the resources to translate it in time for it to be
included as an additional language, even if this requires a complete
rewrite of some templates.

12-   Finally, we believe that Koha should start using utf8 ASAP, before the
move to Zebra, to gain experience. If Zebra is implemented with all this iso
stuff, we will have more problems, with each translation requiring its own
character set, sort order, character mapping, etc.

Koha becomes more powerful with more features, stability and performance,
and I believe people will be happier to see improvements in these, even if
they have to spend a little more of their own resources on translations.

With no prejudice,

Tumer Garip

Near East Univ. Library Director

Cyprus

tgarip at neu.edu.tr
