[Koha-devel] Converting Koha sources to UTF-8

Thu Mar 25 03:36:14 CET 2010

I would like to convert the Koha source tree to UTF-8. Having everything
in a systematic, modern encoding would be beneficial for everyone, I
think.

I wrote a script to find files that are not in UTF-8 (attached for
review); it uses the isutf8 tool from Joey Hess's moreutils package (see
http://kitenet.net/~joey/code/moreutils/).

The script skips files with certain suffixes, so that binary files and
the like are not reported.
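Here is a minimal Python sketch of the same idea, for illustration only. It assumes that "not UTF-8" means "the file's bytes fail to decode as UTF-8", which is roughly the check isutf8 performs. The helper names `is_utf8` and `find_non_utf8` and the contents of `EXCLUDED_SUFFIXES` are hypothetical. The real find-nonutf8 script's exclusion list is not reproduced in this message.

```python
import os

# Hypothetical exclusion list: illustrative guesses, not the list used by
# the attached find-nonutf8 script.
EXCLUDED_SUFFIXES = {".png", ".gif", ".jpg", ".ico", ".pdf", ".gz", ".mrc"}


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(root):
    """Yield paths under root that are not valid UTF-8, skipping excluded suffixes."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            # Skip files whose suffix marks them as binary or otherwise uninteresting.
            if os.path.splitext(name)[1].lower() in EXCLUDED_SUFFIXES:
                continue
            path = os.path.join(dirpath, name)
            if not is_utf8(path):
                yield path
```

Running `for p in find_non_utf8("."): print(p)` from the top of a source tree lists the candidate files. Pure-ASCII files pass the check because ASCII is a subset of UTF-8.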

It currently reports 59 files for me. Most contain only copyright symbols
or non-ASCII names in release notes, and all of those seem to be in the ISO-8859-1
(Latin-1) character set, so converting them is easy. I did the actual
conversion with the "iconv -f ISO-8859-1 -t UTF-8" command.
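The next block is a Python sketch of the same conversion, for illustration only. It decodes the bytes as ISO-8859-1 and writes them back as UTF-8, which is the effect of the iconv command above. The function name `latin1_to_utf8` is hypothetical. The sketch assumes the file really is Latin-1: every byte sequence is valid ISO-8859-1, so decoding never fails, and a wrong guess silently produces garbled text. As a safeguard, it leaves alone any file that already decodes as UTF-8, so that running it twice does not double-encode anything.

```python
def latin1_to_utf8(path):
    """Re-encode an ISO-8859-1 file as UTF-8 in place.

    Returns True if the file was converted, or False if it was already
    valid UTF-8 and was left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        # Already valid UTF-8 (or plain ASCII): converting again would double-encode.
        return False
    except UnicodeDecodeError:
        pass
    # Every byte 0x00-0xFF maps to a Latin-1 character, so this cannot fail.
    text = data.decode("iso-8859-1")
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True
```

Unlike iconv, which writes to standard output, this sketch rewrites the file in place. Keep the tree under version control so the result can be reviewed with a diff.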

The following three files puzzle me, however:

C4/tests/testrecords/marc21_marc8_combining_chars.dat
etc/zebradb/etc/urx.chr
etc/zebradb/lang_defs/en/sort-string-utf.chr

Is it acceptable to convert them to UTF-8, or should they remain as they
are? I don't know how they are used.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: find-nonutf8
Type: application/x-shellscript
Size: 1168 bytes
Desc: not available
URL: </pipermail/koha-devel/attachments/20100325/31396a12/attachment-0003.bin>