[Koha-devel] Koha 3.0, Zebra and UTF-8

Joshua M. Ferraro jmf at liblime.com
Thu Aug 23 15:57:53 CEST 2007


Hi Zeno,

See below ...

----- tajoli at cilea.it wrote:
> As far as I know, Koha 3.0 will base the OPAC on Zebra for the largest
> sites. But as I see here
> http://lists.indexdata.dk/pipermail/zebralist/2007-May/001522.html,
> Zebra has some limits on full support of UTF-8.
> 
> Viewing the problem for Koha, where are the limits ?
> Can I input data in Latin, Arabic, Chinese, etc scripts and search
> them ?
> 
> With a mix of input scripts, do you suggest using Koha 3.0 without
> Zebra?

There are plenty of examples of folks using Zebra to manage non-Latin-1
languages - for instance,

Greek + English
Russian + English
Scandinavian languages + English
Turkish + English

However, it is currently not possible to index more than two or three of
these simultaneously in the same document corpus, as there is a hard
limit of 256 indexable characters.
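
To get a rough feel for whether a given mix of languages would fit under
that limit, something like the following sketch could be used. It is only
an estimate: it counts distinct case-folded letters and digits with
Python's standard library, which is an assumption on my part and not how
Zebra builds its character maps. The function name and the sample corpus
are made up for illustration.

```python
LIMIT = 256  # the indexable-character restriction described above


def distinct_indexable_chars(texts):
    """Collect the distinct letters and digits in texts, ignoring case."""
    chars = set()
    for text in texts:
        for ch in text.casefold():
            if ch.isalnum():
                chars.add(ch)
    return chars


if __name__ == "__main__":
    # Hypothetical sample titles in English, Greek, Russian and Turkish.
    corpus = ["Koha library", "Βιβλιοθήκη", "Библиотека", "Kütüphane"]
    used = distinct_indexable_chars(corpus)
    print(len(used), "distinct characters; fits under limit:", len(used) <= LIMIT)
```

Run over a real export of your records, a count that gets close to 256
would be a hint that the corpus mixes too many scripts for the current
Zebra.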

The Index Data folks are in the process of integrating the ICU Unicode
libraries into Zebra, which will give Zebra the capability to index
the full UTF-8 character set in a single document corpus, with no
restriction on indexable characters.

The ICU UTF-8 integration work will provide character normalization and
tokenization over the full UTF-8 range of characters, but it may not
provide tokenization of languages like Japanese and Korean, which may
require deep linguistic knowledge of the language and could be a lifetime
study in itself. That said, it should at a minimum provide support for
languages that use whitespace as the word separator.
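
As an illustration of what normalization plus whitespace tokenization
means in practice, here is a minimal sketch. It uses Python's standard
unicodedata module as a stand-in and is not the ICU integration itself;
the function names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Fold compatibility forms (NFKC) and case differences."""
    return unicodedata.normalize("NFKC", text).casefold()


def tokenize(text):
    """Split normalized text on runs of whitespace."""
    return normalize(text).split()


if __name__ == "__main__":
    # Whitespace-delimited scripts split into separate words.
    print(tokenize("Ｋｏｈａ  Библиотека"))
    # Japanese has no spaces between words, so this stays one token.
    print(tokenize("図書館システム"))
```

The second call shows the limitation: without linguistic knowledge of the
language, a whitespace tokenizer cannot find word boundaries in unspaced
text.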

Note that in Koha, we can do some stemming, synonym expansion, and
article removal/stopword creation pre-index and pre-search, for the
languages that aren't directly supported in Zebra.
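
To give an idea of the kind of pre-index and pre-search processing meant
here, a toy sketch follows. This is not Koha's code: the stopword list,
synonym table and suffix-stripping stemmer are all made-up placeholders,
and a real setup would use per-language resources.

```python
# Hypothetical, tiny English-only tables for illustration.
STOPWORDS = {"the", "a", "an", "of"}
SYNONYMS = {"film": ["movie"]}
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Drop stopwords, stem each remaining word, then add its synonyms."""
    terms = []
    for word in text.lower().split():
        if word in STOPWORDS:
            continue
        base = stem(word)
        terms.append(base)
        terms.extend(SYNONYMS.get(base, []))
    return terms


if __name__ == "__main__":
    print(preprocess("The history of films"))
```

The same function would be applied to records before they are sent to
the indexer and to the user's query before it is sent to the search, so
both sides agree on the terms.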

Hope that answers your question without getting too technical ;-)

Cheers,

-- 
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf at liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS









