[Koha-patches] [PATCH] (bug #4856) fix rebuild zebra to delete NSB/NSE chars

Tue Jun 8 21:03:11 CEST 2010

> line 283
> It should remove the non-sorting blocks (that is what we read in the docs).

I missed that, thanks. My approach works for me: adding NSB/NSE as 
word separators in the 'space' directive. Didn't it work for you? Did 
you test it?
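To illustrate what the 'space' approach does to a field, here is a small Python sketch (not Zebra code). It assumes NSB and NSE are the single characters 0x88 and 0x89, as in the directives quoted below. It only mimics the effect of declaring them as separators: the marker characters disappear, but the non-sorting words themselves are still indexed.

```python
import re

# Assumption: NSB/NSE are the characters 0x88/0x89 (U+0088/U+0089 in a str).
NSB = "\x88"
NSE = "\x89"

# Whitespace plus NSB/NSE all act as word separators, which is roughly
# what listing NSB/NSE in a Zebra 'space' directive aims for.
SEPARATORS = re.compile(r"[\s\x88\x89]+")


def tokenize(field: str) -> list[str]:
    """Split a field into words, dropping the empty tokens left by leading separators."""
    return [tok for tok in SEPARATORS.split(field) if tok]


print(tokenize(NSB + "The " + NSE + "Hobbit"))  # ['The', 'Hobbit']
```

This is fine for word searching ("The" stays searchable), but it does not drop the non-sorting block, which is what a sort key needs.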

> But that does not fit, and when analysing the indexing process, it 
> does not remove them.

Could this directive be wrong? It has two closing parentheses:

    map (\x88.*\x89))   @

Have you tried:

    map {\x88} @
    map {\x89} @

or:

    map <88> @
    map <89> @
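The directives above and the `\x88.*\x89` pattern do different things, so here is a Python sketch of the two behaviours, again assuming 0x88/0x89 and assuming `@` maps to the empty string. Mapping each marker on its own to nothing keeps the non-sorting text; matching from NSB to NSE removes the whole block. Whether Zebra's `map` understands `.*` at all is not established here; the sketch only shows that, as an ordinary greedy regular expression, `\x88.*\x89` would swallow everything between the first NSB and the last NSE when a field holds several blocks, so a non-greedy `.*?` is used for the block removal.

```python
import re

# Remove only the marker characters: the non-sorting words survive.
MARKERS = re.compile(r"[\x88\x89]")
# Remove each NSB...NSE block; '.*?' stops at the first NSE.
NSB_BLOCK = re.compile(r"\x88.*?\x89", re.DOTALL)
# Greedy variant, shown only to illustrate the pitfall.
GREEDY_BLOCK = re.compile(r"\x88.*\x89", re.DOTALL)


def strip_markers(field: str) -> str:
    return MARKERS.sub("", field)


def strip_blocks(field: str) -> str:
    return NSB_BLOCK.sub("", field)


title = "\x88Le \x89Monde"
print(strip_markers(title))  # Le Monde
print(strip_blocks(title))   # Monde

two_blocks = "\x88The \x89Lord of \x88the \x89Rings"
print(strip_blocks(two_blocks))             # Lord of Rings
print(GREEDY_BLOCK.sub("", two_blocks))     # Rings
```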

> ICU could be a solution. But using ICU also means losing some 
> truncation attributes, such as fuzzy or left truncation. Moreover, ICU 
> is quite picky about the way tokens are analyzed.

Zebra's ICU support should be explored further to improve Koha's weak 
support for non-Latin characters.

> We also want to be able to propose some solution to those who are not 
> willing to use ICU and install yet another dependency.

Your solution for the ICU-averse would benefit everybody: a lot of 
UNIMARC libraries want to keep NSB/NSE characters.

-- 
Frédéric DEMIANS
http://www.tamil.fr/u/fdemians.html