[Koha-patches] [PATCH] (bug #4856) fix rebuild zebra to delete NSB/NSE chars

Wed Jun 9 09:31:54 CEST 2010

On 08/06/2010 21:03, Frederic Demians wrote:
>
>> Line 283:
>> It should remove the non-sorting blocks (as we read in the docs).
>
> I didn't catch it. Thanks. My approach works for me: adding NSB/NSE as
> word separators in the 'space' directive. Did it not work for you? Did
> you test it?
>
>> But that does not fit: when analysing the indexing process, we found
>> that it does not remove them.
>
> Could this directive be wrong? It has two closing parentheses:
>
>    map (\x88.*\x89))   @
>
In some installs we have it without the second closing parenthesis:
map (\x88.*\x89)   @

> Have you tried?
>
>    map {\x88} @
>    map {\x89} @
>
> or:
>
>    map <88> @
>    map <89> @
>
If this works, it will do the same thing as what we do in rebuild zebra.
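A minimal sketch, in Python rather than Koha's Perl, of the two behaviours discussed in this thread: deleting each NSB...NSE block together with its content (what rebuild zebra is described as doing), versus treating NSB/NSE as plain word separators (the 'space' directive approach). It assumes the record has already been decoded so that NSB/NSE appear as the code points U+0088/U+0089; the function names are hypothetical and not part of Koha or Zebra. The pattern is non-greedy so that a field with several non-sorting blocks keeps the text between them, which a greedy `.*` would swallow.

```python
import re

# Assumption: NSB/NSE are present as U+0088 / U+0089 after decoding.
NSB = "\x88"
NSE = "\x89"

# Non-greedy match of one non-sorting block (NSB, its content, NSE).
_NON_SORTING_BLOCK = re.compile(r"\x88.*?\x89")


def strip_non_sorting(text):
    """Hypothetical helper: delete every NSB...NSE block, content included."""
    return _NON_SORTING_BLOCK.sub("", text)


def nsb_nse_as_separators(text):
    """Hypothetical helper: keep the content, turn NSB/NSE into spaces."""
    return text.translate({ord(NSB): " ", ord(NSE): " "})


if __name__ == "__main__":
    title = "\x88The \x89Koha manual"
    print(strip_non_sorting(title))              # Koha manual
    print(nsb_nse_as_separators(title).split())  # ['The', 'Koha', 'manual']
```

The first helper matches the "remove the non-sorting blocks" expectation, so "The" is not indexed at all; the second keeps "The" searchable, which is what libraries wanting to keep the NSB/NSE content would get.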


>> One solution could be ICU. But using ICU also means losing some
>> truncation attributes, such as fuzzy matching or left truncation.
>> Moreover, ICU is quite picky about the way tokens are analyzed.
>
> Zebra ICU should be explored further in order to improve Koha's weak
> support for non-Latin characters.
>
>> We also want to be able to offer a solution to those who are not
>> willing to use ICU and install yet another dependency.
>
> Your solution for the ICU-averse is useful to everybody. A lot of
> UNIMARC libraries want to keep NSB/NSE characters.
>
My conclusion: Zebra should be thrown in the trash and replaced by Solr.

-- 
Nahuel ANGELINETTI