[Koha-devel] proper sorting order and search for Swedish characters and ISBN search

Tajoli Zeno z.tajoli at cineca.it
Fri Nov 4 17:46:57 CET 2016


Hi Gaetan and all,

On 04/11/2016 17:20, Gaetan Boisson wrote:
> The issue is: in Swedish, ä, ö and å are separate letters, not variants
> of a and o. This means that searching for å shouldn't bring up a. When
> sorting, they belong at the very end of the alphabet, after z, not
> alongside a and o.
>
> ICU has some kind of setting that standardizes ISBNs for searching, so
> you can search for any hyphenated variant and still find the ISBN that
> actually appears in your data. I *think* this works by removing all the
> hyphens in the index, but I am not quite sure.
>
> Anyway, in order to get the Swedish letters right, we had to do some
> specific CHR configuration. Doing this forces us not to use ICU, which
> means giving up on standardized ISBN search.
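The hyphen-removal idea Gaetan describes can be illustrated with a small sketch. This is not ICU's or Zebra's actual code: `normalize_isbn` is a hypothetical helper, and the assumption is simply that both index keys and queries are normalized the same way (hyphens and spaces dropped, a trailing x check digit uppercased).

```python
import re

def normalize_isbn(raw: str) -> str:
    # Hypothetical helper: drop hyphens and whitespace, uppercase an "x" check digit.
    return re.sub(r"[\s-]", "", raw).upper()

# The index stores the normalized form of the ISBN found in the record...
index = {normalize_isbn("978-0-306-40615-7"): "record 1"}

# ...and each query is normalized the same way before lookup, so every
# hyphenation of the same ISBN reaches the same key.
for query in ["9780306406157", "978-0-306-40615-7", "978 0 306 40615 7"]:
    print(query, "->", index.get(normalize_isbn(query)))  # each prints "record 1"
```

The design point is that normalization must be applied symmetrically: normalizing only the index (or only the query) would make the hyphenated and unhyphenated forms miss each other.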

I think you are speaking about the general index.

zebradb/lang_defs/en/sort-string-utf.chr
lowercase {0-9}{a-y}üzæäøöå
uppercase {0-9}{A-Y}ÜZÆÄØÖÅ
[..]
# equivalent æä(ae)
# equivalent øö(oe)
# equivalent å(aa)
# equivalent uü
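To make the effect of the `lowercase` line concrete, here is a rough Python model of the sort order it declares (digits, a to y, ü, z, then æ ä ø ö å). It is only an approximation, not how Zebra reads .chr files: it ignores the `equivalent` lines (commented out in the excerpt) and assumes characters not listed simply sort after the alphabet.

```python
import string

# Order copied from the "lowercase" line of sort-string-utf.chr:
# {0-9}{a-y}üzæäøöå  (ascii_lowercase[:25] is a..y)
ALPHABET = string.digits + string.ascii_lowercase[:25] + "üzæäøöå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def chr_sort_key(word: str) -> list:
    # Lowercase first, mirroring the parallel "uppercase" line; characters
    # outside ALPHABET sort after it (assumption, not modelled on Zebra).
    return [RANK.get(ch, len(ALPHABET)) for ch in word.lower()]

words = ["öl", "zebra", "apa", "åsna", "ägg", "ödla"]
print(sorted(words, key=chr_sort_key))
# ['apa', 'zebra', 'ägg', 'ödla', 'öl', 'åsna']
```

With this order ä, ö and å all land after z. Note that å ends up last, after ä and ö, which is not the usual Swedish sequence (å, ä, ö).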

zebradb/etc/words-icu.xml
<transliterate rule="{ œ > oe "/>
<transliterate rule="{ Œ > oe "/>
<transliterate rule="{ æ > ae "/>
<transliterate rule="{ Æ > ae "/>

zebradb/etc/phrases-icu.xml
<transliterate rule="{ œ > oe "/>
<transliterate rule="{ Œ > oe "/>
<transliterate rule="{ æ > ae "/>
<transliterate rule="{ Æ > ae "/>
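As a rough illustration of what those four `transliterate` rules do on their own, the sketch below approximates them with Python's `str.translate` rather than ICU. It assumes no other rules in the ICU chain, so it shows only these substitutions: the ligatures œ/Œ and æ/Æ are expanded, while characters without a rule, such as ä, ö and å, pass through unchanged.

```python
# Plain-Python approximation of the four <transliterate> rules
# (not ICU itself; other rules in the real chain are not modelled).
RULES = {"œ": "oe", "Œ": "oe", "æ": "ae", "Æ": "ae"}
TABLE = str.maketrans(RULES)

def apply_rules(text: str) -> str:
    # Replace each ligature with its two-letter spelling.
    return text.translate(TABLE)

print(apply_rules("Œuvre"))        # oeuvre
print(apply_rules("Ærø"))          # aerø
print(apply_rules("smörgåsbord"))  # smörgåsbord (no rule for ö or å)
```

If a rule such as `å > a` were present, å and a would fold to the same index term, which is exactly the behaviour Gaetan wants to avoid for Swedish.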

Is it not enough to change only these configurations to fix your problem?
Which configuration forces you to use a CHR configuration instead of ICU?

Bye
Zeno Tajoli

-- 
Zeno Tajoli
/CINECA PRODUCT DEVELOPMENT/ - Library Automation
Email: z.tajoli at cineca.it Fax: 051/6132198
*CINECA* Consorzio Interuniversitario - Segrate (MI) operating site

