[Koha-zebra] [Zebralist] mapping combined diacritics in *.chr files

Adam Dickmeiss adam at indexdata.dk
Thu Jan 15 16:29:18 CET 2009


Henri-Damien LAURENT wrote:
> Hi,
> é has 3 different equivalent forms in UTF-8 :
> - \xe9
> - \x0301+e
> - e+\x0301
I thought there'd be only two: the combining mark 0x0301 has to follow the
base character (e), so \x0301+e is not an equivalent of é. Your notation is
UNICODE code points, not really UTF-8 bytes. But let's not be picky :-)

> I would like to combine them all in a word chr file
> It seems that
> map (\xe9) e 
That's fine. But
> map (\x03\x01e) e
is incorrect. The UNICODE character 0x0301 is represented as these two 
bytes in UTF-8:  0xCC 0x81.
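
For anyone who wants to check those bytes, here is a minimal Python sketch. It is not part of the original exchange and uses only the standard library; the variable names are just illustrative.

```python
# Compare the code points with their UTF-8 byte sequences.
combining_acute = "\u0301"          # COMBINING ACUTE ACCENT
precomposed = "\u00e9"              # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e" + combining_acute  # base character first, then the mark

utf8_combining = combining_acute.encode("utf-8")
utf8_precomposed = precomposed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")

print(utf8_combining.hex())    # cc81
print(utf8_precomposed.hex())  # c3a9
print(utf8_decomposed.hex())   # 65cc81
```

So the byte pair 0x03 0x01 never shows up in the UTF-8 form of either spelling of é.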

The notation in .chr files, however, is not UTF-8. It's UNICODE code
points. And to specify anything outside block 0, \Lxxxx is to be used. Thus
you'll have to use:

map (e\L0301)  e
> does not have the expected effect.
> Is there anything to be done about that?
Be sure that your records are really UNICODE inside Zebra.
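
One way to check that outside Zebra is sketched below in Python. This is only an illustration under assumptions: the helper names (is_valid_utf8, find_combining_marks, is_nfc) are hypothetical, and the sample strings stand in for text pulled from a real record. It tells you whether the bytes decode as UTF-8 and whether the text uses the precomposed or the decomposed spelling.

```python
import unicodedata


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_combining_marks(text):
    """Return (index, code point) pairs for every combining character."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if unicodedata.combining(ch)
    ]


def is_nfc(text):
    """Return True if the text is already in composed (NFC) form."""
    return unicodedata.normalize("NFC", text) == text


# Hypothetical sample data standing in for record content.
sample_decomposed = "cafe\u0301"   # e followed by U+0301
sample_precomposed = "caf\u00e9"   # single code point U+00E9
sample_latin1_bytes = b"caf\xe9"   # Latin-1 bytes, not UTF-8

print(find_combining_marks(sample_decomposed))
print(is_nfc(sample_decomposed), is_nfc(sample_precomposed))
print(is_valid_utf8(sample_latin1_bytes))
```

If the combining-mark list is non-empty, the record uses the decomposed form, which is the case the e\L0301 rule is meant to catch.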

/ Adam
>
> --
> Henri-Damien LAURENT
> BibLibre SARL
> http://www.biblibre.com
> Expert en Logiciels Libres pour l'info-doc
> tel : +33 4 67 65 75 50