[Koha-zebra] [Zebralist] mapping combined diacritics in *.chr files
Adam Dickmeiss
adam at indexdata.dk
Thu Jan 15 16:29:18 CET 2009
Henri-Damien LAURENT wrote:
> Hi,
> é has 3 different equivalent forms in UTF-8 :
> - \xe9
> - \x0301+e
> - e+\x0301
I thought there'd be only two: the combining mark \x0301 has to follow the
base (e). Your notation is UNICODE code points, not really UTF-8 bytes.
But let's not be picky :-)
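
As a quick check of the two equivalent forms, here is a minimal Python sketch using only the standard unicodedata module. The helper name code_points is made up for illustration; nothing here is Zebra-specific.

```python
import unicodedata

composed = "\u00e9"      # precomposed é (single code point)
decomposed = "e\u0301"   # base letter e followed by combining acute accent

def code_points(s):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]

# NFC composes base + combining mark; NFD splits the precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

print(code_points(nfc))
print(code_points(nfd))

# A non-zero combining class marks U+0301 as a combining character,
# which is why it attaches to the base character that precedes it.
print(unicodedata.combining("\u0301"))
```

Normalizing in either direction only ever yields these two spellings, with the accent after the base letter in the decomposed one.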
> I would like to combine them all in a word chr file
> It seems that
> map (\xe9) e
That's fine. But
> map (\x03\x01e) e
is incorrect. The UNICODE character 0x0301 is represented as these two
bytes in UTF-8: 0xCC 0x81.
The notation in .chr files, however, is not UTF-8. It's UNICODE code
points. And to specify anything outside block 0, \Lxxxx is to be used. Thus
you'll have to use:
map (e\L0301) e
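
To illustrate the difference between the code point and its UTF-8 bytes, here is a small Python sketch. It assumes, for the sake of the analogy only, that a \x escape takes exactly two hex digits, as it does in Python; how the .chr parser reads escapes is described above, not here.

```python
combining_acute = "\u0301"   # the code point U+0301

# UTF-8 encodes U+0301 as two bytes.
utf8_bytes = combining_acute.encode("utf-8")
print(" ".join(f"{b:02X}" for b in utf8_bytes))

# With two-hex-digit escapes, \x03\x01e is three separate characters:
# U+0003, U+0001 and the letter e. It is not "U+0301 followed by e".
wrong = "\x03\x01e"
wrong_points = [f"U+{ord(c):04X}" for c in wrong]
print(wrong_points)
```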
> has not the expected effect.
> Is there something to do about that ?
Be sure that your records are really UNICODE inside Zebra.
/ Adam
>
> --
> Henri-Damien LAURENT
> BibLibre SARL
> http://www.biblibre.com
> Expert en Logiciels Libres pour l'info-doc
> tel : +33 4 67 65 75 50