[Koha-zebra] Re: Unimarc, marc21, Unicode, and MARC::File::XML

Tümer Garip tgarip at neu.edu.tr
Tue Mar 21 22:38:20 CET 2006


I thought I had explained it, but here it is again:

I do not think which method you use is relevant here, but just try
this:

In the release version of Zebra, in the test/usmarc folder, change zebra.cfg
to read
recordType: grs.xml
In the tabs folder change marc21.abs to read record.abs
Use zebraidx to create the database with the single XML record I sent to
you.
Start zebrasrv at the required port.
Use yaz-client:
f @attr 1=1016 book
format xml
show

I see the xml record header saying
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

Further down you'll see the UTF-8 characters with the correct hex values, as
\XC5\X9F
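
For reference, a small Python sketch (not part of the original test; the variable names are only illustrative) of why those two bytes matter: read as UTF-8 they form a single character, but read as ISO-8859-1, as the header claims, they become two unrelated ones.

```python
# The two bytes that yaz-client displays as \XC5\X9F.
raw = bytes([0xC5, 0x9F])

# Interpreted as UTF-8 they form one character: U+015F (s with cedilla).
as_utf8 = raw.decode("utf-8")

# Interpreted as ISO-8859-1 (what the header claims) they become two
# characters: U+00C5 followed by the control character U+009F.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

A client that trusts the ISO-8859-1 header would take the second reading and display the wrong text.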

Now stop the server.
Add the line encoding:utf-8 to your zebra.cfg.
Restart the server.
Do the same search and you get
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Conclusion:
The database does keep the data in UTF-8 as expected.
The server does not know about the database character set or that of the XML
record that was parsed in, and unless the encoding is specifically set to
UTF-8 in zebra.cfg the server goes ahead and changes the header, or in fact
produces a header of its own saying iso-8859-1 while giving out UTF-8
characters.
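
On the client side, this kind of mismatch can at least be detected. The following is only a rough Python sketch under my own assumptions: the function names are hypothetical, the two sample records are made-up test data rather than real Zebra output, and the check is a heuristic (bytes that happen to be valid UTF-8 are treated as UTF-8).

```python
import re

def declared_encoding(xml_bytes):
    """Return the encoding named in the XML declaration, lowercased, or None."""
    m = re.match(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', xml_bytes)
    return m.group(1).decode("ascii").lower() if m else None

def looks_like_utf8(xml_bytes):
    """True if the data has non-ASCII bytes and all of it decodes as UTF-8."""
    if all(b < 0x80 for b in xml_bytes):
        return False  # pure ASCII tells us nothing either way
    try:
        xml_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def header_contradicts_data(xml_bytes):
    """Heuristic: the header names a non-UTF-8 encoding but the bytes are UTF-8."""
    enc = declared_encoding(xml_bytes)
    return enc not in (None, "utf-8", "utf8") and looks_like_utf8(xml_bytes)

# Made-up records imitating the two headers seen above; <t> is a placeholder tag.
before = b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><t>\xc5\x9f</t>'
after = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><t>\xc5\x9f</t>'

print(header_contradicts_data(before))  # True
print(header_contradicts_data(after))   # False
```

This only flags the problem; the actual fix remains the encoding line in zebra.cfg.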

I did not ask for any help on this, thanks. I am just clearing up some issues
with Paul's problem.
Tumer
-----Original Message-----
From: Adam Dickmeiss [mailto:adam at indexdata.dk] 
Sent: Tuesday, March 21, 2006 9:58 PM
To: Tümer Garip
Cc: koha-zebra at nongnu.org
Subject: Re: [Koha-zebra] Re: Unimarc, marc21, Unicode, and
MARC::File::XML


Tümer Garip wrote:
> Hi Adam,
> You seem a bit offended. That was not my intention; frustration 
> sometimes makes me use harsh words, and translating them to English 
> may make them too harsh.
> 
> I do not need to send you any config+examples because I tested this with
> your default config files. I am attaching an xml record in utf-8
If you're to receive help from me you need to tell me which zebra.cfg
you're using. And show me the record + the way you indexed it (zebraidx
update ?)
> 
> Briefly I had default configuration files and build zebra with xml 
> records. When I noticed the problem I used yaz-client to see what was 
> going on. On my log I could see data going in the zebra was with 
> encoding utf-8 While yaz client was returning xml with headers saying 
> iso-8859-1 while I could actually see the utf-8 characters as they 
> show as hex in yaz client.
I also need to know what you see, and what you'd expect to see.

/ Adam

> I have retried this procedure just now and the result is the same. Just 
> adding encoding:UTF-8 to zebra.cfg and restarting the server, you get 
> the correct header and correct data. Please note that the server has to 
> be restarted but the Zebra database does not have to be rebuilt.
> 
> Thanks
> Tumer
> 
> -----Original Message-----
> From: Adam Dickmeiss [mailto:adam at indexdata.dk]
> Sent: Tuesday, March 21, 2006 9:00 PM
> To: Tümer Garip
> Cc: paul.poulain at free.fr; koha-zebra at nongnu.org
> Subject: Re: [Koha-zebra] Re: Unimarc, marc21, Unicode, and
> MARC::File::XML
> 
> 
> Tümer Garip wrote:
> 
>>Hi,
>>
>>This problem, if I understood it correctly, has nothing to do with
>>mysql or perl; it has to do with ZEBRA, unless it has to do with UNIMARC,
>>which I am not familiar with. As you know (Paul) I have a utf-8 
>>version working.
>>
>>I had the same problem with records coming from zebra and found out
>>that it does not do what it is supposed to do unless you explicitly 
>>set it to utf-8. You have to explicitly put "encoding utf-8" in all 
>>your zebra config files, especially zebra.cfg and your .abs file.
>>Although the documentation says that the zebra character code is
>>automatically set by the xml encoding, it DOES NOT do so.
> 
> I can't reproduce this (bug). Care to share a config+example that
> illustrates this (Inserts an XML record from Perl in UTF-8) ?
> 
> 
>>Perl sends xml to zebra with encoding utf-8 in the header and utf-8
>>data in it. Zebra saves all the data in utf-8 but returns an xml record
>>saying encoding iso8859-1 in the header, with utf-8 characters in the data. 
>>No module can correct this as it is stupid.
> 
> Just need to know when the stupidity starts:-)
> 
> / Adam
> 
> 
>>I corrected the problem by adding encoding:UTF-8 in zebra.cfg,
>>record.abs, sort-string.chr
>>
>>Hope it solves yours,
>>
>>Tumer
>>
>>
>>
>>_______________________________________________
>>Koha-zebra mailing list
>>Koha-zebra at nongnu.org
>>http://lists.nongnu.org/mailman/listinfo/koha-zebra
>>