[Koha-devel] XML-UTF8-UNIMARC

Fri Apr 7 03:06:57 CEST 2006

Hi,
I have seen commits that retrieve a MARC record, convert it to XML and
then back to MARC, in order to convert MARC8 to UTF8.
Well, as far as I know UNIMARC is not MARC8, so I am not sure whether
that approach will work with it. What also surprises me is that we
already have a char_decode script in biblio.pm which is supposed to do
this, i.e. convert from UNIMARC and MARC (separately), previously to
iso-8859, and now it should convert to UTF8.
I am on a Windows platform and this actually works for me, but I don't
know about Linux issues. So here is the solution(!) I came up with.
The records coming from the breeding farm into addbiblio.pl are MARC8, and
with every communication set to UTF8 they become gibberish, because both
encodings use multi-byte sequences for non-ASCII characters and mysql
assumes the bytes are UTF8 while they are not. Of course this is not a
problem in the ASCII character world.
So when I am reading from the breeding farm I set the communications to
latin1 temporarily and, surprisingly enough, mysql does not try to do any
funny conversion on the multi-byte characters and just passes them through
as they are. Going through my char_decode they then get properly converted
to UTF8. If I am not mistaken this should work for UNIMARC as well. I do
not know the mechanics behind all this, so take it only as a hint.
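
Here is a rough sketch of why the latin1 trick may work, written in Python
rather than Perl just to keep it self-contained. It does not touch mysql or
Koha at all: passthrough_latin1 and misread_as_utf8 only imitate what a
latin1 or UTF8 channel would do with the bytes, and toy_char_decode is a
hypothetical stand-in for char_decode that knows only two MARC8 combining
marks (0xE1 grave, 0xE2 acute). The sample bytes are made up for
illustration.

```python
import unicodedata

# Illustrative subset only: two MARC8 (ANSEL) combining marks.
# A real decoder needs the full character tables.
COMBINING = {0xE1: "\u0300", 0xE2: "\u0301"}  # grave, acute


def passthrough_latin1(raw: bytes) -> bytes:
    """Imitate a latin1 channel: each byte maps to one code point and back."""
    return raw.decode("latin-1").encode("latin-1")


def misread_as_utf8(raw: bytes) -> str:
    """Imitate a channel that assumes UTF8: invalid bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def toy_char_decode(raw: bytes) -> str:
    """Hypothetical stand-in for char_decode (not the Koha code).

    MARC8 writes the diacritic before the base letter, Unicode expects it
    after, so hold the mark until the next base character arrives.
    """
    out = []
    pending = ""
    for b in raw:
        if b in COMBINING:
            pending += COMBINING[b]
        else:
            out.append(chr(b) + pending)
            pending = ""
    # Compose base letter + combining mark into a single code point.
    return unicodedata.normalize("NFC", "".join(out))


# Made-up sample: "Cafe" with a MARC8 acute (0xE2) in front of the "e".
raw = b"Caf\xe2e"

damaged = misread_as_utf8(raw)       # the 0xE2 byte is lost to U+FFFD
intact = passthrough_latin1(raw)     # bytes come back unchanged
utf8 = toy_char_decode(intact).encode("utf-8")
```

The point is only that latin1 assigns a character to every byte value, so
nothing can be rejected or rewritten on the way through, and the decoder
still sees the original bytes. Whether mysql behaves this way on every
platform is exactly what I cannot confirm.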

Just some food for thought for UTF8 gurus,
Cheers
Tumer