[Koha-zebra] RE: A few Zebra questions

Tümer Garip tgarip at neu.edu.tr
Sat Apr 1 20:29:53 CEST 2006


Hi Joshua,
I am cc'ing this to the list as I think we should all discuss it.

> What I'm wondering is how you handle re-indexing after a crash. Are you
> exporting what is found in the marc-xml field in Koha or are you
> preserving the marc* tables (i.e., does Koha still update all of the
> marc* tables in your Koha?)?

Well, the crash problem is solved by Adam releasing version 1.3.35 as a
snapshot. But what I do (in case something goes wrong) is keep the
exported MARC records. If a crash occurs you cannot use the zebradb: you
have to delete all the zebradb files and start clean. I have written
another script that exports MARC records not by biblionumber but by
their timestamp, so I only export the records modified since my last
MARC record backup. It all depends how old the backups are; it is faster
to export a few days' records than the whole lot. But I think this issue
is no longer an issue. I have had the system online, being updated 24
hours a day, for the last 5 days, and all seems to have settled down.
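The export itself is nothing fancy; roughly like this (an untested
sketch: the timestamp column on biblioitems, the connection details and
the file name are examples from my setup, so adjust to yours):

    #!/usr/bin/perl
    # Export only the biblios modified since a given timestamp
    # (sketch; assumes biblioitems.timestamp and a marc column
    # holding the raw ISO2709 record).
    use strict;
    use DBI;

    my $since = $ARGV[0] or die "Usage: $0 'YYYY-MM-DD HH:MM:SS'\n";
    my $dbh = DBI->connect( 'DBI:mysql:database=Koha',
        'kohaadmin', 'password', { RaiseError => 1 } );

    my $sth = $dbh->prepare(
        'SELECT marc FROM biblioitems WHERE timestamp >= ?');
    $sth->execute($since);

    open my $out, '>', 'export.iso2709' or die $!;
    while ( my ($marc) = $sth->fetchrow_array ) {
        print {$out} $marc;    # ISO2709 records concatenate directly
    }
    close $out;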

I no longer have marc_subfield_table or marc_biblio. I keep marcxml and
marc in biblioitems. Although I am populating the marcxml field as well,
currently I am using everything from the marc field. This is easier to
maintain, as 2.2 requires MARC rather than XML everywhere.

> Do you still use two different recordType entries in your zebra.cfg as
> in the one you sent me?

Yes I do, and unless ZOOM provides the facility of updating Zebra with
MARC (ISO2709) records I will keep on using it that way.

> If so, why is the second one 'grs.xml' instead of 'grs.marcxml'?

Well, for the grs.marcxml filter you have to feed MARC (ISO2709)
records; it does not accept anything else. I use this for the initial
zebradb build. You can only use the grs.xml filter from within ZOOM,
because that is the only filter that accepts XML records (MARCXML or
whatever).
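From ZOOM the update looks roughly like this (a sketch only: the host,
port and database name are examples, and the MARCXML record is read
from STDIN just to keep the sketch self-contained):

    # Push one MARCXML record into Zebra through perl-zoom
    # extended services (sketch; host/db are examples).
    use strict;
    use ZOOM;

    my $marcxml = do { local $/; <STDIN> };    # one MARCXML record

    my $conn = ZOOM::Connection->new('localhost:2100/biblios');
    my $p    = $conn->package();
    $p->option( action => 'specialUpdate' );   # insert-or-replace
    $p->option( record => $marcxml );          # parsed by the grs.xml filter
    $p->send('update');
    $p->destroy();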

> Do you still use a modified version of MARC::File::XML? If so, for
> what reasons (does the <collection> wrapper still cause problems)?

If you use the <collection> wrapper then you cannot call a record back
from the zebradb as MARC (ISO2709). The only way you can get it back is
as XML and then convert it to MARC (ISO2709), which is irritating, time
consuming, slow, etc. Since I need MARC (ISO2709), why get the record as
XML and convert it to MARC when Zebra can actually serve me MARC records
at ZOOM speed?
I had problems compiling the new MARC::File::XML, so I am using the 0.6
version, which I modified so that it gives me bare <record> elements
with no <collection> wrapper. I also do not have the UTF-8 problem you
are having: the MARC record in biblioitems is UTF-8, converting it for
Zebra produces UTF-8 XML, and Zebra serves me back UTF-8 MARC records.
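What my modification amounts to is emitting the per-record chunk
instead of the full document; with the stock API the equivalent is
roughly this (a sketch against the 0.6-era docs, assuming $record is a
MARC::Record):

    # Serialise a single MARC::Record as a bare <record> element,
    # without the <collection> wrapper that as_xml() would add
    # (sketch; record() is the per-record chunk function).
    use strict;
    use MARC::Record;
    use MARC::File::XML;

    sub marc_to_bare_xml {
        my ($record) = @_;
        return MARC::File::XML::record($record);
    }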

> Finally, how are you handling record management using perl-zoom, have
> you written your own methods or are you using HEAD code? (currently,
> HEAD code supports all major functions).

Yes, I have written/modified record handling and retrieval modules. I
have to get the system ready very soon to handle 1.5M records, so some
of the stuff is hard-coded and will be of no use to you. Also, I am
modifying the MySQL db as I go along for our needs, and I did not want
to mess up the original code. To give you an example, I had to add 4 new
fields to biblioitems to be able to sort the records according to LC
(see the sketch below). I will send the code to you or to Paul to commit
as you move along to 3.0.
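For illustration only (my real columns are hard-coded for our data, so
these names are invented), the kind of change involved is just extra
sortable columns on biblioitems:

    # Hypothetical schema change: add columns holding a normalised,
    # sortable form of the LC call number (column names invented;
    # my real fields differ). Assumes $dbh is a connected DBI handle.
    $dbh->do(q{
        ALTER TABLE biblioitems
            ADD COLUMN lcsort1 VARCHAR(30),
            ADD COLUMN lcsort2 VARCHAR(30)
    });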

You will find the marc21_field_008.pl value builder files attached,
because I noticed that you intend to use the 008 field to extract the
date-added-to-db. Well, records produced in Koha do not have this field,
so I think we'll need it.
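The relevant detail is that 008 positions 00-05 hold the "date entered
on file" as yymmdd, so a default value can be built along these lines (a
sketch only, not the attached builder itself):

    # Build a minimal default 008 control field: positions 00-05 are
    # the date entered on file (yymmdd); the remaining positions are
    # left blank here for brevity (sketch).
    use strict;
    use MARC::Field;
    use POSIX qw(strftime);

    my $entered = strftime( '%y%m%d', localtime );   # e.g. "060401"
    my $f008    = $entered . ( ' ' x 34 );           # pad to 40 characters
    my $field   = MARC::Field->new( '008', $f008 );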

I intend to come to Paris for KohaCon, so we may discuss these further
then.
Cheers
Tumer



-----Original Message-----
From: Joshua Ferraro [mailto:jmf at liblime.com] 
Sent: Saturday, April 01, 2006 4:57 PM
To: Tümer Garip
Subject: A few Zebra questions


Hi Tümer,

Thank you for your contributions thus far; they have been of great help.
I've committed your missing090fields.pl script to CVS and used it myself
to repair NPL's data. I was also able to bulk index all of NPL's data
using the command-line as you outlined (and I've committed the config
files for this to CVS as well).

What I'm wondering is how you handle re-indexing after a crash. Are you
exporting what is found in the marc-xml field in Koha or are you
preserving the marc* tables (i.e., does Koha still update all of the
marc* tables in your Koha?)?

Do you still use two different recordType entries in your zebra.cfg as
in the one you sent me?:

iso2709.recordType:grs.marcxml.record
recordType: grs.xml

If so, why is the second one 'grs.xml' instead of 'grs.marcxml'?

Do you still use a modified version of MARC::File::XML? If so, for what
reasons (does the <collection> wrapper still cause problems)?

Finally, how are you handling record management using perl-zoom, have
you written your own methods or are you using HEAD code? (currently,
HEAD code supports all major functions).

Cheers,

-- 
Joshua Ferraro               VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf at liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS
-------------- next part --------------
A non-text attachment was scrubbed...
Name: marc21_field_008.tmpl
Type: application/octet-stream
Size: 4806 bytes
Desc: not available
URL: </pipermail/koha-zebra/attachments/20060401/fadf117b/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: marc21_field_008.pl
Type: application/octet-stream
Size: 3093 bytes
Desc: not available
URL: </pipermail/koha-zebra/attachments/20060401/fadf117b/attachment-0003.obj>
