[Koha-zebra] RE: [Koha-devel] Building zebradb

Tümer Garip tgarip at neu.edu.tr
Thu Mar 16 15:33:31 CET 2006


Hi Paul,
I have posted the script you requested to you directly, as I don't
know whether the list accepts attachments.

Regarding your questions below:
1- Well yes, I am feeding Zebra with 2 different kinds of records
(iso2709 and XML), and I do not see any problem with this, as Zebra
converts everything to its own internal format anyway. This way I can
index 100K+ records in around 5 minutes in Zebra (this excludes the time
it takes to export my whole database, which is around 15 min). I can use
ZOOM with XML to update, delete, or add new records.
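
For reference, here is a minimal sketch of the kind of record-level ZOOM
update we mean (host, port and database name are placeholders; the record
is assumed to already be MARCXML):

    use ZOOM;

    # Read one MARCXML record from stdin (stand-in for however you get it).
    my $marcxml = do { local $/; <STDIN> };

    # Connect to the Zebra server (host/port/db are placeholders).
    my $conn = ZOOM::Connection->new('localhost:9999/kohadb');

    # Extended-services package: add or replace one record.
    my $p = $conn->package();
    $p->option(action => 'specialUpdate');
    $p->option(record => $marcxml);
    $p->send('update');

    # Make the change visible to searches.
    $p->send('commit');
    $p->destroy();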

2- Regarding whether we can use perl-ZOOM with iso2709 records: at the
moment, it seems the answer is no. We all hope that Index Data will add
this facility at some stage.

Best of luck,

Tumer

-----Original Message-----
From: Paul POULAIN [mailto:paul.poulain at free.fr] 
Sent: Wednesday, March 15, 2006 7:19 PM
To: Tümer Garip
Cc: koha-devel at nongnu.org; koha-zebra at nongnu.org
Subject: Re: [Koha-devel] Building zebradb


Tümer Garip wrote:
> Hi,

Hello Tümer,

> We have now put the zebra into production level systems. So here is 
> some experience to share. Building the zebra database from single 
> records is a veeeeery looong process. (100K records, 150K items)
> 
> Best method we found:
> 
> 1- Change the zebra.cfg file to include
> 
> iso2709.recordType:grs.marcxml.collection
> recordType:grs.xml.collection
If I understand correctly, you now have 2 types of records in your DB (or 
2 different representations of a record)
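
Those two lines select a filter by file extension: files ending in
.iso2709 go through Zebra's grs.marcxml filter (binary MARC in, MARCXML
structure internally), while everything else falls back to grs.xml. A
slightly fuller zebra.cfg sketch, with paths and sizes as placeholders:

    # zebra.cfg (paths and sizes are placeholders)
    profilePath: /usr/share/idzebra/tab
    attset: bib1.att

    # *.iso2709 files are binary MARC, parsed by the marcxml filter
    iso2709.recordType: grs.marcxml.collection
    # default for everything else (e.g. *.xml)
    recordType: grs.xml.collection

    register: /var/lib/zebra/register:4G
    shadow: /var/lib/zebra/shadow:4G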

> 2- Write (or hack export.pl) to export all the marc records as one big
> chunk to the correct directory with the extension .iso2709, and then
> system-call "zebraidx -g iso2709 -d <dbnamehere> update records -n".

Could you send us the code for export.pl?
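
A minimal sketch of such an export (fetch_all_marc_records() is a
hypothetical helper standing in for however the records are pulled out of
the Koha tables; paths and the database name are placeholders):

    use MARC::Record;

    # fetch_all_marc_records() is hypothetical: replace it with the real
    # retrieval of MARC::Record objects from the Koha database.
    open my $out, '>', '/path/to/zebradb/records/export.iso2709'
        or die "cannot open export file: $!";
    binmode $out;
    print {$out} $_->as_usmarc() for fetch_all_marc_records();  # raw ISO 2709
    close $out;

    # Reindex the whole chunk in one pass, as described above.
    system('zebraidx', '-g', 'iso2709', '-d', 'kohadb',
           'update', 'records', '-n');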

> This ensures that zebra knows it's reading marc records rather than xml,
> and builds 100K+ records at zooming speed. Your zoom module always
> uses the grs.xml filter, while you can at any time update or reindex any
> big chunk of the database as long as you have marc records.

Great, I think I understand.

> 3- We are still using the old API, so we read the xml and use
> MARC::Record->new_from_xml( $xmldata ). A note here: we did not have
> to upgrade MARC::Record or MARC::Charset at all. Any marc created
> within KOHA is UTF8, and any marc imported into KOHA (old
> marc_subfield_tables) was correctly decoded to utf8 with biblio's
> char_decode.
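
A minimal sketch of that read path, assuming MARC::File::XML is installed
alongside MARC::Record:

    use MARC::Record;
    use MARC::File::XML (BinaryEncoding => 'utf8');

    # One MARCXML record, e.g. as fetched from Zebra (read from stdin here).
    my $xmldata = do { local $/; <STDIN> };
    my $record  = MARC::Record->new_from_xml($xmldata, 'UTF-8');
    print $record->title(), "\n";   # quick sanity check on the 245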

Would it be possible to use this zebra.cfg to manage iso2709 through 
Perl-ZOOM?
If so, we could avoid the marc => xml => zoom and zoom => xml => marc 
transformations.

> 4- We modified circ2.pm and the items table to have an item onloan field
> and mapped it to marc holdings data. Now our opac search does not call
> mysql except for the branchname.

Could you send us/me the code too?
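
In the meantime, a minimal sketch of what the search side looks like with
perl-ZOOM (host, port and query are placeholders); the point is that the
returned record already carries the holdings data, so mysql is only needed
for lookups such as the branch name:

    use ZOOM;

    my $conn = ZOOM::Connection->new('localhost:9999/kohadb');
    $conn->option(preferredRecordSyntax => 'xml');

    # Bib-1 attribute 1=4 is title; the query is just an example.
    my $rs = $conn->search_pqf('@attr 1=4 "history"');
    for my $i (0 .. $rs->size() - 1) {
        my $xml = $rs->record($i)->render();  # MARCXML, holdings included
        # ... build a MARC::Record from $xml and display it ...
    }
    $conn->destroy();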

> 5- Average updates per day are about 2000 (circulation+cataloguing). I
> can say that the speed of the zoom search, which slows down during a
> commit operation, is acceptable considering the speed gain we get on
> the search.
> 
> 6- Zebra behaves very well with searches but is very temperamental with
> updates. A queue of updates sometimes crashes the zebraserver. When
> the database crashes, we cannot save anything, even though we are using
> shadow files. I'll be reporting on this issue once we can isolate the
> problems.
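
For reference, a sketch of the usual Zebra shadow cycle the above relies
on (paths, names and sizes are placeholders): updates go into the shadow
register first and only become live at commit, which is what should make a
crash mid-update recoverable.

    # zebra.cfg: write updates to a shadow register first
    shadow: /var/lib/zebra/shadow:4G

    # shell: index into the shadow files, then flip them live
    zebraidx -d kohadb update records
    zebraidx -d kohadb commit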

You're definitely a gem too ;-)

-- 
Paul POULAIN and Henri Damien LAURENT
Independent consultants
in free software and librarianship (http://www.koha-fr.org)
