[Koha-devel] playing with the zebra on cvs head

Fri Aug 12 07:45:25 CEST 2005

(cc to the zebra-koha mailing list, to show the Index Data folks where things stand)

Hi koha fans,

I'm leaving tomorrow morning for 2 weeks, but before I go, I have a 
gift for you: zebra is working in CVS head.
I'll try to explain what I did and what does not work yet.

Throughout this document, I assume you have a working copy of Koha 
(CVS head) in the directory KOHADIR.

1- zebra SETTINGS
===================
* download and compile zebra, following the Index Data documentation.
* zebraidx must be in /usr/local/bin/zebraidx (the path is hardcoded in 
C4/Biblio.pm; change it there if needed)
* /usr/local/share/idzebra/tab/ must contain zebra's default parameter 
files (or change line 6 of zebra.cfg to point elsewhere)

* cd KOHADIR
If you have a UNIMARC Koha:
ln -s misc/zebra/unimarc zebra

If you have a MARC21 Koha, you will have to:
* either wait for Joshua to commit a working MARC21 config
or
* create your own (and commit it to CVS, please)

The most important part is to enable ID indexing on biblionumber:

In zebra.cfg:
recordId: (bib1,Local-number)
storeKeys:1

In the .abs file:
elm 090            Local-number            -
elm 090/?          Local-number            -
elm 090/?/9        Local-number            !:w

(090$9 being the field mapped to biblio.biblionumber in Koha)
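Biblio.pm hands zebra ISO 2709 files, and the .abs lines above tell zebra 
to index 090$9 as Local-number. The sketch below (Python rather than 
Koha's Perl, so not Koha code) shows where that subfield sits in such a 
file: a hand-rolled writer and reader for a minimal record with data 
fields only. Assumptions: the leader bytes other than record length, base 
address, "22" and "4500" are placeholders, and there is no control-field, 
character-set or UNIMARC-specific leader handling.

```python
FT = b"\x1e"  # ISO 2709 field terminator
RT = b"\x1d"  # ISO 2709 record terminator
SD = b"\x1f"  # subfield delimiter


def encode_field(value):
    """Encode one data field: two indicators, then delimiter+code+text subfields."""
    ind1, ind2, subfields = value
    data = (ind1 + ind2).encode("utf-8")
    for code, text in subfields:
        data += SD + code.encode("ascii") + text.encode("utf-8")
    return data + FT


def to_iso2709(fields):
    """Serialize [(tag, (ind1, ind2, [(code, text), ...])), ...] to bytes."""
    directory = b""
    body = b""
    for tag, value in fields:
        data = encode_field(value)
        # directory entry: 3-char tag, 4-digit field length, 5-digit start offset
        directory += tag.encode("ascii") + b"%04d" % len(data) + b"%05d" % len(body)
        body += data
    directory += FT
    base = 24 + len(directory)    # base address of data = leader + directory
    total = base + len(body) + 1  # +1 for the record terminator
    # placeholder leader (assumption): only length, base address,
    # indicator/subfield-code counts "22" and the "4500" entry map matter here
    leader = b"%05d" % total + b"nam  22" + b"%05d" % base + b"   4500"
    return leader + directory + body + RT


def subfield_from_iso2709(record, tag, code):
    """Return the first tag$code value (e.g. 090$9), or None if absent."""
    base = int(record[12:17])
    pos = 24
    while record[pos:pos + 1] != FT:  # walk the directory until its terminator
        entry = record[pos:pos + 12]
        length, start = int(entry[3:7]), int(entry[7:12])
        if entry[:3] == tag.encode("ascii"):
            data = record[base + start:base + start + length - 1]  # drop FT
            for chunk in data[2:].split(SD)[1:]:  # skip the 2 indicators
                if chunk[:1] == code.encode("ascii"):
                    return chunk[1:].decode("utf-8")
        pos += 12
    return None


record = to_iso2709([
    ("090", (" ", " ", [("9", "42")])),        # biblionumber, the zebra record id
    ("200", ("1", " ", [("a", "My title")])),  # UNIMARC title
])
print(subfield_from_iso2709(record, "090", "9"))  # prints 42
```

With recordId pointing at Local-number and storeKeys enabled, re-indexing 
a record carrying the same 090$9 is what lets zebra treat it as an update 
of the existing record instead of a new one.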

What you MUST have at the end of this step:
* KOHADIR/zebra, a symbolic link to your real zebra config 
directory.
* a directory KOHADIR/zebra/biblios, which will receive biblios from Koha
* an empty zebra database.

PLEASE don't create a real zebra directory, and don't commit your symbolic 
link: you could wipe out my UNIMARC setup!
Note that the biblios directory must be writable by the Apache user (don't 
panic about the security issue: this directory will no longer be needed 
once Perl/ZOOM is available)

2- Koha SETTINGS
================
* export KOHA_CONF and PERL5LIB as usual
* cd KOHADIR
* updater/updatedatabase => warning, this may take a long time. It makes 
some changes to the Koha DB (moving fields and rebuilding things)
* the marc_subfield_table, marc_word, marc_biblio and marc_blob tables are 
now useless but have not been removed yet. You can rename them to be sure 
they won't be used (and get an error message if something still uses them)
* misc/migration_tools/rebuild_zebra.pl => will create an entry in 
KOHADIR/zebra/biblios
* cd KOHADIR/zebra
* zebraidx update biblios => will add all biblios into zebra.
* rm -f biblios/* => will delete all biblios once they are in zebra.
* zebrasrv @:2100 (port 2100 is hardcoded in SearchMarc.pm for now) 
must always be running, otherwise you won't be able to search anything!
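The "zebraidx update biblios" then "rm -f biblios/*" steps above can be 
sketched as one helper. This is a Python sketch, not the actual Koha 
script; the function name index_and_clean is hypothetical. It runs 
zebraidx from KOHADIR/zebra so that zebraidx reads the zebra.cfg found 
there, and, unlike a plain rm -f, only empties the spool when zebraidx 
exits successfully.

```python
import subprocess
from pathlib import Path


def index_and_clean(zebra_dir, command=("zebraidx", "update", "biblios")):
    """Index the biblios spool with zebraidx, then empty it on success.

    Returns True if indexing succeeded and the spool was emptied.
    """
    zebra_dir = Path(zebra_dir)
    spool = zebra_dir / "biblios"
    # run from KOHADIR/zebra so zebraidx uses the zebra.cfg in that directory
    result = subprocess.run(list(command), cwd=zebra_dir)
    if result.returncode != 0:
        # keep the exported records so the load can be retried after a fix
        return False
    for path in spool.iterdir():
        if path.is_file():
            path.unlink()
    return True
```

Keeping the files on failure is a design choice: a failed load then costs 
a rerun of zebraidx rather than a rerun of rebuild_zebra.pl.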

3- Play with your wonderful koha-zebra copy
===========================================
* search for an existing biblio. Search works on title/author/ISBN; 
any other search is done as a keyword search (anywhere)
* modify this existing biblio. It should be modified in zebra as well 
(searching for it again with the modified title, for example, should 
work). Biblio.pm creates an iso2709 file in KOHADIR/zebra/biblios, 
launches zebraidx with system(), and unlinks the iso2709 file once done.
* add a new biblio. It should appear in zebra.
* add/modify items; they will be available in zebra as well (check 
with yaz-client to be sure)
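When checking with yaz-client, queries are written in PQF using Bib-1 use 
attributes. The sketch below maps the searches listed above to the 
standard Bib-1 values (Title 4, Author 1003, ISBN 7, Local-number 12, 
Any 1016), falling back to "any" as the keyword search does. It is an 
illustration, not what SearchMarc.pm actually sends; which attributes 
really match depends on what your .abs file indexes (Local-number relies 
on the 090$9 lines), and the quote handling is a simplification.

```python
# Standard Bib-1 use attributes for the searches described above
BIB1_USE = {
    "title": 4,
    "author": 1003,
    "isbn": 7,
    "biblionumber": 12,  # Local-number, fed by 090$9 in the .abs file
}
ANY = 1016  # Bib-1 "Any": used for every other search, as a keyword search


def to_pqf(field, term):
    """Build a single-term PQF query, e.g. @attr 1=4 "koha"."""
    use = BIB1_USE.get(field, ANY)
    term = term.replace('"', "")  # simplification: drop quotes, no escaping
    return '@attr 1=%d "%s"' % (use, term)


print(to_pqf("title", "koha"))  # prints @attr 1=4 "koha"
```

In yaz-client, after "open localhost:2100", such a string can be passed 
to the find command, and "show 1" displays the first hit.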

4- what's next :
================
[PP] means I'll take care of it, but if someone else wants to, they can, 
of course ;-)

1- [PP] first, 2 weeks off for me. I'll be back on Aug 29 (no web 
access where I'm going)
2- [PP] add support for deletion
3- [PP] re-introduce itemtype, which is no longer in the search result list (bug)
4- re-introduce "search ordered by"
5- [PP] improve (a lot) the zebra UNIMARC settings, which are really basic
6- [PP] replace system() and Net::Z3950 with the great Perl/ZOOM that 
Index Data will provide us soon
7- [PP and katipo] modify Biblio.pm to handle MARC=OFF as well as MARC=ON
8- fix some subs that no longer work at the moment, like FindDuplicate, 
getMARCnotes, getMARCsubjects...
9- fix other bugs that I haven't spotted yet.
10- continue normalizing the API, by using biblionumber and 
biblioitemnumber everywhere and moving some subs to where they belong 
(I already removed some from SearchMarc.pm / Search.pm and Biblio.pm)
11- deeply modify the search API (SearchMarc.pm/cataloguesearch)
12- extend search to external z3950 servers (and improve admin/z3950 so 
it can define z3950 servers for quick cataloguing and/or search)
13- modify the z3950 client to get rid of the Net::Z3950 package, which can be 
replaced by .
14- introduce ranked search results

5- questions :
==============
For now, item statuses (being issued...) are not stored in the MARC 
record. This means that when a user runs a catalogue query that returns 
20 biblios with 35 items, Koha has to check the status of those 35 items.
We can:
* do nothing; the CPU cost is not that high.
* embed the item status. This means that every time the status changes 
(issue/return/transfer), the record has to be updated (CPU-consuming 
too?)
* add a "show item status" checkbox on the search screen (next to 
orderby), disabled by default, meaning "don't check status".
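The third option can be sketched as follows, in Python rather than Koha's 
Perl. BiblioHit, attach_item_status and lookup_status are hypothetical 
names: lookup_status stands in for whatever per-item status query Koha 
would run. The point is that the extra cost is one lookup per item (35 in 
the example above, not 20), and that it is skipped entirely unless the 
checkbox is ticked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BiblioHit:
    """One biblio in a search result list (hypothetical shape, not Koha's)."""
    biblionumber: int
    itemnumbers: List[int]
    statuses: Dict[int, str] = field(default_factory=dict)


def attach_item_status(hits: List[BiblioHit],
                       lookup_status: Callable[[int], str],
                       show_item_status: bool = False) -> int:
    """Look up item statuses only when "show item status" is ticked.

    Returns the number of status lookups performed, i.e. the extra cost
    the checkbox controls.
    """
    if not show_item_status:
        return 0  # default: "don't check status"
    lookups = 0
    for hit in hits:
        for itemnumber in hit.itemnumbers:
            hit.statuses[itemnumber] = lookup_status(itemnumber)
            lookups += 1
    return lookups
```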

-- 
Paul POULAIN
Independent free software consultant
Koha francophone lead (free ILS, http://www.koha-fr.org)