[Koha-zebra] Koha & zebra (continuing to try to understand...)

Mon Aug 8 15:51:13 CEST 2005

Hi,

I've just talked a little on IRC with Joshua and looked into 
zoom.z3950.org and the Index Data "embedding the zebra" doc, and some 
points are still unclear to me:

1st point :
============
It seems that "embedding the zebra", page 6, section 2.1.2.3, is wrong, 
as it says "The Perl API was submitted by Peter Popovics. It is supported & 
maintained by Index Data. The following is a synopsis of it's usage: ..."

Joshua told me this API is outdated and no longer maintained. I tried "perl 
Makefile.PL && make && make test" in idzebra-1.3.28/perl but got a nasty:
Failed 8/8 test scripts, 0.00% okay. 103/103 subtests failed, 0.00% okay.

with the error :
Can't load '/XX/idzebra-1.3.28/perl/blib/arch/auto/IDZebra/IDZebra.so' 
for module IDZebra: 
/XX/idzebra-1.3.28/perl/blib/arch/auto/IDZebra/IDZebra.so: undefined 
symbol: XML_ExternalEntityParserCreate at 
/usr/lib/perl5/5.8.5/i386-linux-thread-multi/DynaLoader.pm line 230.
  at /XX/idzebra-1.3.28/perl/blib/lib/IDZebra.pm line 7

Can someone confirm that it makes no sense to continue investigating 
what is in this directory?

2nd point :
===========
If I understand correctly (& I think I do), the yaz/zoom API provides 
facilities to both search AND manage the zebra database.

This is confirmed by the mail from Sebastian (July 11):
> o YAZ/ZOOM supports certain extensions to Z39.50 which allows a Perl
> application to directly manage databases in a Zebra installation,
> update records, etc. I believe that this may be the neatest and most
> efficient way to integrate an application like Koha with Zebra..
> superior to system() calling the Zebra command-line interface, and more
> robust than the native Zebra Perl API, which was developed by a third
> party and which may not be entirely reliable.

Once again, if I understand correctly, this Perl API is not 100% ready & 
has not been released yet.

3rd point :
===========
(this one troubles me a little bit...)
When I read http://zoom.z3950.org/api/zoom-1.4.html, I don't see 
anything for adding, updating or deleting records.
Am I going mad? I don't think so. Can someone explain the trick to me: 
is zoom-1.4.html an incomplete doc, does yaz/zoom have an extension 
to the official zoom API, or is it something else?

Anyway, I now assume we will soon have a Perl API to manage the zebra DB.
My plans to move from the Koha 2.2 DB to the Koha 3.0 DB (using zebra) have 
been explained at length in a previous mail. It seems that the perl/zoom 
API is what I need & that it could be done quite quickly.
BUT we also need tools to move from 2.2 to 3.0. Fortunately, we have a 
great tool, called updatedatabase, to ... update the database (strange 
isn't it ;-) ?)
My detailed plan would be:

1- create the updatedatabase tool. I will have to:
- move frameworkcode from the marc_biblio table to the biblio table
- store the MARC record in raw format in the biblio.marc field
- delete the marc_biblio, marc_subfield_table & marc_word tables. Once 
that is done, SearchMarc will be useless, so I may delay the deletion 
of those tables a little, until search is at least in alpha stage.

2- create a "zebra re-index" tool, that will take all the Koha MARC 
records and reindex them. It will be a necessity for developers until 
the zebra config is 100% done, and will remain useful even after that.

3- begin the work on Biblio.pm to use zebra, not marc_word / 
marc_subfield_table.

4- check all Koha code to verify that marc_subfield_table & marc_word 
don't appear anywhere. This should not take long:
[paul at bureau head]$ grep -c -i -R "marc_word" *|grep -v ":0"
gives only a few places:
C4/Biblio.pm:12
C4/SearchMarc.pm:3
misc/migration_tools/build6xx.pl:1
misc/migration_tools/bulkmarcimport.pl:3
misc/build_marc_word.pl:7
misc/bulkauthimport.pl:1
misc/cleanmarcdb.pl:1
misc/koha.mysql:2
misc/koha2marc.pl:1
misc/rebuildnonmarc.pl:1
misc/spellcheck_suggest/make_spellcheck_suggest.pl:5
opac/opac-dictionary.pl:2
search.marc/dictionary.pl:2

At the end of this step, we have a fully functional Koha-zebra for data 
management.

5- partial search.
But then we have no tools to search at all. That's a pity, I agree ;-)
That's why I think I should also rewrite an alpha SearchMarc.pm (the 
package that does searches in the 2.2 branch) to be able to search on a 
single term, so that cvs-head at least works a little...
At this stage, I'll pass the ball to someone else to do the new search 
API, tools & screens.

6- as I will then have some experience with zebra, deal with the 
authority problem ;-)
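
For step 4, the same check can be wrapped in a small shell helper that 
covers both tables in one run. This is only a sketch: count_table_refs 
is a hypothetical name, GNU grep is assumed, and the only change to the 
command shown in step 4 is anchoring the filter as ':0$', so that a 
path which happens to contain ":0" is not dropped by accident.

```shell
#!/bin/sh
# count_table_refs DIR TABLE...
# Hypothetical helper (not part of Koha): for each table name, list the
# files under DIR that mention it (case-insensitive) with a match count.
count_table_refs() {
    dir=$1
    shift
    for table in "$@"; do
        # grep -c prints "file:count" for every file; the anchored ':0$'
        # filter drops only the files whose count is exactly zero.
        grep -R -i -c -- "$table" "$dir" | grep -v ':0$' | sed "s/^/$table /"
    done
}

# Typical use from the top of a Koha checkout:
#   count_table_refs . marc_word marc_subfield_table
```

Each output line has the form "table path:count", so the result can be 
sorted or diffed between two runs while the old tables are phased out.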

The last question :
===================
Except for step 1 of the plan, I need to work with zebra.
It seems I have some time to spend on this topic this week. Then I am 
off for 2 weeks, and in September I will probably have only a little 
free time (2 migrations to finish, 1 week in Burkina Faso at the end of 
the month, plus things still unknown today ;-) )

My idea, while waiting for the Perl/Zoom package to be ready, would be to 
begin the work with some Perl system() calls to run zebra & update the 
database.
Then, once the Perl package is ready, move to it.
I imagine this last step will be very easy: just replacing a file write (the 
iso2709 record) & a system call (to run idzebra) by a "zebra connection 
open" and a "handle the MARC record".
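
A minimal sketch of what that interim step could run, written as plain 
shell so it can be tried by hand before being wrapped in a Perl system() 
call. Everything here is an assumption rather than existing Koha code: 
the function names, the spool directory and the config path are 
hypothetical, and the indexer is assumed to be the zebraidx command with 
its update, init and commit subcommands. DRY_RUN defaults to 1, so the 
commands are only printed.

```shell
#!/bin/sh
# Hypothetical defaults; adjust to the local installation.
ZEBRA_CFG=${ZEBRA_CFG:-/etc/koha/zebra.cfg}
SPOOL_DIR=${SPOOL_DIR:-/var/tmp/koha-zebra-spool}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

# run CMD ARGS...: execute the command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# index_record BIBLIONUMBER FILE
# Copy one raw ISO 2709 record into the spool directory under a stable
# name, then ask the indexer to update from that directory.
index_record() {
    biblionumber=$1
    src=$2
    mkdir -p "$SPOOL_DIR" || return 1
    cp "$src" "$SPOOL_DIR/$biblionumber.iso2709" || return 1
    run zebraidx -c "$ZEBRA_CFG" update "$SPOOL_DIR"
    # commit is assumed to matter only when shadow registers are enabled
    run zebraidx -c "$ZEBRA_CFG" commit
}

# reindex_all
# Rebuild the index from everything in the spool directory (step 2 of
# the plan). init is assumed to wipe the existing register first.
reindex_all() {
    run zebraidx -c "$ZEBRA_CFG" init
    run zebraidx -c "$ZEBRA_CFG" update "$SPOOL_DIR"
    run zebraidx -c "$ZEBRA_CFG" commit
}
```

Keeping the record write and the indexer call inside one function means 
only that function has to change once the Perl/Zoom package is ready.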

Index Data guys, do you think I'm right or wrong here? Should I go 
that way?

thanks for your attention.
-- 
Paul POULAIN
Independent free software consultant
French-speaking Koha coordinator (free ILS, http://www.koha-fr.org)