[Koha-zebra] Koha & zebra (continuing to try to understand...)

Mon Aug 8 16:48:03 CEST 2005

At 16:34 08-08-2005 +0200, Paul POULAIN wrote:

>>Correct. The YAZ/ZOOM API (C language) supports these extensions, but 
>>they are not supported by Net::Z3950, which doesn't use YAZ/ZOOM. 
>>Net::Z3950 predates the ZOOM specification, and it is implemented on top 
>>of a lower-level API in the YAZ toolkit. The solution that we have 
>>discussed with Joshua involves building a new and better Perl API that 
>>more closely reflects the ZOOM standard, directly atop the YAZ/ZOOM API. 
>>It would thus benefit from a range of extended functionality in YAZ/ZOOM, 
>>such as normalizing MARC records to UTF-8/MARCXML, and the extended 
>>services for database updates.
>
>And moving to Zebra will help get rid of Net::Z3950 and Event.pm, which 
>are problematic.

Yes.

>>That makes sense. The Zebra manual has simple examples of how to create a 
>>simple database.
>
>Playing with Zebra was task 0, and that has already been done: I have a 
>UNIMARC setup that is at least partially functional. If you are 
>interested, I will submit it once it is 100% OK.

OK. We should have a good chat about this sometime. Zebra actually has a 
number of different approaches to MARC indexing and storage. I'd be 
interested in seeing your setup and commenting on it. I've done some work 
lately on alternative approaches to configuring MARC records, to make the 
config files more readable.

The next version of Zebra may well include features to make the MARC setup 
a lot more flexible, by using XSLT to control the indexing. This is 
intended to get around a current limitation: we can't build a phrase index 
that spans multiple MARC subfields, which is what title, subject, and 
other searches require.
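
To make the limitation concrete, here is a minimal Python sketch. It is 
not Zebra code or Zebra configuration; the function names and the sample 
UNIMARC 200 field are made up purely for illustration. It contrasts one 
phrase key per subfield with one phrase key spanning several subfields.

```python
# Illustrative only: NOT Zebra's indexing code. All names and the sample
# record below are hypothetical.
from typing import Iterable, List, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(cleaned.lower().split())


def phrases_per_subfield(subfields: List[Subfield]) -> List[str]:
    """One phrase key per subfield: no phrase can cross a subfield boundary."""
    return [normalize(value) for _code, value in subfields]


def phrase_spanning(subfields: List[Subfield], codes: Iterable[str]) -> str:
    """One phrase key built from the selected subfields, in record order."""
    wanted = set(codes)
    parts = [value for code, value in subfields if code in wanted]
    return normalize(" ".join(parts))


# Hypothetical UNIMARC 200 field: $a title proper, $e other title
# information, $f statement of responsibility.
title_200 = [
    ("a", "Koha"),
    ("e", "an integrated library system"),
    ("f", "Example Author"),
]

per_subfield_keys = phrases_per_subfield(title_200)
spanning_key = phrase_spanning(title_200, {"a", "e"})

if __name__ == "__main__":
    print(per_subfield_keys)
    print(spanning_key)
```

A phrase search for the full title can only match the spanning key. None 
of the per-subfield keys contains the whole title, which is the gap that 
XSLT-controlled indexing would be meant to close.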

>>I *think* I would recommend that you start with the CVS version of YAZ 
>>and Zebra rather than using the current releases.
>
>Could you explain why? (And how to get the CVS version?)

Instructions for using CVS are here: http://www.indexdata.dk/software/ 
The packages needed are YAZ and Zebra. You'll need to run buildconf.sh to 
generate the GNU autoconf files that are normally part of the distribution.

Here's the why: over the past year and a half, development on Zebra has 
picked up fairly dramatically. We've done a lot to optimize the performance 
of the indexing, such that adding a new record to a very large database now 
happens in nearly constant time, independent of the size of the database. 
Search performance has also been boosted, and some new functionality for 
indexing XML and MARC records has been added.

The downside is that this development effort has temporarily curtailed the 
normal release cycle, so no new releases have been made for more than a 
year. We have just decided that we want to get a new beta release out 
within a month or so, and to pick up a more rapid release cycle after that. 
Right now, though, the public release is a bit of a dinosaur in some 
respects.

--Sebastian
--
Sebastian Hammer, Index Data, www.indexdata.com
Direct phone: (603) 209-6853 Fax: (603) 357-1813