[Koha-devel] Search Engine Changes : let's get some solr

Mon Oct 11 16:49:44 CEST 2010

1.  REWRITING EASIER THAN BUG FIXING.

I agree that, as with many things in Koha, it would be more work to fix
Search.pm than to rewrite it.  I think that the CCL implementation was
partly a mistake, in which Joshua Ferraro took a shortcut to reduce the
work needed to support Nelsonville Public Library on Zebra.  Joshua had
previously agreed to support PQF.  Some previous searching features were
unnecessarily broken by the implementation, which BibLibre later had to
fix in subsequent versions of 3.0.X.

2.  INNOVATION TO PRESERVE.

Yet, there was one great achievement of Joshua's work.  Searches for a
title which is a stop word on many automation systems return that title
sorted to the top of the result set.  For example, a search for the title
'it' returns "It" by Stephen King.  That book would be unfindable by
title in some library automation systems.

We should not lose such innovations which set Koha apart from other
library automation systems.

3.  Z39.50/SRU IMPLEMENTATION.

Where I may disagree with Paul Poulain is on whether completely dropping
support for Zebra would be cheaper than retaining it as an adjunct to
Solr/Lucene based indexing.  There is a need for sophisticated Z39.50/SRU
servers for sharing records with the rest of the library community,
although perhaps most libraries now using Koha are unconcerned about
having their own Z39.50 server.

I hope that I am mistaken, but my investigation thus far leads me to
doubt that JZKit is a sufficient replacement for Zebra without much more
work, which would have a significant cost.  See what I reported about
JZKit in section 3 of my first post in this thread,
http://lists.koha-community.org/pipermail/koha-devel/2010-October/034468.html
.  My information about JZKit leads me to believe that we should be
working with Index Data on using Zebra as a Z39.50 server with
Solr/Lucene, as they are already working on Solr/Lucene support and
certainly already provide a sophisticated Z39.50 server.

Furthermore, if we remove CCL support from Search.pm, we do not remove the
need for supporting Z39.50/SRU queries.  We still need to support
Z39.50/SRU queries against remote systems, both for user queries for
resources outside the local catalogue and for copy cataloguing.
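
To illustrate the kind of remote query which remains necessary whatever
happens to CCL, here is a minimal Python sketch which builds an SRU
searchRetrieve URL.  The parameter names are those of SRU 1.1.  The base
URL, the dc.title index, the "marcxml" schema name, and the function name
build_sru_url are assumptions for illustration only; they do not describe
any existing Koha code or any particular server.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, max_records=10, record_schema="marcxml"):
    """Return an SRU 1.1 searchRetrieve URL for a CQL query.

    base_url and record_schema are placeholders; the schema names a
    real server accepts are listed in its explain response.
    """
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    # urlencode percent-encodes the CQL query so that it is safe in a URL.
    return base_url + "?" + urlencode(params)


# Hypothetical remote target, used here only as an example.
url = build_sru_url("http://sru.example.org/biblios", 'dc.title = "it"')
print(url)
```

The same CQL query could be sent to any number of remote targets for
copy cataloguing; only the base URL would change.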

I will be more complete about the continued importance of Z39.50/SRU later.

4.  IMPLEMENTATION QUESTION.

Henri-Damien Laurent raised an issue of DOM based indexing in point 8 of
his post starting this thread.

Is there an objection to using XPath indexing in general or for use with
Solr/Lucene?  What is that objection if any?
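
To make the question concrete, here is a small Python sketch of what I
mean by XPath indexing: each index is defined by an XPath expression
evaluated against the MARCXML record.  It uses only the limited XPath
subset of the Python standard library.  The index names, the choice of
fields, and the function name extract_index_fields are assumptions for
illustration, not a description of how Zebra or Solr/Lucene would be
configured.

```python
import xml.etree.ElementTree as ET

# A tiny MARCXML record used only as sample input.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">King, Stephen</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">It</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical index definitions: index name -> XPath expression.
INDEX_PATHS = {
    "title": ".//marc:datafield[@tag='245']/marc:subfield[@code='a']",
    "author": ".//marc:datafield[@tag='100']/marc:subfield[@code='a']",
}


def extract_index_fields(xml_text):
    """Return a dict mapping each index name to the list of matched values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name, path in INDEX_PATHS.items():
        fields[name] = [
            el.text.strip() for el in root.findall(path, NS) if el.text
        ]
    return fields


fields = extract_index_fields(MARCXML)
print(fields)
```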

5.  IMPLEMENTATION SUGGESTION.

I think that eventually we will need distinct record designs optimised
specifically for the distinct functions of editing record content,
indexing, and display.  For the purpose of the indexing discussion, and so
as not to create too many problems at once, we could confine
consideration to a record design optimised for indexing.  The MARC record,
whether in MARC format or MARCXML, is poorly suited for any purpose other
than as a somewhat antiquated record exchange format which is accepted
throughout the library community.
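
As a rough illustration of a record design optimised for indexing, the
following Python sketch flattens an already parsed record into a simple
document of named, repeatable fields, which is the shape that a
Solr/Lucene style index expects.  The field names, the tag to field
mapping, and the function name to_index_document are all hypothetical;
a real design would need far more care.

```python
# Hypothetical mapping: MARC tag -> (index field name, subfield codes to join).
FIELD_MAP = {
    "100": ("author", "a"),
    "245": ("title", "ab"),
    "650": ("subject", "a"),
}


def to_index_document(record_id, marc_fields):
    """Flatten (tag, {subfield code: value}) pairs into an index document."""
    doc = {"id": record_id, "author": [], "title": [], "subject": []}
    for tag, subfields in marc_fields:
        if tag not in FIELD_MAP:
            continue  # fields with no index role are simply not carried over
        name, codes = FIELD_MAP[tag]
        value = " ".join(subfields[c] for c in codes if c in subfields)
        if value:
            doc[name].append(value)
    return doc


# Sample record; the 999 field is ignored because it is not in FIELD_MAP.
record = [
    ("100", {"a": "King, Stephen"}),
    ("245", {"a": "It"}),
    ("650", {"a": "Horror tales"}),
    ("999", {"c": "1"}),
]
doc = to_index_document("1", record)
print(doc)
```

The point of the sketch is only that the indexing record need not carry
the tags, indicators, and subfield structure required for editing and
exchange.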

Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
+1 212-674-3783