[Koha-devel] Record indexing and retrieval options

Thomas Dukleth kohadevel at agogme.com
Wed Jan 5 20:11:49 CET 2011


In the middle of last month, I started adding a very detailed report of my
investigations into various options for record indexing and retrieval in
Koha to the end of the Switch to Solr RFC in the wiki,
http://wiki.koha-community.org/wiki/Switch_to_Solr_RFC .  I moved the
report to its own page at
http://wiki.koha-community.org/wiki/Record_Indexing_and_Retrieval_Options_for_Koha
and added links from the RFC to content related most directly to the work
being done by BibLibre.

The report aims to be an objective record of facts about the particular
options, except where comparisons containing a point of view are necessary
and appropriate in the advantages and disadvantages sections.  It contains
no overall recommendation and no overall conclusions.  Its purpose is to
provide information which helps each of us form better informed judgements
and reach our own conclusions.  There are other places for arguing for
particular overall preferences, including this mailing list.

I will offer particular recommendations based on my investigation in some
other messages.  Yet, aside from the great exception of not abstracting
record indexing and retrieval so as to preserve the existing options, I
find the software choices which BibLibre are making in their
implementation of Solr/Lucene to be reasonable.  There will be time in
future to address some Koha design deficiencies for record indexing which
are unnecessarily preserved in BibLibre's work.  I do not criticise
BibLibre for leaving some pre-existing Koha deficiencies uncorrected.


1.  INVESTIGATIONS.

The report is the product of much investigation of source code; testing;
correspondence with Index Data and Knowledge Integration; and
communication with several people including those working on Solr/Lucene
record indexing at BibLibre.  BibLibre provided a test server on which I
was able to verify some Zebra bugs which BibLibre had reported, before I
found some of those bugs in the Zebra bug database.

Despite my efforts, the report may be especially prone to error or
omission given the large scope of the problem.  A few omissions, where I
simply did not have sufficient time to describe some aspect of some
options, are noted in the report or should otherwise be obvious.
Corrections of errors and omissions would be greatly appreciated.

Very little in the report is the result of direct answers from Index Data
and Knowledge Integration.  However, both Sebastian Hammer from Index Data
and Ian Ibbotson from Knowledge Integration were very helpful in giving
guidance about options to investigate.  I give special thanks to Ian
Ibbotson, who was especially helpful when I followed some clues which both
he and Sebastian had left in messages referring to some challenges of
implementing Solr/Lucene.  Some of my follow-up questions have gone
unanswered, but few people are able to sustain giving sufficient answers
to questions even when paid to do so.


2.  SECTIONS.

2.1.  OPTIONS.

Sections of the report identify options for basic functional uses of
record indexing and retrieval.  The options are described so that they can
be mixed and matched, instead of giving an exhaustive list of all possible
combinations.


2.2.  ADVANTAGES AND DISADVANTAGES.

Each option has an advantages and disadvantages section.  In trying to
have a balanced presentation, some aspects of particular options listed in
the advantages subsection are also listed in the disadvantages subsection
with a summary explanation of the disadvantageous part of that particular
aspect.


2.3.  CONFIGURATION.

A configuration section identifies the files and scripts used to configure
the various options, concentrating on those most helpful in comparing
options.  Links to source code have been provided where possible.  Summary
comparison has been the goal, not completeness, but I welcome additions
which make the section more complete.

BibLibre work on configuration supporting Solr/Lucene is summarised with
links to source code.  Some links to BibLibre demonstrations of their
proof of concept and work in progress have not yet been included.  Please
help keep the description of BibLibre's configuration work in progress for
supporting Solr/Lucene up to date.

2.4.  CODE FUNCTIONALITY.

A summary of code functionality is given for each option.  Links to source
code have been provided where possible.

This section is the one most likely to have mistakes which need
correcting, especially for how various Index Data programs relate to YAZ
in the sequence of calls back and forth.  In some cases, I relied on
logical inference rather than reading the source code.

BibLibre work on code functionality supporting Solr/Lucene is summarised
with links to source code.  Some links to BibLibre demonstrations of their
proof of concept and work in progress have not yet been included.  Please
help keep the description of BibLibre's code functionality work in
progress for supporting Solr/Lucene up to date.


Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
+1 212-674-3783



