[Koha-devel] SimpleServer using Solr progress

Thomas Dukleth kohadevel at agogme.com
Wed Feb 16 11:54:03 CET 2011


[Original subject: Re: [Koha-devel] Solr and Z3950 server some news.]


1.  YAZ CONFIGURATION.

SimpleServer, like all IndexData products, is dependent upon YAZ.

My thought on the bad MARC error from testing BibLibre work on
SimpleServer as a Z39.50/SRU server using Solr/Lucene is that the record
syntax/schema serialisation is not working correctly because no GFS
configuration file has been specified for YAZ.  A SimpleServer
implementation without a configuration file for YAZ would lack CQL to PQF
conversion, Explain support, etc.  The built-in defaults seem to be
insufficient for proper record syntax/schema serialisation.

SimpleServer can be started with the -f option to specify a GFS
configuration file for YAZ,
http://www.indexdata.com/yaz/doc/server.vhosts.html .  YAZ retrieval
facility documentation,
http://www.indexdata.com/yaz/doc/tools.retrieval.html , is needed to help
understand serialisation for the GFS configuration file.  Z39.50 object
identifiers (OIDs) are listed at
http://www.loc.gov/z3950/agency/defns/oids.html .

A YAZ GFS configuration example is in etc/yazgfs.xml in the YAZ source
code.  Other needed configuration files such as pqf.properties,
cqlpass.properties, and maps.xml are linked from yazgfs.xml.  In Koha, the
YAZ GFS configuration file is etc/koha-conf.xml which links to other
files, such as etc/zebradb/explain-biblios.xml, and differently named
files, such as etc/zebradb/cql.properties.
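
As a rough aid for checking a GFS configuration file before passing it
with -f, the following Python sketch reports which of the sections
mentioned above (CQL to PQF conversion, Explain, retrieval) are missing
from each server element.  The element names cql2rpn, explain, and
retrievalinfo are taken from the YAZ GFS documentation; the sample
content, the server id, and the function names are invented for
illustration, and sections pulled in with XInclude are not resolved.

```python
import xml.etree.ElementTree as ET

# Invented sample content; only the element names come from the YAZ docs.
SAMPLE_GFS = """<yazgfs>
  <listen id="public">tcp:@:9999</listen>
  <server id="biblios" listenref="public">
    <cql2rpn>pqf.properties</cql2rpn>
    <retrievalinfo>
      <retrieval syntax="usmarc" name="F"/>
      <retrieval syntax="xml" name="marcxml"/>
    </retrievalinfo>
  </server>
</yazgfs>"""

# Sections whose absence would leave SimpleServer on YAZ built-in defaults.
REQUIRED = ("cql2rpn", "explain", "retrievalinfo")


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def missing_sections(gfs_xml):
    """Return {server id: [missing section names]} for each <server>."""
    root = ET.fromstring(gfs_xml)
    report = {}
    for server in root.iter("server"):
        present = {local_name(child.tag) for child in server}
        report[server.get("id", "?")] = [
            name for name in REQUIRED if name not in present
        ]
    return report


print(missing_sections(SAMPLE_GFS))  # {'biblios': ['explain']}
```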

The Koha examples have several mistakes and omissions, including the
following.  Generic XML might include MARCXML and Dublin Core along with
any other XML schema; therefore, generic XML should not be conflated
with MARCXML, which has a distinctive serialisation.  UNIMARC and USMARC
are distinct, and having UNIMARC use USMARC defaults causes confusion
and may lead to bugs.  There are other mistakes and omissions in the Koha
YAZ configuration, but those seem most relevant to BibLibre's current
work on SimpleServer.
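
To make the distinction concrete, here is a small Python sketch that
keeps the record syntaxes apart and refuses to treat generic XML as
meaning anything until a schema is named.  The OIDs are those registered
with the Z39.50 Maintenance Agency and the schema identifiers are the
registered SRU ones; the function name and table layout are merely
illustrative assumptions.

```python
# Record syntax OIDs from the Z39.50 Maintenance Agency registry.
RECORD_SYNTAX_OIDS = {
    "unimarc": "1.2.840.10003.5.1",
    "usmarc": "1.2.840.10003.5.10",   # also used for MARC 21
    "xml": "1.2.840.10003.5.109.10",  # generic XML, any schema
}

# SRU schema identifiers; generic XML says nothing without one of these.
XML_SCHEMAS = {
    "marcxml": "info:srw/schema/1/marcxml-v1.1",
    "dc": "info:srw/schema/1/dc-v1.1",
}


def resolve(syntax, schema=None):
    """Return (record syntax OID, schema identifier or None)."""
    oid = RECORD_SYNTAX_OIDS[syntax.lower()]
    if syntax.lower() != "xml":
        return oid, None
    if schema is None:
        # Hypothetical policy: never guess MARCXML for plain "xml".
        raise ValueError("generic XML requested without a schema")
    return oid, XML_SCHEMAS[schema.lower()]


print(resolve("unimarc"))
print(resolve("usmarc"))
print(resolve("xml", "marcxml"))
```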

The Z39.50 standard is ambiguous over whether records from the result
set need to be retrieved again from the server for the present command
if they have already been retrieved as part of the response to the
search command.  That ambiguity complicates my understanding of what may
be happening on the server and client side when using the present
command.  When the present command is issued, YAZ is expected to parse
MARCXML records in the same distinctive MARC formatted manner that it
uses to parse ISO 2709 records.  In the current state of work on
SimpleServer, with the apparent absence of proper serialisation, the
present command returns incompletely parsed MARC records.  YAZ does not
attempt parsing for generic XML.  Saving MARCXML in raw format would
avoid the MARC parsing from present.
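
When diagnosing this, it may help to check which serialisation a saved
record actually uses before blaming the MARC parser.  The Python sketch
below is only a heuristic under stated assumptions: it relies on the
ISO 2709 layout (a 24 character leader with the record length in
positions 0-4 and the base address in positions 12-16, and a closing
record terminator, 0x1D), and the sample records and function name are
invented.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte

# Invented one-field record: leader, one directory entry, one field.
SAMPLE_ISO = b"00040nam  2200037   4500" + b"001000200000\x1e" + b"x\x1e\x1d"
SAMPLE_XML = b'<record xmlns="http://www.loc.gov/MARC21/slim"/>'


def sniff_record(payload):
    """Guess whether a raw record is 'iso2709', 'xml', or 'unknown'."""
    data = payload.lstrip()
    if data.startswith(b"<"):
        return "xml"
    leader = data[:24]
    if (
        len(leader) == 24
        and leader[:5].isdigit()            # declared record length
        and leader[12:17].isdigit()         # base address of data
        and int(leader[:5]) == len(data)    # declared length matches
        and data.endswith(RECORD_TERMINATOR)
    ):
        return "iso2709"
    return "unknown"


print(sniff_record(SAMPLE_ISO))  # iso2709
print(sniff_record(SAMPLE_XML))  # xml
```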


2.  INVARIANT RESULT SET.

A more important problem remains: any SimpleServer query has been
returning exactly the same result set whether or not there would be any
legitimate matches.  1011 records were always returned when I tested on
Friday.  1027 records were always returned when I tested on Monday.
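
The check I was doing by hand can be sketched in Python as follows.
This is not tied to any particular client: count_hits stands for any
hypothetical callable that sends a query to the server and returns the
hit count, and the PQF queries and stand-in servers are invented for
illustration.

```python
def looks_invariant(count_hits, queries):
    """Return (flag, counts); flag is True when every query in a set of
    two or more yields the same non-zero hit count."""
    counts = {query: count_hits(query) for query in queries}
    distinct = set(counts.values())
    flag = len(counts) > 1 and len(distinct) == 1 and distinct != {0}
    return flag, counts


# Include a query that should match nothing on a healthy server.
QUERIES = ['@attr 1=4 "history"', '@attr 1=4 "zzqxjvkw"']


def broken_server(query):
    """Stand-in that ignores the query, as observed in testing."""
    return 1011


def healthy_server(query):
    """Stand-in with invented counts that depend on the query."""
    return {'@attr 1=4 "history"': 37}.get(query, 0)


print(looks_invariant(broken_server, QUERIES)[0])   # True
print(looks_invariant(healthy_server, QUERIES)[0])  # False
```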


3.  DIRECTION FOR NOW.

I hope that my testing and direction to a possible solution have been
helpful to people working on SimpleServer using Solr/Lucene at BibLibre.
As much fun as more actively helping to fix the problems would be, I have
to return to some non-library commitments presently.


Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
+1 212-674-3783



