[Koha-devel] planning for search rewrite, need feedback

Pat Eyler pate at eylerfamily.org
Thu Jan 15 11:05:02 CET 2004


Yesterday and again this morning, discussions on the #koha irc channel
turned to searching and a decision was made to develop a "google-like
query syntax" (GQS) for searches.  In general, we want to develop a
much better search back-end that:
    - can optimize queries
    - is accessible to multiple front-ends (Z39.50 server, Catalog
      Searches, Circ/Patron searches, reports, etc.)
    - handles security better than the current model
    - 'lowers the bar' for developers who want to work on localized
      improvements for searches
    - easily spans the range from very simple to very complex queries

We're looking at two passes.  The first pass would parse incoming
search requests and compile GQS queries.  This pass would be performed
by the SearchParser::Foo family of modules (e.g., SearchParser::Basic,
SearchParser::Z3950, etc.).  The second pass would convert the GQS into
SQL, optimize the SQL query for the Koha DB structure, and handle the
returning of data to the application.  We intend to create these new
modules following the Perl OO style to provide the easiest possible
access to developers.
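
To make the two passes concrete, here is a rough sketch. It is written in
Python only for brevity (the modules proposed here would be Perl), and every
name in it is an assumption: form_to_gqs stands in for a
SearchParser::Basic-style front end, gqs_to_sql for the back end, and the
table and column names are placeholders rather than a statement about the
Koha schema. It handles only 'field: "value"' pairs and shows how a single
back end could reject unknown fields and bind values as parameters.

```python
import re

# Hypothetical field-to-column map; the real Koha schema may differ.
FIELD_COLUMNS = {"title": "biblio.title", "author": "biblio.author"}


def form_to_gqs(form):
    """Pass 1 (stand-in for a SearchParser::Basic-style module):
    turn front-end form input into a GQS string."""
    return " ".join(
        f'{field}: "{value}"' for field, value in form.items() if value
    )


def gqs_to_sql(gqs):
    """Pass 2 (toy): compile 'field: "value"' pairs into parameterized SQL."""
    clauses, params = [], []
    for field, value in re.findall(r'(\w+):\s*"([^"]*)"', gqs):
        if field not in FIELD_COLUMNS:
            # Constrain queries to fields we have explicitly allowed.
            raise ValueError(f"field not searchable: {field}")
        clauses.append(f"{FIELD_COLUMNS[field]} LIKE ?")
        params.append(f"%{value}%")  # default: match within the field
    if not clauses:
        raise ValueError("empty query")
    sql = "SELECT biblionumber FROM biblio WHERE " + " AND ".join(clauses)
    return sql, params


gqs = form_to_gqs({"title": "looking glass", "author": ""})
sql, params = gqs_to_sql(gqs)
```

Because user input only ever reaches the database as bound parameters, any
front end that emits GQS would inherit the same protection.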

I volunteered to look up the google search syntax and make a
first cut at how we might use it for our query syntax.  Would someone
be willing to look at Z39.50 searches to see what additions they'd
need?  (Joshua?)  Are there other kinds of searches that will need
special terms or operators?  If so, what are they?

We still need to deal with constraining searches/queries to acceptable
kinds of queries against specific tables and fields.  We can work on
hammering that out after we agree on the syntax we need.

Here's my take on the syntax:

The basic unit of a search is a term.  By default I think we want to
match terms anywhere within a field rather than requiring an exact
match.  Terms are any of the following:

   aString

   term term = term & term = term AND term

   term | term = term OR term

   term ^ term = term XOR term

   (term)

   -term
     # not including term

   #term
     # begins with term

   term%
     # ends with term

   -#term
     # does not begin with term

   -term%
     # does not end with term

   = term

   > term

   < term

   >= term

   <= term
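
As a sketch of how the string forms above might map onto SQL in the second
pass: the helper below (term_to_like, a made-up name) turns one term into a
LIKE or NOT LIKE operator plus a pattern. It is Python for illustration only,
and it assumes backslash is the LIKE escape character (MySQL's default). The
comparison operators and the boolean operators are out of scope here.

```python
def _escape(text):
    """Escape characters that are special inside a SQL LIKE pattern."""
    return (text.replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_"))


def term_to_like(term):
    """Map one GQS string term to (sql_operator, like_pattern)."""
    negated = term.startswith("-")
    if negated:
        term = term[1:]
    if term.startswith("#"):
        pattern = _escape(term[1:]) + "%"        # #term : begins with
    elif term.endswith("%"):
        pattern = "%" + _escape(term[:-1])       # term% : ends with
    else:
        # Default: match anywhere within the field.
        pattern = "%" + _escape(term.strip('"')) + "%"
    return ("NOT LIKE" if negated else "LIKE", pattern)


for example in ["glass", "#knight", "carrol%", "-carrol%", "-#the"]:
    print(example, "=>", term_to_like(example))
```

The pattern is meant to be passed as a bound parameter, never pasted into
the SQL text.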


We also want to deal with the special case of:

   "foo * bar"


There are some special operators we want to deal with.  Some of these
operators could be combined, and any of them could be negated (with the
syntax '-[operator]').  They should apply to the entire search:

    locally => this item exists in the local branch

    available => the item is not lost, checked out, or on hold

    onhold => the item is marked as reserved

    lost => the item is marked as lost

    circulating => the item can be checked out
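
One possible way to peel these whole-search operators off a query before
the rest is parsed, sketched in Python. The function name split_flags is
made up, and the sketch assumes the operators appear at the start of the
query; the proposal does not say where they may appear, nor how to search
for a literal word such as "lost".

```python
import re

# The five whole-search operators listed above, optionally negated with '-'.
FLAG_RE = re.compile(
    r"\s*(-?)(locally|available|onhold|lost|circulating)(?=\s|$)"
)


def split_flags(query):
    """Return ({flag: wanted}, remaining_query) for leading flags."""
    flags, pos = {}, 0
    while True:
        match = FLAG_RE.match(query, pos)
        if match is None:
            break
        # A leading '-' means the flag must NOT hold.
        flags[match.group(2)] = match.group(1) != "-"
        pos = match.end()
    return flags, query[pos:].strip()


flags, rest = split_flags("available -lost subject: knights")
print(flags, "|", rest)
```

Stopping at the first non-flag word keeps a later "title: lost" from being
mistaken for the operator.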


There are also operators which only act on a term (which of these
should be the default, 'keyword: term'?):

    title: term

    exacttitle: term

    subject: term

    author: term

    exactauthor: term

    keyword: term

    branch: term

    marc-[recordnumber][-sub]: term
       # these should allow some wildcarding, like:
       #  marc-6X0-a
       #  marc-58X-a
       #
       # Nobody in the discussion is a MARC expert; does
       # this syntax work?
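
To show what the wildcarding might mean in practice, here is a small Python
sketch. It assumes 'X' stands for any single digit in a three-character
tag and that the optional subfield is a single lowercase letter or digit.
marc_matcher is a hypothetical helper, not part of any existing module.

```python
import re

# marc-<3 chars of digits or X>[-<one subfield code>]
MARC_FIELD_RE = re.compile(r"^marc-([0-9X]{3})(?:-([a-z0-9]))?$")


def marc_matcher(spec):
    """Build a predicate matches(tag, code) from a spec like 'marc-6X0-a'."""
    match = MARC_FIELD_RE.match(spec)
    if match is None:
        raise ValueError(f"not a marc field spec: {spec!r}")
    tag_pattern, subfield = match.groups()
    # Each X becomes 'any one digit'; everything else must match exactly.
    tag_re = re.compile("^" + tag_pattern.replace("X", "[0-9]") + "$")

    def matches(tag, code=None):
        if not tag_re.match(tag):
            return False
        # With no subfield in the spec, any subfield of the tag matches.
        return subfield is None or subfield == code

    return matches


subject_a = marc_matcher("marc-6X0-a")
print(subject_a("650", "a"), subject_a("651", "a"))
```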


Some examples:

title: ("looking glass") -(author: carrol%)
   # any title containing "looking glass" where the author doesn't
   # end with carrol

"animal husbandry" OR (raising AND (rabbits OR chickens))
"animal husbandry" | (raising (rabbits | chickens))
   # these are equivalent

subject: knights AND marc-526-c >= 3 AND marc-526-c <= 4
   # subject includes knights and AR reading level is between
   # third and fourth grade

available locally  subject: knights AND marc-526-c >= 3
AND marc-526-c <= 4
   # as above, but local copies are available for checkout
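
The claim that the two "animal husbandry" queries are equivalent can be
checked mechanically. Below is a minimal recursive-descent sketch in Python
covering only AND, OR, negation, parentheses, and quoted phrases. The
precedence (OR binds loosest, adjacency means AND) is an assumption, since
nothing above fixes one, and field operators such as 'title:' are treated
as plain words.

```python
import re

# A token is a quoted phrase, one of ( ) & | -, or a bare word.
TOKEN_RE = re.compile(r'\s*("[^"]*"|[()&|-]|[^\s()&|"]+)')


def tokenize(query):
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"cannot tokenize: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(query):
    """Parse a GQS subset into nested tuples like ('AND', left, right)."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        node = parse_and()
        while peek() in ("|", "OR"):
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_unary()
        # Anything that is not OR or ')' continues the AND chain,
        # so two adjacent terms are ANDed implicitly.
        while peek() is not None and peek() not in ("|", "OR", ")"):
            if peek() in ("&", "AND"):
                advance()
            node = ("AND", node, parse_unary())
        return node

    def parse_unary():
        token = peek()
        if token == "-":
            advance()
            return ("NOT", parse_unary())
        if token == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if token is None or token in (")", "&", "|", "AND", "OR"):
            raise ValueError(f"unexpected token: {token!r}")
        advance()
        return ("TERM", token.strip('"'))

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return tree


worded = parse('"animal husbandry" OR (raising AND (rabbits OR chickens))')
symbolic = parse('"animal husbandry" | (raising (rabbits | chickens))')
print(worded == symbolic)
```

Both spellings produce the same tree, which is the property the second pass
would rely on when generating SQL.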



Pat Eyler
Kaitiaki/manager               migrant Linux sys admin
the Koha project               ruby, shell, and perl geek
http://www.koha.org            http://pate.eylerfamily.org




