[Koha-devel] planning for search rewrite, need feedback
Pat Eyler
pate at eylerfamily.org
Thu Jan 15 11:05:02 CET 2004
Yesterday and again this morning, discussions on the #koha IRC channel
turned to searching, and we decided to develop a "Google-like
query syntax" (GQS) for searches. In general, we want to develop a
much better search back-end that:
- can optimize queries
- is accessible to multiple front-ends (Z39.50 server, Catalog
Searches, Circ/Patron searches, reports, etc.)
- handles security better than the current model
- 'lowers the bar' for developers who want to work on localized
improvements for searches
- easily spans the range from very simple to very complex queries
We're looking at a two-pass design. The first pass would parse incoming
search requests and compile them into GQS queries. This pass would be
performed by the SearchParser::Foo family of modules (e.g.,
SearchParser::Basic, SearchParser::Z3950, etc.). The second pass would
convert the GQS into SQL, optimize the SQL query for the Koha DB
structure, and return the data to the application. We intend to write
these new modules in Perl's OO style so they are as easy as possible
for developers to work with.
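To make the two-pass split concrete, here is a minimal sketch in Python (chosen only for brevity; the real modules would be Perl). SearchParser and BasicSearchParser are hypothetical stand-ins for the proposed SearchParser::Foo family, the "request" is assumed to be a simple mapping of form fields to values, and the back end is just any callable that accepts a GQS string. None of these shapes were settled in the discussion.

```python
from abc import ABC, abstractmethod


class SearchParser(ABC):
    """First pass: turn one front end's request into a GQS string.

    Hypothetical stand-in for the proposed SearchParser::Foo modules.
    """

    @abstractmethod
    def to_gqs(self, request):
        """Return the GQS string for a front-end request."""


class BasicSearchParser(SearchParser):
    """Stand-in for SearchParser::Basic: simple form fields -> GQS."""

    def to_gqs(self, request):
        parts = []
        for field, value in request.items():
            value = value.strip()
            if not value:
                continue  # ignore empty form fields
            # Quote multi-word values so they stay a single term.
            term = f'"{value}"' if " " in value else value
            parts.append(f"{field}: {term}")
        # Adjacent terms are implicitly ANDed in the proposed syntax.
        return " ".join(parts)


def run_search(parser, request, backend):
    """First pass builds GQS; the second pass (backend) compiles and runs it."""
    return backend(parser.to_gqs(request))
```

For example, BasicSearchParser().to_gqs({"title": "looking glass", "author": "carrol"}) yields 'title: "looking glass" author: carrol'. Limiting each front end to "produce a GQS string" is what would let the Z39.50 server, catalog search, and reports share one optimizing back end.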
I volunteered to look up the Google search syntax and make a
first cut at how we might use it for our query syntax. Would someone
be willing to look at Z39.50 searches to see what additions they'd
need? (Joshua?) Are there other kinds of searches that will need
special terms or operators? If so, what are they?
We still need to deal with constraining searches/queries to acceptable
kinds of queries against specific tables and fields. We can work on
hammering that out after we agree on the syntax we need.
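The constraint mechanism is still open. One common approach, sketched here in Python as an assumption rather than anything agreed on, is a whitelist that maps each searchable GQS field name to one table and column and refuses everything else. The table and column names in ALLOWED_FIELDS, and the helper names, are illustrative only and may not match Koha's actual schema.

```python
# Hypothetical whitelist: GQS field name -> (table, column).
# Illustrative names only; they may not match Koha's real schema.
ALLOWED_FIELDS = {
    "title": ("biblio", "title"),
    "author": ("biblio", "author"),
    "subject": ("bibliosubject", "subject"),
    "branch": ("items", "homebranch"),
}


class DisallowedField(ValueError):
    """Raised when a query names a field that is not whitelisted."""


def resolve_field(name, allowed=ALLOWED_FIELDS):
    """Map a GQS field name to a qualified SQL column, or refuse it.

    Only strings taken from the whitelist ever reach the SQL text;
    the user's search values should be passed as bind parameters.
    """
    try:
        table, column = allowed[name.lower()]
    except KeyError:
        raise DisallowedField(f"field not searchable: {name!r}") from None
    return f"{table}.{column}"
```

Because field names are looked up rather than interpolated, a query naming something like "borrowers.password" is rejected before any SQL is built.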
Here's my take on the syntax:
The basic unit of a search is a term. By default, I think we want terms
to match anywhere within a field rather than requiring an exact match.
A term is any of the following:
aString
term term = term & term = term AND term
term | term = term OR term
term ^ term = term XOR term
(term)
-term     # not including term
#term     # begins with term
term%     # ends with term
-#term
-term%
= term
> term
< term
>= term
<= term
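As a first check that the term syntax above is parseable, here is a minimal recursive-descent parser sketch in Python. The precedence order ('|' lowest, then '^', then AND written as '&', 'AND', or plain juxtaposition, then unary '-') is an assumption the proposal does not specify, as is the tuple-based AST and treating 'AND'/'OR'/'XOR' as keywords only in upper case. It covers only the term forms listed; field operators such as 'title:' and the "foo * bar" special case are left out.

```python
import re

TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<quoted>"[^"]*")          # "a quoted phrase"
      | (?P<cmp>>=|<=|=|>|<)         # comparison operators
      | (?P<op>[&|^()\-])            # boolean operators, parentheses, negation
      | (?P<word>[^\s&|^()"<>=]+)    # bare word, maybe with #prefix or %suffix
    )''', re.VERBOSE)
KEYWORDS = {"AND": "&", "OR": "|", "XOR": "^"}  # assumed: upper case only


def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse at {pos}: {text[pos:]!r}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "word" and value in KEYWORDS:
            kind, value = "op", KEYWORDS[value]
        tokens.append((kind, value))
        pos = m.end()
    return tokens


class Parser:  # recursive descent; assumed precedence | < ^ < AND < unary -
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        tok, self.i = self.peek(), self.i + 1
        return tok

    def parse(self):
        node = self.parse_or()
        if self.i < len(self.tokens):
            raise ValueError(f"unexpected {self.peek()[1]!r}")
        return node

    def binary(self, symbol, name, operand):
        node = operand()
        while self.peek() == ("op", symbol):
            self.take()
            node = (name, node, operand())
        return node

    def parse_or(self):
        return self.binary("|", "or", self.parse_xor)

    def parse_xor(self):
        return self.binary("^", "xor", self.parse_and)

    def parse_and(self):
        node = self.parse_unary()
        while True:
            kind, value = self.peek()
            if kind is None or (kind == "op" and value in ("|", "^", ")")):
                return node
            if (kind, value) == ("op", "&"):
                self.take()  # explicit '&'; plain juxtaposition also means AND
            node = ("and", node, self.parse_unary())

    def parse_unary(self):
        if self.peek() == ("op", "-"):
            self.take()
            return ("not", self.parse_unary())
        return self.parse_primary()

    def parse_primary(self):
        kind, value = self.take()
        if (kind, value) == ("op", "("):
            node = self.parse_or()
            if self.take() != ("op", ")"):
                raise ValueError("missing ')'")
            return node
        if kind == "cmp":
            okind, operand = self.take()
            if okind not in ("word", "quoted"):
                raise ValueError(f"{value!r} needs an operand")
            return ("cmp", value, operand.strip('"'))
        if kind == "quoted":
            return ("contains", value[1:-1])
        if kind == "word":
            if value.startswith("#"):
                return ("begins", value[1:])
            if value.endswith("%"):
                return ("ends", value[:-1])
            return ("contains", value)
        raise ValueError(f"unexpected {value!r}")


def parse(text):
    return Parser(tokenize(text)).parse()
```

With this precedence, "a | b c" parses as a OR (b AND c), and the two "animal husbandry" queries in the examples produce identical trees. Note that '#' doubles as the comment marker in this post, so the begins-with prefix may be worth revisiting.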
We also want to deal with the special case of:
"foo * bar"
There are some special operators we want to deal with. Some of these
operators could be combined, and any of them could be negated (with
the syntax '-[operator]'). They should apply to the entire search:
locally => this item exists in the local branch
available => the item is not lost, checked out, or on hold
onhold => the item is marked as reserved
lost => the item is marked as lost
circulating => the item can be checked out
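A minimal Python sketch of separating these search-wide operators from the rest of a query, under the assumption that they are reserved words that may appear anywhere at the top level (as in the 'available locally subject: knights ...' example below). Quoted phrases are kept whole so that a phrase such as "lost horizon" is not mistaken for the 'lost' operator. The function name is hypothetical, and how the resulting flags map onto item status columns is left open.

```python
import re

# The search-wide operators listed above.
GLOBAL_OPERATORS = {"locally", "available", "onhold", "lost", "circulating"}

# A "word" is a run of non-space characters in which quoted phrases
# count as single units, so "lost horizon" or ("a lost b") stay whole.
WORD_RE = re.compile(r'(?:"[^"]*"|\S)+')


def split_global_operators(query):
    """Split a GQS query into (flags, remaining_terms).

    flags maps each operator found to True, or to False when it was
    negated with a leading '-' (e.g. '-lost').
    """
    flags, rest = {}, []
    for word in WORD_RE.findall(query):
        negated = word.startswith("-")
        name = word[1:] if negated else word
        if name in GLOBAL_OPERATORS:
            flags[name] = not negated
        else:
            rest.append(word)
    return flags, " ".join(rest)
```

Treating these words as reserved means a plain keyword search for "lost" would have to be quoted; that trade-off is worth deciding explicitly.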
There are also operators that act only on a single term (which of these
should be the default, 'keyword: term'?):
title: term
exacttitle: term
subject: term
author: term
exactauthor: term
keyword: term
branch: term
marc-[recordnumber][-sub]: term
# these should allow some wildcarding, like:
# marc-6X0-a
# marc-58X-a
#
# nobody in the discussion was/is a MARC expert, does
# this syntax work?
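One possible reading of the marc-[recordnumber][-sub] syntax treats the number as the three-character MARC field tag, the optional suffix as a one-character subfield code, and 'X' as a single-digit wildcard. The Python sketch below, built around a hypothetical marc_matcher helper, compiles such a spec into a predicate over (tag, subfield) pairs under exactly those assumptions.

```python
import re


def marc_matcher(spec):
    """Compile 'marc-TAG[-SUB]' into a predicate over (tag, subfield).

    'X' in TAG is a one-digit wildcard: 'marc-6X0-a' matches tags
    600, 610, ..., 690 with subfield a. With no -SUB, any subfield matches.
    """
    m = re.fullmatch(r"marc-([0-9X]{3})(?:-([a-z0-9]))?", spec)
    if m is None:
        raise ValueError(f"not a MARC field spec: {spec!r}")
    tag_spec, subfield = m.groups()
    # Safe to build a regex this way: tag_spec holds only digits and 'X'.
    tag_re = re.compile(tag_spec.replace("X", "[0-9]"))

    def matches(tag, sub=None):
        return bool(tag_re.fullmatch(tag)) and (subfield is None or sub == subfield)

    return matches
```

Only an uppercase 'X' is treated as a wildcard here, matching the examples above.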
Some examples:
title: ("looking glass") -(author: carrol%)
# any title containing "looking glass" where the author doesn't
# end with carrol
"animal husbandry" OR (raising AND (rabbits OR chickens))
"animal husbandry" | (raising (rabbits | chickens))
# these are equivalent
subject: knights AND marc-526-c >= 3 AND marc-526-c <= 4
# subject includes knights and AR reading level is between
# third and fourth grade
available locally subject: knights AND marc-526-c >= 3
AND marc-526-c <= 4
# as above, but local copies are available for checkout
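For the second pass, the Python sketch below compiles a parsed query into a SQL WHERE clause plus bind parameters, using the first example above as hand-built input. The AST shape, the to_sql name, and the field-to-column mapping are all hypothetical. XOR is expanded into AND/OR/NOT on the assumption that the target SQL dialect may not offer an XOR operator.

```python
# Hypothetical AST node shapes (one possible output of the first pass):
#   ("contains" | "begins" | "ends", text)    default / #term / term%
#   ("cmp", op, value)                         = > < >= <=
#   ("not", node)                              -term
#   ("and" | "or" | "xor", left, right)
#   ("field", name, node)                      e.g. title: (...)
LIKE_PATTERNS = {"contains": "%{}%", "begins": "{}%", "ends": "%{}"}
CMP_OPS = {"=", ">", "<", ">=", "<="}


def to_sql(node, columns, default_field="keyword"):
    """Compile a GQS AST into (where_clause, bind_params).

    columns whitelists field name -> SQL column. Only those column names
    are interpolated into the SQL text; every search value is a ? parameter.
    """
    params = []

    def emit(node, column):
        kind = node[0]
        if kind == "field":
            name, child = node[1], node[2]
            if name not in columns:
                raise ValueError(f"field not searchable: {name!r}")
            return emit(child, columns[name])
        if kind == "not":
            return f"NOT ({emit(node[1], column)})"
        if kind in ("and", "or"):
            left, right = emit(node[1], column), emit(node[2], column)
            return f"({left} {kind.upper()} {right})"
        if kind == "xor":
            # Portable XOR as (a AND NOT b) OR (NOT a AND b). Each side is
            # emitted twice in textual order so placeholders match params.
            a1, b1 = emit(node[1], column), emit(node[2], column)
            a2, b2 = emit(node[1], column), emit(node[2], column)
            return f"(({a1} AND NOT ({b1})) OR (NOT ({a2}) AND {b2}))"
        if column is None:
            raise ValueError("bare term but no default field column")
        if kind in LIKE_PATTERNS:
            params.append(LIKE_PATTERNS[kind].format(node[1]))
            return f"{column} LIKE ?"
        if kind == "cmp" and node[1] in CMP_OPS:
            params.append(node[2])
            return f"{column} {node[1]} ?"
        raise ValueError(f"unsupported node: {node!r}")

    return emit(node, columns.get(default_field)), params
```

For title: ("looking glass") -(author: carrol%) with columns {"title": "biblio.title", "author": "biblio.author"}, this yields "(biblio.title LIKE ? AND NOT (biblio.author LIKE ?))" with parameters ["%looking glass%", "%carrol"]. The ? placeholders suit both Python's sqlite3 module and Perl DBI. The sketch does not escape % or _ inside user terms, which a real implementation would need to do before building LIKE patterns.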
Pat Eyler
Kaitiaki/manager, the Koha project (http://www.koha.org)
migrant Linux sys admin; ruby, shell, and perl geek (http://pate.eylerfamily.org)