[Koha-zebra] Koha Zebra Searching Report (from NPL)

Mike Taylor mike at miketaylor.org.uk
Thu Mar 23 12:10:26 CET 2006


> Date: Wed, 22 Mar 2006 22:43:40 -0500
> From: Sebastian Hammer <quinn at indexdata.com>
> 
>>> Can't do XOR today. I suppose it would be a possible new feature,
>>> but I've frankly never heard of it in an ILS.. can a XOR b be
>>> mapped to (a OR b) NOT (a AND b) ?  or am I just showing my fading
>>> math skills to ill effect, here?
>>    
>> Yep, that's the correct mapping. Voyager's where NPL originally
>> saw the XOR function.
> 
> Ok. It can be faked in the front-end then, or implemented deeper in
> the guts of Zebra.

... but the real question is, can anyone think of _any_ use-case for
this?  I admit I've not tried very hard, but I can't imagine any
scenario in which I'd say, oh no, I don't want to see records that
have _both_ those terms!
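
For anyone who does want to fake it in the front-end, here is a
minimal sketch of the mapping quoted above. The function names are
made up for illustration, and the second function assumes the query is
sent as PQF, where @not is the binary "and-not" operator; terms
containing spaces would need quoting, which this does not attempt.

```python
def xor_sets(a, b):
    """Records matching exactly one term: (a OR b) NOT (a AND b)."""
    return (a | b) - (a & b)


def xor_pqf(term_a, term_b):
    """Rewrite 'a XOR b' as a PQF string (hypothetical helper).

    Uses only @or, @and and @not (and-not), so the back end needs
    no native XOR operator.
    """
    return f"@not @or {term_a} {term_b} @and {term_a} {term_b}"


# Toy hit lists of record ids, invented for the example.
hits_a = {1, 2, 3}
hits_b = {3, 4}

# The mapping agrees with Python's built-in symmetric difference.
assert xor_sets(hits_a, hits_b) == hits_a ^ hits_b

print(sorted(xor_sets(hits_a, hits_b)))  # [1, 2, 4]
print(xor_pqf("cat", "dog"))             # @not @or cat dog @and cat dog
```

Record 3, the one carrying both terms, is exactly the one that drops
out.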

>> I've looked high and low for documentation on the ranking
>> algorithms in Zebra but haven't found much more than a few
>> sentences in the official docs and some list messages ...
> 
>  It isn't documented beyond what's in the code, AFAIK.

Marc might have something.

> In fact, to index 245$a, you'd have to write something like
>	xelm /*/datafield[@tag=245]/subfield[@code=a]     title
> [...]
> however, none of these mechanisms allows you to construct phrase
> indexes that span multiple subfields.. and they don't allow you to
> do cool stuff like extract a date from the guts of 008...

Really?  Surely it would be possible to write an XPath expression that
does this?
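
In plain XPath 1.0 I would expect something along the lines of
substring(marc:controlfield[@tag='008'], 8, 4) to pull out Date 1
(008 positions 07-10). Whether Zebra's xelm accepts that is a separate
question, which I have not checked. As a rough sketch of the two
extractions themselves, done outside Zebra with Python's standard
library (the sample record and the function names are invented for
illustration):

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up MARCXML record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="008">060323s2006    nyu           000 0 eng d</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Searching with Zebra :</subfield>
    <subfield code="b">a made-up example /</subfield>
  </datafield>
</record>"""


def date1(record):
    """Return 008 positions 07-10 (Date 1), or None if there is no 008."""
    field = record.find("marc:controlfield[@tag='008']", NS)
    if field is None or field.text is None:
        return None
    return field.text[7:11]


def title_phrase(record):
    """Join 245$a and 245$b into one phrase spanning both subfields."""
    subs = record.findall("marc:datafield[@tag='245']/marc:subfield", NS)
    parts = [s.text.strip() for s in subs
             if s.get("code") in ("a", "b") and s.text]
    return " ".join(parts)


record = ET.fromstring(SAMPLE)
print(date1(record))         # 2006
print(title_phrase(record))  # Searching with Zebra : a made-up example /
```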

> Well, in Zebra 1.4, XSLT comes to the rescue, in a way that only
> XSLT can do it, with lots of angular brackets and much verbosity....

... which is _way_ easier to write than TCL :-) ...

> for instance, in an XSLT index filter,
> 
> melm 245$a title:w
> 
> becomes
> 
> <xsl:template 
> match="marc:record/marc:datafield[@tag='245']/marc:subfield[@code='a']">
>   <z:index name="title" type="w">
>     <xsl:value-of select="."/>
>   </z:index>
> </xsl:template>
> 
> Eek.

I think that "Eek" only scratches the surface, here.  :-)

But: talk about power and generality!

 _/|_	 ___________________________________________________________________
/o ) \/  Mike Taylor  <mike at miketaylor.org.uk>  http://www.miketaylor.org.uk
)_v__/\  "Those who mourn for 'USENET like it was' should remember the
	 original design estimates of maximum traffic volume: two articles
	 per day" -- Steven Bellovin.





