[Koha-zebra] Koha Zebra Searching Report (from NPL)
Mike Taylor
mike at miketaylor.org.uk
Thu Mar 23 12:10:26 CET 2006
> Date: Wed, 22 Mar 2006 22:43:40 -0500
> From: Sebastian Hammer <quinn at indexdata.com>
>
>>> Can't do XOR today. I suppose it would be a possible new feature,
>>> but I've frankly never heard of it in an ILS.. can a XOR b be
>>> mapped to (a OR b) NOT (a AND b) ? or am I just showing my fading
>>> math skills to ill effect, here?
>>
>> Yep, that's the correct mapping. Voyager's where NPL originally
>> saw the XOR function.
>
> Ok. It can be faked in the front-end then, or implemented deeper in
> the guts of Zebra.
... but the real question is, can anyone think of _any_ use-case for
this? I admit I've not tried very hard, but I can't imagine any
scenario in which I'd say, oh no, I don't want to see records that
have _both_ those terms!
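For anyone who does want to fake XOR in the front-end, here is a minimal sketch in Python of the rewrite discussed above. The function names are hypothetical, and the output is only a CCL-style string; the exact operator spelling will depend on whichever query language the front-end actually sends to Zebra. The second function simply checks the boolean identity over all four membership cases.

```python
def rewrite_xor(a, b):
    """Rewrite 'a XOR b' as '(a or b) not (a and b)'.

    Hypothetical helper: a and b are assumed to be already-valid
    query terms, and 'not' is assumed to mean binary and-not.
    """
    return f"({a} or {b}) not ({a} and {b})"


def mapping_is_xor():
    """Check (a OR b) NOT (a AND b) == a XOR b for every case."""
    for in_a in (False, True):
        for in_b in (False, True):
            mapped = (in_a or in_b) and not (in_a and in_b)
            if mapped != (in_a != in_b):
                return False
    return True


print(rewrite_xor("cats", "dogs"))
print(mapping_is_xor())
```

Doing the rewrite as plain string substitution keeps it entirely outside Zebra, at the cost of evaluating each operand twice.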
>> I've looked high and low for documentation on the ranking
>> algorithms in Zebra but haven't found much more than a few
>> sentences in the official docs and some list messages ...
>
> It isn't documented beyond what's in the code, AFAIK.
Marc might have something.
> In fact, to index 245$a, you'd have to write something like
> xelm /*/datafield[@tag=245]/subfield[@code=a] title
> [...]
> however, none of these mechanisms allows you to construct phrase
> indexes that span multiple subfields.. and they don't allow you to
> do cool stuff like extract a date from the guts of 008...
Really? Surely it would be possible to write an XPath expression that
does this?
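To make the question concrete, here is a rough sketch in Python (standard library only, not Zebra's own filter machinery) of the two extractions mentioned: a phrase built from 245$a plus 245$b, and Date 1 taken from character positions 07-10 of the 008 field. The sample record and function names are made up for illustration. In XPath 1.0 the date would be something like substring(marc:controlfield[@tag='008'], 8, 4); whether Zebra's xelm accepts such an expression is exactly the open question, and this sketch does not answer it.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="008">060323s2006    enk           000 0 eng d</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Searching with Zebra :</subfield>
    <subfield code="b">a report</subfield>
  </datafield>
</record>"""


def date1_from_008(record):
    """Return 008 positions 07-10 (Date 1), or None if 008 is missing."""
    field = record.find("marc:controlfield[@tag='008']", NS)
    if field is None or field.text is None:
        return None
    return field.text[7:11]


def title_phrase(record, codes=("a", "b")):
    """Join the chosen 245 subfields into one phrase, in record order."""
    subfields = record.findall("marc:datafield[@tag='245']/marc:subfield", NS)
    parts = [sf.text.strip() for sf in subfields
             if sf.get("code") in codes and sf.text]
    return " ".join(parts)


record = ET.fromstring(SAMPLE)
print(date1_from_008(record))
print(title_phrase(record))
```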
> Well, in Zebra 1.4, XSLT comes to the rescue, in a way that only
> XSLT can do it, with lots of angular brackets and much verbosity....
... which is _way_ easier to write than TCL :-) ...
> for instance, in an XSLT index filter,
>
> melm 245$a title:w
>
> becomes
>
> <xsl:template
> match="marc:record/marc:datafield[@tag='245']/marc:subfield[@code='a']">
> <z:index name="title" type="w">
> <xsl:value-of select="."/>
> </z:index>
> </xsl:template>
>
> Eek.
I think that "Eek" only scratches the surface, here. :-)
But: talk about power and generality!
_/|_ ___________________________________________________________________
/o ) \/ Mike Taylor <mike at miketaylor.org.uk> http://www.miketaylor.org.uk
)_v__/\ "Those who mourn for 'USENET like it was' should remember the
original design estimates of maximum traffic volume: two articles
per day" -- Steven Bellovin.