[Koha-zebra] Koha Zebra Searching Report (from NPL)

Thu Mar 23 02:54:20 CET 2006

On Wed, Mar 22, 2006 at 08:28:26PM -0500, Sebastian Hammer wrote:
> Can't do XOR today. I suppose it would be a possible new feature, but 
> I've frankly never heard of it in an ILS.. can a XOR b be mapped to
> 
> (a OR b) NOT (a AND b) ?   or am I just showing my fading math skills to 
> ill effect, here?
Yep, that's the correct mapping. Voyager's where NPL originally
saw the XOR function.
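
A minimal sketch of that mapping, assuming the query goes to Zebra as
PQF, where @not is the binary "and-not" operator. The helper names and
the sample terms are made up for illustration.

```python
def pqf_xor(a: str, b: str) -> str:
    """Build a PQF query for (a OR b) NOT (a AND b).

    In PQF, '@not x y' means 'x and-not y'.
    """
    return f"@not @or {a} {b} @and {a} {b}"


def xor_via_mapping(a: bool, b: bool) -> bool:
    # The same rewrite on plain booleans, to compare against real XOR.
    return (a or b) and not (a and b)


# The rewrite agrees with XOR on every row of the truth table.
assert all(
    xor_via_mapping(a, b) == (a != b)
    for a in (False, True)
    for b in (False, True)
)

print(pqf_xor("horses", "psychology"))
```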

> Why do you see yourself limited to Bib-1? Within Koha, you can do
> whatever you want -- specifically extend Bib-1 into the 8000-range 
> (IIRC) for local USE attributes or define a private set.
Right, I was just hoping there was some way to map it to bib-1 as
I assume that would be useful in cross-domain searching. If not we
can certainly do a locally defined attribute or set.

> >SPELLCHECKING
> It isn't soundex, but it will behave somewhat the same in many cases. 
> Try searching with truncation=Regexp-2  (103). This enables 
> error-tolerant searching. By default, one error (insert/delete/replace) 
> per term will still lead to a match. More at 
> http://www.indexdata.com/zebra/doc/protocol-support.tkl#search
Neat, we'll look into it.
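
As a rough sketch of what such a query could look like in PQF: Bib-1
attribute type 1 is USE and type 5 is truncation, so Regexp-2 is
requested with 5=103. The helper name is hypothetical, and the default
USE value 1016 (Any) is an assumption about what the server has indexed.

```python
def fuzzy_pqf(term: str, use_attr: int = 1016) -> str:
    """Return a PQF query that asks for truncation type 103 (Regexp-2)."""
    if '"' in term:
        # Keep the sketch simple: no attempt to escape embedded quotes.
        raise ValueError("term must not contain a double quote")
    # Type 1 = USE attribute, type 5 = truncation attribute.
    return f'@attr 1={use_attr} @attr 5=103 "{term}"'


# Title search (USE attribute 4) for a single, possibly misspelled, term.
print(fuzzy_pqf("joyse", use_attr=4))
```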

> >TITLE SEARCHING
> >
> This would, I believe, require new development. It's possible that one 
> of the experimental ranking algorithms that are included might provide 
> better results for these people, but I *think* that boosting the score 
> for one field in a ranked keyword search would require an extension to 
> the index structure.
I've looked high and low for documentation on the ranking algorithms in
Zebra but haven't found much more than a few sentences in the official
docs and some list messages ... 

> >AUTHOR SEARCHING
> >
> >Again, the current relevance ranking doesn't quite cut it. A good
> >example is a relevance ranked author search on "James Joyce". Some
> >records sneak into high relevance because they have multiple authors
> >with names like "James Henry" and "Paul Joyce" (take "Bob the Builder"
> >in the NPL database as an example).
> >
> It might be worth checking whether one of the custom ranking algos did 
> better on this. You can look in the NEWS file for instructions on how to
> enable them.
Will do.

> >relevance ranking
> >should account for proximity and use that as the highest ranking
> >consideration to ensure that a search on "James Joyce" returns all the
> >books by "James Joyce" first. Also, they requested that the default
> >ranking secondarily sort the items by date as well because they often 
> >are asked to find the 'latest' book by so and so. We concluded that 
> >the copyright date stored in the 008 is probably the only date 
> >normalized enough to use for sorting though I'm not sure if zebra can 
> >use that for sorting.
> > 
> >
> It could with the XSLT index rules of Zebra 1.4.
Cool, and are there docs on that somewhere? :-)
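
To make the secondary date sort concrete, here is a minimal sketch done
outside Zebra, on hits that have already been scored. It assumes each
hit is a dict with a hypothetical "score" key and the raw 008 string
under "f008"; Date 1 sits in 008 positions 07-10. This is not how Zebra
itself would sort.

```python
def date1_from_008(field_008: str) -> int:
    """Return Date 1 (008/07-10) as an int, or 0 if it is not four digits."""
    date1 = field_008[7:11]
    if len(date1) == 4 and date1.isascii() and date1.isdigit():
        return int(date1)
    return 0  # unknown or partial dates such as 'uuuu' or '19uu'


def rank_hits(hits):
    """Sort by score (highest first), then by 008 Date 1 (newest first)."""
    return sorted(
        hits,
        key=lambda h: (h["score"], date1_from_008(h["f008"])),
        reverse=True,
    )


hits = [
    {"id": 1, "score": 5, "f008": "000000s1990    "},
    {"id": 2, "score": 5, "f008": "000000s2005    "},
    {"id": 3, "score": 9, "f008": "000000s1950    "},
]
print([h["id"] for h in rank_hits(hits)])
```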

> >SUBJECT SEARCHING
> >
> >They seemed pleased with the way subject searching was working, it
> >will correctly find things like "horses--psychology" where the first
> >term is in 650$a and the second in $x. However, it seems not to 
> >rank things based on proximity within a tag -- meaning that a search
> >on horses--psychology will pull up records containing:
> >
> >650$a horses
> >$x pets
> >
> >650$a humans
> >$x psychology
> >
> >and records with the actual 'horses--psychology' (650$a$x) subject
> >heading aren't given any favor in the ranking (I misplaced my actual 
> >example and the one above is one I invented).
> > 
> Same thing. I don't know how hard it would be to add a score for 
> proximity.. that data is at least in the index structure, but I've no 
> idea how hard it would be to fit into the code. We can ask the Zebra 
> wranglers what it would entail if you're interested.
Yes, please do, we're very interested in that particular one.
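
To spell out the behaviour being asked for, a small sketch of the
preference, written as post-processing rather than as anything Zebra
does today. It assumes each record carries its 650 fields as lists of
(subfield code, value) pairs under a hypothetical "650" key.

```python
def same_field_match(fields_650, terms):
    """True if a single 650 field contains every search term."""
    wanted = [t.lower() for t in terms]
    for subfields in fields_650:
        text = " ".join(value.lower() for _code, value in subfields)
        if all(t in text for t in wanted):
            return True
    return False


def prefer_same_field(records, terms):
    """Stable sort: records with all terms in one 650 field come first."""
    return sorted(records, key=lambda r: not same_field_match(r["650"], terms))


scattered = {"id": "A", "650": [[("a", "Horses"), ("x", "Pets")],
                                [("a", "Humans"), ("x", "Psychology")]]}
exact = {"id": "B", "650": [[("a", "Horses"), ("x", "Psychology")]]}
ranked = prefer_same_field([scattered, exact], ["horses", "psychology"])
print([r["id"] for r in ranked])
```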

> >SUBJECT HEADING SEARCH
> >
> >NPL would like to see a demonstration of a 'Subject Heading' search
> >using authorities generated from the data to compile a list of
> >authoritative headings (which would be compiled from multiple fields
> >within a given subject tag such as 650$a$v$x, etc.). So I think
> >to do this right we'd need to look at putting our authority records
> >in Zebra as well.
> >
> Hmm. Not sure I fully grok the requirement here.. you seem to suggest 
> both constructing a specific index key based on a concatenation of 
> multiple fields (easy in the XSLT indexing rules of 1.4, not compatible 
> with the 'melm' directive).
I'm unclear about the differences between 'elm' and 'melm'. The docs
seem to indicate that they are the same...
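
On the concatenated key itself, a minimal sketch of the idea,
independent of how Zebra would index it. The function names, the
default subfield list, and the "--" separator are assumptions chosen to
match the 'horses--psychology' style used above.

```python
def heading_key(subfields, codes=("a", "v", "x", "y", "z")):
    """Join the selected subfields of one 650 field with '--'."""
    parts = [
        value.strip()
        for code, value in subfields
        if code in codes and value.strip()
    ]
    return "--".join(parts)


def heading_list(fields_650):
    """Distinct heading keys from many 650 fields, in sorted order."""
    return sorted({heading_key(f) for f in fields_650} - {""})


fields = [
    [("a", "Horses"), ("x", "Psychology"), ("2", "lcsh")],
    [("a", "Horses"), ("x", "Psychology")],
    [("a", "Humans"), ("x", "Psychology")],
]
print(heading_list(fields))
```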

Thanks!

-- 
Joshua Ferraro               VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf at liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS