[Koha-devel] Stemming and zebra
David Cook
dcook at prosentient.com.au
Wed Aug 27 11:48:09 CEST 2014
Hi Mathieu:
I think many of us think certain things happen in Zebra when they actually
happen in Koha before the query ever reaches Zebra ;).
As for stemming, theoretically the language obtained via
C4::Templates::getlanguage($cgi, 'intranet'); should filter down into the
Snowball stemming. If it isn't working in French, it might be because the
right locale isn't being passed to Snowball correctly. That's very possible,
as I think we're using arbitrary language codes rather than standard locales
in some cases. It looks like there is a fallback to English in
C4::Templates::getlanguage() as well. If it's not working for French, it
probably just needs a tweak!
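
A minimal sketch of what such a tweak could look like, assuming the problem
is a code such as fr-FR reaching a stemmer that expects fr. The helper name
snowball_lang and the sample codes are hypothetical and are not taken from
Koha; only Lingua::Stem::Snowball and its stemmers() function come from the
module itself.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball qw(stemmers);

# Hypothetical helper, not part of Koha: map an interface language code
# to a two-letter code that Snowball has a stemmer for.
sub snowball_lang {
    my ($code) = @_;

    # Keep only the first two letters, lowercased: 'fr-FR' becomes 'fr'.
    my $lang = lc substr( $code // '', 0, 2 );

    # stemmers() lists the language codes this Snowball build supports.
    my %supported = map { $_ => 1 } stemmers();

    # Assumed policy: fall back to English for anything unsupported.
    return $supported{$lang} ? $lang : 'en';
}

for my $code (qw(fr-FR fr-CA en xx)) {
    printf "%-6s -> %s\n", $code, snowball_lang($code);
}
```

Whether a silent fallback to English is the right policy is a design choice;
logging the unsupported code would make a misconfigured language easier to
spot.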
Yeah, I first heard about Snowball when reading through Zebra docs, and I
was pleasantly surprised when I saw that Lingua::Stem::Snowball existed as a
Perl interface for the C program.
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007
From: koha-devel-bounces at lists.koha-community.org
[mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of Mathieu
Saby
Sent: Wednesday, 27 August 2014 7:30 PM
To: koha-devel at lists.koha-community.org
Subject: Re: [Koha-devel] Stemming and zebra
Hi
I had always thought stemming was done by Zebra, and only in English!
In fact the algorithm for french language is here:
http://snowball.tartarus.org/algorithms/french/stemmer.html
(Lingua::Stem::Snowball is a Perl interface to the C version of the Snowball
stemmers)
Mathieu Saby
On 27/08/2014 10:22, David Cook wrote:
Hi Francois:
I wrote an email earlier on my tablet, but I'm not 100% sure if it got sent. In
any case, I'm writing again now!
You'll want to look at C4::Search::_build_stemmed_operand().
Zebra doesn't actually do any stemming itself. If you read through the Zebra
docs (if you're masochistic), you'll notice that they say explicitly that
Zebra doesn't do any stemming, but that you can do stemming (using a stemmer
like Snowball) while building a query. That's exactly what we do in Koha.
The Perl module that does the stemming is Lingua::Stem::Snowball.
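
As a rough illustration of stemming at query-building time, here is a small
sketch that runs Francois's test words through Lingua::Stem::Snowball. The
function name stem_operand is hypothetical, and this is not a copy of
C4::Search::_build_stemmed_operand(), which may handle operands differently.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

# Hypothetical helper: stem each word of an operand before it goes into
# the query. Koha's own routine may differ in the details.
sub stem_operand {
    my ( $operand, $lang ) = @_;

    local $@;
    my $stemmer = Lingua::Stem::Snowball->new(
        lang     => $lang,
        encoding => 'UTF-8',
    );
    # new() reports an unsupported language code through $@.
    die "No Snowball stemmer for '$lang': $@" if $@;

    my @words = split ' ', $operand;
    my @stems = $stemmer->stem( \@words );    # list of lowercased stems
    return join ' ', @stems;
}

for my $word (qw(ski skiing skills fish fished fishing fisher fishxsdfe)) {
    printf "%-10s -> %s\n", $word, stem_operand( $word, 'en' );
}
```

If the stem is then right-truncated when the query is assembled (this is not
confirmed in this thread, so it is worth checking in
_build_stemmed_operand()), a stem such as "fish" would also match
"fishxsdfe", and "ski" would match "skills".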
However, you might notice that your query's operands aren't always stemmed
properly. I haven't looked in a while, but I think it's because we don't
build our queries very well at all (when not using QueryParser).
If you want to understand why you're getting "skills" and "fishxsdfe" in
your results, I would suggest running some tests (using Data::Dumper and
warn) so that you can see your query as it is built.
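
A minimal sketch of that kind of test, using only core Perl modules. The
helper debug_query and the sample values are hypothetical stand-ins; in Koha
you would dump the real variables at each stage of the query-building code
instead.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so successive dumps are easy to compare.
$Data::Dumper::Sortkeys = 1;

# Hypothetical helper: send a labelled dump to STDERR via warn(), which
# under a web server normally ends up in the error log.
sub debug_query {
    my ( $label, $data ) = @_;
    warn "$label: " . Dumper($data);
    return $data;
}

# Stand-in values for illustration only.
my @operands = ( 'skiing', 'fishing' );
debug_query( 'operands before stemming', \@operands );

# Placeholder for the real query string built by the search code.
my $query = join ' and ', @operands;
debug_query( 'query as built', $query );
```

Dumping the operands both before and after the stemming step should show
exactly which terms are being sent to Zebra.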
I have a lot of work I want to do on C4::Search::buildQuery, but I just don't
have the time :/.
Unfortunately, at the moment, there is no stemming when using the
QueryParser. Fortunately, though, using Lingua::Stem::Snowball with
QueryParser would be really, really easy. I think that I've written a note on
how to do that somewhere on Bugzilla or maybe on Trello
I hope that helps! Feel free to send me an email or shout at me on IRC if
you want any clarification. I know I probably didn't make it any clearer, but
hopefully this might help you on your path to understanding.
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007
From: koha-devel-bounces at lists.koha-community.org
[mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of Francois
Charbonnier
Sent: Wednesday, 27 August 2014 2:09 AM
To: koha-devel at lists.koha-community.org
Subject: [Koha-devel] Stemming and zebra
Hello,
I have tested the QueryStemming system preference on Koha 3.14 (my local
installation) and I'm wondering: does Zebra just right-truncate the words, or
is there an algorithm to find the stems?
I use ICU and I have enabled "QueryWeightFields". I don't have automatic
truncation or fuzzy search on. I use these words for my tests:
 ski, skiing, skills
 fish, fished, fishing, fisher, fishxsdfe
Each time, with QueryStemming on, "skills" and "fishxsdfe" come up in the
search results. Is that what I should expect? "Skills", maybe, but "fishxsdfe"?
Do you know how it works, or have a good example that would help me
understand?
Thanks!
--
François Charbonnier,
Bibl. prof. / Chef de produits
Tél. : (888) 604-2627
francois.charbonnier at inLibro.com
inLibro | pour esprit libre | www.inLibro.com
_______________________________________________
Koha-devel mailing list
Koha-devel at lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/