[Koha-devel] Stemming and zebra

Mathieu Saby mathsabypro at gmail.com
Wed Aug 27 11:30:19 CEST 2014


Hi

I had always thought stemming was done by Zebra, and only in English!

In fact, the algorithm for the French language is here:
http://snowball.tartarus.org/algorithms/french/stemmer.html

(Lingua::Stem::Snowball is a Perl interface to the C version of the 
Snowball stemmers)
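
To illustrate the difference Francois asks about (right truncation versus 
real stemming), here is a minimal sketch in Python. It is not Koha code 
(Koha is Perl): the suffix list, the function names and the word list are 
all hypothetical, and the toy stemmer is far cruder than Snowball. It also 
assumes that the stemmed query term is then matched with right truncation, 
which this thread does not confirm; if that is what happens, it would 
explain why "skills" and "fishxsdfe" show up.

```python
# Toy illustration only: a crude suffix stripper stands in for a real
# Snowball stemmer, and a plain word list stands in for the Zebra index.

SUFFIXES = ("ing", "ed", "s")  # hypothetical, far simpler than Snowball


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search_right_truncated(stem, indexed_words):
    """Return indexed words that start with the stem (right truncation)."""
    return [w for w in indexed_words if w.startswith(stem)]


def search_stem_equality(stem, indexed_words):
    """Return indexed words whose own toy stem equals the query stem."""
    return [w for w in indexed_words if toy_stem(w) == stem]


# Francois's test words, used here as the "index".
indexed = ["ski", "skiing", "skills",
           "fish", "fished", "fishing", "fisher", "fishxsdfe"]

for query in ("skiing", "fishing"):
    stem = toy_stem(query)
    print(query, "->", stem)
    print("  stem + truncation:", search_right_truncated(stem, indexed))
    print("  stem equality:    ", search_stem_equality(stem, indexed))
```

With stem plus truncation, any indexed word that merely begins with the 
stem is returned, so nonsense such as "fishxsdfe" matches "fish". Comparing 
stems on both sides would exclude it, but that requires stemming the index 
as well as the query.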


Mathieu Saby



On 27/08/2014 10:22, David Cook wrote:
>
> Hi Francois:
>
> I wrote an email earlier on my tablet, but I'm not 100% sure it got 
> sent. In any case, I'm writing again now!
>
> You'll want to look at C4::Search::_build_stemmed_operand().
>
> Zebra doesn't actually do any stemming itself. If you read through the 
> Zebra docs (if you're masochistic), you'll notice that they say 
> explicitly that Zebra doesn't do any stemming, but that you can do 
> stemming (using a stemmer like Snowball) while building a query. 
> That's exactly what we do in Koha.
>
> The Perl module that does the stemming is Lingua::Stem::Snowball.
>
> However, you might notice that your query's operands aren't always 
> stemmed properly. I haven't looked in a while, but I think it's 
> because we don't build our queries very well at all (when not using 
> QueryParser).
>
> If you want to understand why you're getting "skills" and "fishxsdfe" 
> in your results, I would suggest running some tests (using 
> "Data::Dumper" and "warn") so that you can see your query as it is built.
>
> I have a lot of work I want to do on C4::Search::buildQuery, but just 
> don't have the time :/.
>
> Unfortunately, at the moment, there is no stemming when using the 
> QueryParser. However, fortunately, using Lingua::Stem::Snowball with 
> QueryParser would be really really easy. I think that I've written a 
> note on how to do that somewhere on Bugzilla or maybe on Trello...
>
> I hope that helps! Feel free to send me an email or shout at me on IRC 
> if you want any clarification. I know I probably didn't make it any 
> clearer but hopefully this might help you on your path to understanding.
>
> David Cook
>
> Systems Librarian
>
> Prosentient Systems
>
> 72/330 Wattle St, Ultimo, NSW 2007
>
> From: koha-devel-bounces at lists.koha-community.org 
> [mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of 
> Francois Charbonnier
> Sent: Wednesday, 27 August 2014 2:09 AM
> To: koha-devel at lists.koha-community.org
> Subject: [Koha-devel] Stemming and zebra
>
> Hello,
>
> I have tested the QueryStemming system preference on Koha 3.14 (my 
> local installation) and I'm wondering: does Zebra just right-truncate 
> the words, or is there an algorithm to find the stems?
>
> I use ICU and I have enabled "QueryWeightFields". I don't have 
> automatic truncation or fuzzy search on. I use these words for my tests:
>
>   * ski, skiing, skills
>   * fish, fished, fishing, fisher, fishxsdfe
>
> Each time, with QueryStemming on, "skills" and "fishxsdfe" come out in 
> the search results. Is that what I should expect? "Skills", maybe, but 
> "fishxsdfe"?
>
> Do you know how it works, or have a good example that would help me 
> understand?
>
> Thanks!
>
> -- 
>
> François Charbonnier,
> Bibl. prof. / Chef de produits
>
> Tél.  : (888) 604-2627
> francois.charbonnier at inLibro.com
>
> inLibro | pour esprit libre | www.inLibro.com
>
