[Koha-devel] Stemming and zebra

Francois Charbonnier francois.charbonnier at inlibro.com
Wed Aug 27 14:26:11 CEST 2014


Hi,
Thanks, Brooke, for the support. I did read the RFC, by the way, but it's 
for QueryParser, which I'm not using right now. I hope I'll find time to test it!
Thanks, David. I haven't read the whole Zebra documentation (haha, that sounds 
crazy) but I did some research. I had the impression that Zebra didn't handle 
stemming, but it wasn't clear. So thank you for your answer. It will 
definitely help me understand!
Thanks, Mathieu, for the link. I'll definitely have a look!
Have a good day! :^)
François

François Charbonnier,
Professional Librarian / Product Manager

Tel.: (888) 604-2627
francois.charbonnier at inLibro.com

inLibro | for free minds | www.inLibro.com
On 2014-08-27 05:48, David Cook wrote:
>
> Hi Mathieu:
>
> I suspect many of us assume certain things happen in Zebra when they 
> actually happen in Koha, before the query ever reaches Zebra ;).
>
> As for stemming, in theory the language obtained via 
> “C4::Templates::getlanguage($cgi, 'intranet');” should be passed down 
> to the Snowball stemmer. If it isn’t working in French, it might be 
> because the right locale isn’t being passed to Snowball. That’s quite 
> possible, as I think we’re using arbitrary language codes rather than 
> standard locales in some cases. There also seems to be a fallback to 
> English in C4::Templates::getlanguage(). If it’s not working for 
> French, it probably just needs a tweak!
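A minimal sketch of the kind of "tweak" described above, assuming the interface language arrives as a Koha-style code such as "fr-CA" or "en" and that Lingua::Stem::Snowball wants a bare two-letter code for its lang argument. The function name snowball_lang_for and the hard-coded list of supported codes are hypothetical and should be checked against the installed Lingua::Stem::Snowball before relying on them.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a Koha interface language code such as
# "fr-CA" or "en" into the bare two-letter code assumed to be what
# Lingua::Stem::Snowball expects for its "lang" argument.
# ASSUMPTION: this list of supported codes must be checked against
# the Lingua::Stem::Snowball version actually installed.
my %SNOWBALL_LANGS = map { $_ => 1 }
    qw( da de en es fi fr hu it nl no pt ro ru sv tr );

sub snowball_lang_for {
    my ($koha_lang) = @_;

    # Keep only the leading two-letter language subtag ("fr" from "fr-CA").
    my ($primary) = lc( $koha_lang // '' ) =~ /^([a-z]{2})(?:-|$)/;

    # Fall back to English, like the fallback in C4::Templates::getlanguage(),
    # when the code is missing or names a language with no stemmer.
    return 'en' unless defined $primary && $SNOWBALL_LANGS{$primary};
    return $primary;
}

print join( ' ', map { snowball_lang_for($_) } qw( fr-CA fr en zh-Hans-TW ) ), "\n";
```

The point of normalizing in one place is that a regional code like "fr-CA" then reaches the French stemmer instead of silently falling back to English.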
>
> Yeah, I first heard about Snowball when reading through the Zebra docs, 
> and I was pleasantly surprised to see that Lingua::Stem::Snowball 
> existed as a Perl interface to the C implementation.
>
> David Cook
>
> Systems Librarian
>
> Prosentient Systems
>
> 72/330 Wattle St, Ultimo, NSW 2007
>
> *From:* koha-devel-bounces at lists.koha-community.org 
> [mailto:koha-devel-bounces at lists.koha-community.org] *On Behalf Of* 
> Mathieu Saby
> *Sent:* Wednesday, 27 August 2014 7:30 PM
> *To:* koha-devel at lists.koha-community.org
> *Subject:* Re: [Koha-devel] Stemming and zebra
>
> Hi
>
> I had always thought stemming was done by Zebra, and only in English!
>
> In fact, the algorithm for French is here:
> http://snowball.tartarus.org/algorithms/french/stemmer.html
>
> (Lingua::Stem::Snowball is a Perl interface to the C version of the 
> Snowball stemmers)
>
>
> Mathieu Saby
>
>
> On 27/08/2014 10:22, David Cook wrote:
>
>     Hi Francois:
>
>     I wrote an email earlier on my tablet, but I'm not 100% sure it got
>     sent. In any case, I’m writing again now!
>
>     You’ll want to look at C4::Search::_build_stemmed_operand().
>
>     Zebra doesn’t actually do any stemming itself. If you read through
>     the Zebra docs (if you’re masochistic), you’ll notice that they
>     say explicitly that Zebra doesn’t do any stemming, but that you
>     can do stemming (using a stemmer like Snowball) while building a
>     query. That’s exactly what we do in Koha.
>
>     The Perl module that does the stemming is Lingua::Stem::Snowball.
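A minimal sketch of the approach described above, stemming each word of an operand before the query ever reaches Zebra. The names stem_operand and snowball_stemmer are hypothetical; the constructor arguments lang and encoding follow Lingua::Stem::Snowball's documented interface, but Koha's real code for this step is C4::Search::_build_stemmed_operand(), which may differ from this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: stem every word of a search operand *before* the
# query string is sent to Zebra, because Zebra itself does no stemming.
# $stem is any code ref that maps one word to its stem.
sub stem_operand {
    my ( $operand, $stem ) = @_;
    my @words = split ' ', $operand;    # split on runs of whitespace
    return join ' ', map { $stem->($_) } @words;
}

# Assumed adapter for Lingua::Stem::Snowball. The module is only loaded
# when this is called, so the rest of the sketch runs without it.
sub snowball_stemmer {
    my ($lang) = @_;
    require Lingua::Stem::Snowball;
    my $snowball = Lingua::Stem::Snowball->new( lang => $lang, encoding => 'UTF-8' );
    return sub { $snowball->stem( $_[0] ) };
}

# Toy stand-in for demonstration only: strips a few English suffixes.
# It is NOT the Snowball algorithm.
my $toy_stem = sub { my $w = lc shift; $w =~ s/(?:ing|ed|s)$//; return $w };

print stem_operand( 'fishing fished fish', $toy_stem ), "\n";    # fish fish fish
```

With the module installed, passing snowball_stemmer('fr') instead of the toy stemmer would apply the real French algorithm; keeping the stemmer as a parameter makes it easy to test the query-building logic on its own.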
>
>     However, you might notice that your query’s operands aren’t always
>     stemmed properly. I haven’t looked in a while, but I think it’s
>     because we don’t build our queries very well at all (when not
>     using QueryParser).
>
>     If you want to understand why you’re getting “skills” and
>     “fishxsdfe” in your results, I would suggest running some tests
>     (using “Data::Dumper” and “warn”) so that you can see your query
>     as it is built.
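A minimal sketch of the Data::Dumper and warn debugging suggested above. The %query_state structure is hypothetical placeholder data; inside Koha you would dump the real variables at each step of C4::Search::buildQuery instead. Sortkeys and Terse are standard Data::Dumper settings.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys and drop the "$VAR1 = " prefix so that successive dumps of
# the same structure are easy to compare in the web server's error log.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Terse    = 1;

# Hypothetical snapshot of a query while it is being built. Inside Koha you
# would dump the real variables at each step of C4::Search::buildQuery.
my %query_state = (
    operators => ['and'],
    operands  => [ 'ski', 'skiing', 'skills' ],
);

# warn() writes to STDERR, which typically ends up in the error log under Apache.
warn 'query so far: ' . Dumper( \%query_state );
```

Dumping the same structure before and after the stemming step makes it easy to see which operands were changed and which were passed through untouched.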
>
>     I have a lot of work I want to do on C4::Search::buildQuery, but
>     just don’t have the time :/.
>
>     Unfortunately, at the moment, there is no stemming when using the
>     QueryParser. Fortunately, though, using Lingua::Stem::Snowball
>     with QueryParser would be really, really easy. I think I’ve
>     written a note on how to do that somewhere on Bugzilla or maybe on
>     Trello…
>
>     I hope that helps! Feel free to send me an email or shout at me on
>     IRC if you want any clarification. I know I probably didn’t make
>     it much clearer, but hopefully this helps you on your path to
>     understanding.
>
>     David Cook
>
>     Systems Librarian
>
>     Prosentient Systems
>
>     72/330 Wattle St, Ultimo, NSW 2007
>
>     *From:* koha-devel-bounces at lists.koha-community.org
>     [mailto:koha-devel-bounces at lists.koha-community.org] *On Behalf Of*
>     Francois Charbonnier
>     *Sent:* Wednesday, 27 August 2014 2:09 AM
>     *To:* koha-devel at lists.koha-community.org
>     *Subject:* [Koha-devel] Stemming and zebra
>
>     Hello,
>
>     I have tested the QueryStemming system preference on Koha 3.14 (my
>     local installation) and I'm wondering: does Zebra just right-truncate
>     the words, or is there an algorithm to find the stems?
>
>     I use ICU and I have enabled "QueryWeightFields". I don't have
>     automatic truncation or fuzzy search on. I use these words for my
>     tests:
>
>     - ski, skiing, skills
>
>     - fish, fished, fishing, fisher, fishxsdfe
>
>     Each time, with QueryStemming on, "skills" and "fishxsdfe" come out
>     in the search results. Is this what I should expect? "Skills", maybe,
>     but "fishxsdfe"?
>
>     Do you know how it works, or do you have a good example that would
>     help me understand?
>
>     Thanks!
>
>     -- 
>
>     François Charbonnier,
>     Professional Librarian / Product Manager
>
>     Tel.: (888) 604-2627
>     francois.charbonnier at inLibro.com
>
>     inLibro | for free minds | www.inLibro.com
>
>
>
>
>     _______________________________________________
>
>     Koha-devel mailing list
>
>     Koha-devel at lists.koha-community.org
>
>     http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>
>     website : http://www.koha-community.org/
>
>     git : http://git.koha-community.org/
>
>     bugs : http://bugs.koha-community.org/
>
>
