[Koha-devel] Solr / zebra / search in Koha 3.10 => starting a workgroup

Wed Apr 4 09:23:37 CEST 2012

Hi,

I've worked a lot on Zebra configuration and behavior on two projects at
Progilone.
I'd like to continue working on it, as a personal contribution.

Note that the Lucene search engine used by Solr has a Perl implementation.
It is in the Apache Incubator, so still under heavy development:
http://incubator.apache.org/projects/lucy.html

-- 
Fridolyn SOMERS
fridolyn.somers at gmail.com
Marsillargues - France

On Fri, Mar 30, 2012 at 6:15 PM, Paul Poulain <paul.poulain at biblibre.com> wrote:

> Hello all,
>
> As you know, our main goal for the October 2012 release of Koha is to
> introduce Solr as an alternate search engine.
> BibLibre has already explained which improvements this search engine
> will bring on the blog page:
> http://drupal.biblibre.com/en/blog/entry/solr-developments-for-koha
>
> During the hackfest in Marseille, a group of four people (Claire,
> Henri-Damien, Juan and Zeno) worked out how to introduce this work
> smoothly. The first goal is that a library wanting to keep running
> Zebra still can. Since some librarians may want to use a search engine
> other than Zebra or Solr, we want to follow a path that leads to
> better modularity.
> I also think that most of us agree that the current search code is ugly
> and very hard to maintain or improve.
>
> The hackfesters have produced a drawing explaining how we could name the
> different packages:
>
> https://docs.google.com/a/biblibre.com/drawings/d/1ZdsQsoThYgIVSgH3LqgRZy17xm9X7XkLT6RG3fDYCzs/edit
> ,
> with a page on the wiki:
> http://wiki.koha-community.org/wiki/Switch_to_Solr_RFC#.23kohahack12
>
> In this drawing (read from bottom to top), there are 2 main layers,
> "Search" and "Index", which are responsible for searching and
> indexing. The "Conf" object would be responsible for retrieving the
> configuration (the current getIndexes), the "Query" object would be
> responsible for building the query in the search engine's grammar, and
> the "Plugin" object would be responsible for processing records before
> indexing (for example, normalizing data).
>
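
To make the layering above more concrete, here is a minimal sketch in
Python (not Koha code, and not the hackfest implementation). Every class
and method name below (Conf.get_indexes, Plugin.process, Query.build,
ToyBackend, LowercasePlugin, FieldQuery) is hypothetical and chosen only
for illustration; the in-memory backend merely stands in for a real
engine such as Zebra or Solr.

```python
from abc import ABC, abstractmethod


class Conf:
    """Holds the index configuration (the role played by getIndexes)."""

    def __init__(self, indexes):
        self._indexes = list(indexes)

    def get_indexes(self):
        return list(self._indexes)


class Plugin(ABC):
    """Processes a record before it is indexed."""

    @abstractmethod
    def process(self, record):
        ...


class LowercasePlugin(Plugin):
    """Hypothetical normalizer: lowercases every string value."""

    def process(self, record):
        return {k: v.lower() if isinstance(v, str) else v
                for k, v in record.items()}


class Query(ABC):
    """Builds a query string in the grammar of one search engine."""

    @abstractmethod
    def build(self, index, term):
        ...


class FieldQuery(Query):
    """Hypothetical grammar for the toy backend: field:"term"."""

    def build(self, index, term):
        return f'{index}:"{term}"'


class ToyBackend:
    """In-memory stand-in for a real search engine (assumption)."""

    def __init__(self):
        self.docs = []

    def add(self, doc):
        self.docs.append(doc)

    def run(self, query_string):
        # Parse the toy grammar and do a substring match on one field.
        field, _, quoted = query_string.partition(":")
        term = quoted.strip('"')
        return [d for d in self.docs if term in d.get(field, "")]


class Index:
    """"Index" layer: runs the plugins, keeps only configured indexes."""

    def __init__(self, conf, plugins, backend):
        self.conf, self.plugins, self.backend = conf, plugins, backend

    def add_record(self, record):
        for plugin in self.plugins:
            record = plugin.process(record)
        doc = {name: record[name]
               for name in self.conf.get_indexes() if name in record}
        self.backend.add(doc)


class Search:
    """"Search" layer: delegates query building to a Query object."""

    def __init__(self, query, backend):
        self.query, self.backend = query, backend

    def search(self, index, term):
        return self.backend.run(self.query.build(index, term))
```

In a sketch like this, supporting another engine would mean supplying a
different Query subclass and backend, while the Index and Search layers
stay unchanged.
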
> Claire (from BibLibre) made a first implementation of this organization
> on GitHub:
> https://github.com/clrh/wip-searchengine-layer/tree/master/lib/SearchEngine
> Juan (from xercode) also worked on this organization, on the Zebra
> side. His code is also available on GitHub:
> https://github.com/xercode/Data-SearchEngine-Zebra
> Henri-Damien is now continuing the work of implementing Zebra within
> this overall structure.
>
> In the meantime, 2 other directions have been followed:
> * Frédéric (Demians, from Tamil) wrote a daemon for Zebra indexing (see
> http://git.tamil.fr/?p=Koha-Contrib-Tamil;a=summary), which resulted in
> bug http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7759, which
> documents how to introduce this daemon for indexing. Liz (and maybe
> others) is using it without any problem. This git repository also
> contains some other tools, but what they actually do is not completely
> clear to me (Frédéric, feel free to add some info...)
> * Galen (Charlton, from Equinox) wrote some code that you can see in
> http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7818, which is
> in "needs signoff" status. The description of the bug covers a lot of
> things: DOM indexing for biblios (and a tool to automatically write the
> DOM XSL from the record.abs), a normalizer for data, and an indexer
> (Koha::Indexer). Unless I've missed something (Galen, tell me if I'm
> wrong), for now only the DOM indexing is submitted; the normalizer and
> indexer are not.
>
> What we all agree on: we should have a clearer way to normalize, index
> and search in Koha. That's great!
>
> The structure described by the hackfesters is great because it's
> independent of the search engine you use.
> I think large portions (if not all) of Koha::Contrib::Tamil could be
> used to write the Zebra indexing layer.
> I also think that the DOM indexing part of what Galen has submitted can
> be signed off and pushed without any risk, but the normalizer and
> indexer parts will need coordination to avoid having BibLibre/xercode
> working in one direction and Galen working in another. I really like
> the idea of a normalizer that is not necessarily tied to MARC; that
> could be useful in the future.
>
> That's why I propose to organize an IRC meeting (date and time to be
> defined, but it will be in the European afternoon / US morning) with
> all volunteers to coordinate their efforts. I think this meeting should
> be regular (monthly?).
> After each meeting, a summary of the conclusions would be written on
> the wiki and posted to this mailing list.
>
> My proposal: if you're interested in participating in this effort,
> please reply to this mail. (I'll then start a Doodle poll to find a
> suitable time. I propose 2 hours for the first meeting, then,
> hopefully, shorter meetings.) Juan/Galen/Zeno, you're considered
> interested in this topic ;-)
> --
> Paul POULAIN
> http://www.biblibre.com
> Expert en Logiciels Libres pour l'info-doc
> Tel : (33) 4 91 81 35 08
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.koha-community.org
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : http://www.koha-community.org/
> git : http://git.koha-community.org/
> bugs : http://bugs.koha-community.org/
>