[Koha-devel] 16.05, zebra and jessie

David Cook dcook at prosentient.com.au
Wed Aug 31 01:42:52 CEST 2016


I don't doubt that I've missed a ton of words! Between baby and competing projects, I haven't had as much time to keep up with the bleeding edge.

I want to be one of those people using and fixing it, although - as you say - everyone has their own priorities. 

It's the search engine that Koha deserves, but I reckon many of us are busy right now :/. 

Thanks for the update though, Chris. 

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia

Office: 02 9212 0899
Direct: 02 8005 0595


> -----Original Message-----
> From: Chris Cormack [mailto:chrisc at catalyst.net.nz]
> Sent: Wednesday, 31 August 2016 9:28 AM
> To: David Cook <dcook at prosentient.com.au>
> Cc: 'Tomas Cohen Arazi' <tomascohen at gmail.com>; 'Barton Chittenden'
> <barton at bywatersolutions.com>; 'Jonathan Druart'
> <jonathan.druart at bugs.koha-community.org>;
> koha-devel at lists.koha-community.org
> Subject: Re: [Koha-devel] 16.05, zebra and jessie
> 
> * David Cook (dcook at prosentient.com.au) wrote:
> >
> >
> >
> > I suppose Adam at IndexData has been busy with the FOLIO project, so I
> > doubt he has time to work on Zebra these days, even if we did have a
> > patch available.
> >
> >
> >
> > Is ElasticSearch usable with Koha at this point? I heard a lot in
> > 2015, but after Robin left I haven’t heard a word other than rumours
> > that the patches had been pushed?
> 
> You missed tons of words then :)
> 
> Yes, it is in 16.05, marked experimental, it works, mostly. But it will only get
> better with more people using and fixing it.
> The next task is to update the version it works with to Elastic 2. That isn't a
> huge amount of work, but everyone has their own priorities and a lot of us
> have to work on what users ask for (not what users need ;))
> 
> Chris
> 
> >
> > Subject: Re: [Koha-devel] 16.05, zebra and jessie
> >
> >
> >
> > I have seen use_zebra_facets=1 cause no facets to be rendered when GRS-1
> > configuration files are kept during upgrades up to where GRS-1 got
> > deprecated (3.20?). Is that the case? What does the About > System
> > information page say about your config?
> >
> >
> >
> > The slowness is not in Zebra per se, but in the way we retrieve the
> > facets from it (so on the Koha/Perl side). We retrieve one facet at a
> > time instead of fetching them all in one call. And they come in XML
> > format, so they need to be parsed. So, if anyone is willing to improve
> > it, they just need to optimize this function (read the TODO):
> >
> >
> >
> > sub _get_facet_from_result_set {
> >
> >     my $facet_idx = shift;
> >     my $rs        = shift;
> >     my $sep       = shift;
> >
> >     my $internal_sep  = '<*>';
> >     my $facetMaxCount = C4::Context->preference('FacetMaxCount') // 20;
> >
> >     return if ( ! defined $facet_idx || ! defined $rs );
> >     # zebra's facet element, untokenized index
> >     my $facet_element = 'zebra::facet::' . $facet_idx . ':0:' . $facetMaxCount;
> >     # configure zebra results for retrieving the desired facet
> >     $rs->option( elementSetName => $facet_element );
> >     # get the facet record from result set
> >     my $facet = $rs->record( 0 )->raw;
> >     # if the facet has no results...
> >     return if !defined $facet;
> >     # TODO: benchmark DOM vs. SAX performance
> >     my $facet_dom = XML::LibXML->load_xml(
> >       string => ($facet)
> >     );
> >     my @terms = $facet_dom->getElementsByTagName('term');
> >     return if ! @terms;
> >
> >     my $facets = {};
> >     foreach my $term ( @terms ) {
> >         my $facet_value = $term->textContent;
> >         $facet_value =~ s/\Q$internal_sep\E/$sep/ if defined $sep;
> >         $facets->{ $facet_value } = $term->getAttribute( 'occur' );
> >     }
> >
> >     return $facets;
> >
> > }
> >
> >
> >
> > Another option would be to make _get_facets_from_zebra build the
> > element set containing all facets so they are read in one call
> > (comma-separate all elements). The problem is that Zebra returns zero
> > if one of the elements is empty. Jared proposed to create a ghost
> > record with all facet fields. I didn't manage to make it work. Another
> > option is to patch Zebra. I started that, but abandoned once the ES code
> > got pushed.
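
A minimal sketch of the "one element set for all facets" idea described above. It assumes the combined name is a single zebra::facet:: prefix followed by comma-separated index:0:count entries; that syntax should be checked against the Zebra documentation before relying on it. The helper name and the index names are hypothetical, not existing Koha code, and the sketch does not address the empty-element problem mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Koha): build one element set name that
# covers several facet indexes, so they could be requested in a single call.
# ASSUMPTION: entries are comma-separated after one 'zebra::facet::' prefix.
sub build_combined_facet_element {
    my ( $facet_indexes, $max_count ) = @_;

    # Nothing to build without at least one index
    return if !$facet_indexes || !@{$facet_indexes};
    $max_count //= 20;

    # Same per-facet shape as the single-facet code: index, ':0:', max count
    my @entries = map { join( ':', $_, 0, $max_count ) } @{$facet_indexes};
    return 'zebra::facet::' . join( ',', @entries );
}

# Illustrative index names only
my $element = build_combined_facet_element( [ 'au', 'su-to', 'se' ], 20 );
print "$element\n";
```

The resulting string would be passed as elementSetName in place of the per-facet value, and the returned XML would then need to be split per facet.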
> >
> >
> >
> > So, if use_zebra_facets=0 is good enough, maybe it should be
> > recommended. The problem is that it is not a real facet, but merely an
> > extraction of the fields from the first x records.
> >
> > As I said, it could be good enough anyway.
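
To illustrate why extracting fields from the first x records is not a real facet, here is a rough sketch. It assumes records are plain hashes of field name to value list, which is a stand-in for illustration only (real Koha records are MARC); the function name and sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the use_zebra_facets=0 behaviour: count field
# values over the first $limit records only, not over the whole result set.
sub facets_from_first_records {
    my ( $records, $limit ) = @_;

    my $count = @{$records};
    my $last  = ( $limit < $count ? $limit : $count ) - 1;

    my %facets;
    for my $record ( @{$records}[ 0 .. $last ] ) {
        for my $field ( keys %{$record} ) {
            $facets{$field}{$_}++ for @{ $record->{$field} };
        }
    }
    return \%facets;
}

# Made-up sample data: the third record lies outside the sampled window
my @records = (
    { itype => ['BOOK'], au => ['Smith, A.'] },
    { itype => ['BOOK'], au => ['Jones, B.'] },
    { itype => ['DVD'],  au => ['Smith, A.'] },
);

my $facets = facets_from_first_records( \@records, 2 );
for my $value ( sort keys %{ $facets->{itype} } ) {
    print "itype $value: $facets->{itype}{$value}\n";
}
```

With a limit of 2, the DVD value never shows up as a facet, and the counts only describe the sampled records rather than the full result set.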
> >
> >
> >
> > Regards
> >
> >
> >
> >
> >
> > On Tue, Aug 23, 2016 at 10:21, Barton Chittenden (<
> > barton at bywatersolutions.com>) wrote:
> >
> >     Zebra tends to be I/O bound -- we've seen it write enormous .zrs files to
> >     disk (~16G/query on large libraries). Bug 13665 mentions that searches
> >     could be taking upwards of 40 seconds to complete -- I think that we've
> >     seen searches time out and return no results at about 1 minute.
> >
> >
> >
> >     Is it possible to tune Zebra's space/time optimizations in any way so that
> >     it doesn't write such large files to disk?
> >
> >
> >
> >     On Tue, Aug 23, 2016 at 5:38 AM, Jonathan Druart <
> >     jonathan.druart at bugs.koha-community.org> wrote:
> >
> >         See bug 13665 - Retrieve facets from zebra is slow
> >         To understand why and when use_zebra_facets=1 is slow
> >
> >
> >         2016-08-22 21:31 GMT+01:00 Barton Chittenden <
> >         barton at bywatersolutions.com>:
> >         > I haven't run into the issue with the dashes in idzebra-2.0 2.0.59, but I
> >         > have run into this, when using ICU-Chains:
> >         >
> >         > Bug 16581 : ICU tokenization bug in idzebra-2.0 2.0.59-1
> >         > URL       : https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=16581
> >         > Priority  : P5 - low
> >         > Urgency   : enhancement
> >         > Status    : NEW
> >         >
> >         > I also know that when use_zebra_facets was first introduced, it was *very*
> >         > slow -- I can't find any bugs about that though. It's possible that it got
> >         > so slow under idzebra-2.0 2.0.61 that the searches are timing out.
> >         >
> >         > It should be possible to set the logging for zebra so that you can see the
> >         > PQF queries:
> >         >
> >         > See
> >         >
> >         > Bug 15714 : Remove zebra.log from debian scripts and add optional log levels
> >         > URL       : https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15714
> >         > Priority  : P5 - low
> >         > Urgency   : enhancement
> >         > Status    : RESOLVED
> >         >
> >         > For setting the log levels
> >         >
> >         > And http://koha.1045719.n5.nabble.com/Improving-Zebra-logging-td5861827.html
> >         >
> >         > For a general discussion of how to use them.
> >         >
> >         > ... This should give you some idea of what's failing, both in terms of the
> >         > dashes in 2.0.59 and the non-functional zebra facets under 2.0.61.
> >         >
> >         > My general feeling is that 2.0.59 is irredeemably broken by bug 16581, and
> >         > we need at least 2.0.60, but I don't have any experience with zebra facets.
> >         >
> >         > --Barton
> >         >
> >         >
> >         >
> >         > On Mon, Aug 22, 2016 at 2:49 PM, Mark Tompsett <mtompset at hotmail.com> wrote:
> >         >>
> >         >> Greetings,
> >         >>
> >         >> Similar problem. I hope someone has a better solution than setting it to
> >         >> 0.
> >         >>
> >         >> GPML,
> >         >> Mark Tompsett
> >         >>
> >         >> -----Original Message-----
> >         >> From: Philippe Blouin
> >         >> Sent: Monday, August 22, 2016 2:40 PM
> >         >> To: koha-devel at lists.koha-community.org
> >         >> Subject: [Koha-devel] 16.05, zebra and jessie
> >         >>
> >         >> Hello!
> >         >>
> >         >> We're trying to find the correct combination.  We're new on Jessie,
> >         >> so we still have some tweaking needed...
> >         >>
> >         >> - By default, we get zebra 2.00.59 installed on Jessie through the
> >         >> packages.
> >         >> - On 16.05, we get some very bad results in the search when the itemtype
> >         >> contains a hyphen (-), like 'A-DOC'.
> >         >> - So we installed zebra 2.00.62.  This fixes the search...
> >         >> - But now we do not have facets.
> >         >> - So we set <use_zebra_facets>0</use_zebra_facets>
> >         >> - And now we have facets.  But this feels... wrong?
> >         >>
> >         >> My dummy question: what is the supposedly correct version of Zebra on
> >         >> Jessie?
> >         >> And were we correct in setting the config to 0?
> >         >>
> >         >> Thanks
> >         >> Blou
> >         >> _______________________________________________
> >         >> Koha-devel mailing list
> >         >> Koha-devel at lists.koha-community.org
> >         >> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> >         >> website : http://www.koha-community.org/
> >         >> git : http://git.koha-community.org/
> >         >> bugs : http://bugs.koha-community.org/
> >         >>
> >         >
> >         >
> >         >
> >
> >
> >
> >
> > --
> >
> > Tomás Cohen Arazi
> >
> > Theke Solutions (https://theke.io)
> > ✆ +54 9351 3513384
> > GPG: B2F3C15F
> >
> 
> 
> 
> --
> Chris Cormack
> Catalyst IT Ltd.
> +64 4 803 2238
> PO Box 11-053, Manners St, Wellington 6142, New Zealand



