[Koha-devel] search speed

Thu May 28 15:54:51 CEST 2015

At 07:52 AM 5/28/2015 -0300, Tomas Cohen Arazi wrote:

>On 28/5/2015 4:43 a.m., "Fridolin SOMERS" <fridolin.somers at biblibre.com> 
>wrote:
> >
> > Could this mean one should not get records from Zebra but directly from 
> > the database? That is, if getting the ids of search results without 
> > getting the full records is possible.
>
>It is possible. But we should also evaluate the cost of preparing the 
>record for display, because at indexing time we do a lot of processing on 
>the record, and with such a change that processing would have to be done 
>at rendering time.

Interesting. The rendering is only on 20 records, while the search is on 
tens or hundreds of thousands of records. It would obviously be a major 
exercise, but the trade-off would be enormous *if* it avoids the overload 
on the single core of the CPU (multi-threading across cores is, from what I 
was able to find out, just an impossible dream).
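
To make the proportions concrete, here is a minimal sketch in Python, purely 
for illustration. None of these names are Koha or ZOOM APIs; search_ids() 
and fetch_and_prepare() are hypothetical stand-ins, and the data is 
synthetic. It only shows that if the search hands back ids, the per-record 
preparation is paid for one page of results, not for the whole result set.

```python
# Illustrative only: hypothetical names, not Koha or ZOOM APIs.
PAGE_SIZE = 20  # records shown on the first results page


def search_ids(index, term):
    """Stand-in for a search that returns matching record ids only."""
    return [rec_id for rec_id, words in index.items() if term in words]


def fetch_and_prepare(db, rec_id, counter):
    """Stand-in for loading one record from the database and preparing it."""
    counter["prepared"] += 1
    return db[rec_id].title()


def first_page(index, db, term, counter):
    """Search for ids, then prepare only the records of the first page."""
    ids = search_ids(index, term)
    page = [fetch_and_prepare(db, i, counter) for i in ids[:PAGE_SIZE]]
    return len(ids), page


# Synthetic data set: 100,000 records that all match the term "ship".
db = {i: f"ship log {i}" for i in range(100_000)}
index = {i: set(text.split()) for i, text in db.items()}

counter = {"prepared": 0}
total, page = first_page(index, db, "ship", counter)
print(total, len(page), counter["prepared"])  # 100000 20 20
```

Whether a real id-only search in Zebra is cheap enough to behave like 
search_ids() here is exactly the open question; the sketch assumes it is.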

Facets are a search enhancement with enormous potential, but when the time 
required increases from a few tenths of a second to 20 seconds+, we felt we 
could not put it into production -- our users tend to want "instant 
gratification."

Best -- Paul

> >
> > On 27/05/2015 21:02, Paul A wrote:
> >>
> >> At 08:29 PM 5/27/2015 +0200, Gaetan Boisson wrote:
> >>>
> >>> Well as i said, the time is not the same depending on the number of
> >>> results, but in both cases, the number of results is anyway much
> >>> higher than the number of records taken into consideration for facets.
> >>>
> >>> Your investigation indicates that:
> >>>
> >>> In ZOOM->record, the time is spent in
> >>>   my $_rec = Net::Z3950::ZOOM::resultset_record($this->_rs(), $which);
> >>>
> >>> Maybe it's worth having a deeper look in this.
> >>
> >>
> >> I looked into this to some extent in January this year; facets in 3.18
> >> <http://navalmarinearchive.com/z_koha/search_speed_data.html> appeared
> >> to be a limiting factor as it "swamped" one CPU core (and NYTProf showed
> >> this to be from ZOOM - see
> >> <http://navalmarinearchive.com/z_koha/nytprof_318_s/index.html>).
> >>
> >> Regards -- Paul
> >>
> >>
> >>> On 27/05/2015 12:56, Jonathan Druart wrote:
> >>>>
> >>>> Gaetan,
> >>>> have a look at the metrics on bug 13665.
> >>>>
> >>>> 2015-05-27 11:45 GMT+01:00 Gaetan Boisson <gaetan.boisson at biblibre.com>:
> >>>>>
> >>>>> Hello again all,
> >>>>>
> >>>>> looking at speed issues there is one thing that i don't understand,
> >>>>> and i
> >>>>> feel some well-versed developers might have a better idea of what is
> >>>>> happening.
> >>>>>
> >>>>> If you query zebra directly on the server, it's blazingly fast, no
> >>>>> matter
> >>>>> how big your database and how many results you get. (At least the
> >>>>> difference
> >>>>> is negligible, and we still get very low response times.)
> >>>>>
> >>>>> But if you do a search in Koha that brings 20 000 results, it's
> >>>>> usually 4
> >>>>> times faster (maybe more, sorry i don't have precise metrics here)
> >>>>> than a
> >>>>> search that brings up 1 000 000 results.
> >>>>>
> >>>>> Is it consistent with your experience?
> >>>>>
> >>>>> If it is, what would be the reason for this? My understanding is
> >>>>> that the
> >>>>> search code asks zebra for the first n records to display on the
> >>>>> first page,
> >>>>> and that facets are based on a number of results well below 20 000
> >>>>> anyway, so
> >>>>> the total number of results shouldn't really make a difference.
> >>>>>
> >>>>> --
> >>>>> Gaetan Boisson
> >>>>> Chef de projet bibliothécaire
> >>>>> BibLibre
> >>>>> 06 52 42 51 29
> >>>>> 108 avenue Breteuil 13006 Marseille
> >>>>> gaetan.boisson at biblibre.com