[Koha-devel] Replace Catmandu indexing code with pure perl and eventually drop Catmandu as a Koha dependency

Fri May 25 04:30:17 CEST 2018

I assume the bookdrop machine uses SIP or NCIP to talk to Koha? How it connects to Koha is what I'm trying to determine.

Ahhh, I thought you were talking about compiling the Catmandu modules themselves. Looking at your NYTProf results, I can see why you’re looking at Catmandu::Fix. 

I admit I hadn’t looked at the bug, as I was mostly just curious about the bookdrop machine and the compilation of the Catmandu code itself. 

I notice though that you’ve just posted profiles for doing a full reindex. It would be interesting to see what happens for the bookdrop. 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St

Ultimo, NSW 2007

Australia

Office: 02 9212 0899

Direct: 02 8005 0595

From: David Gustafsson [mailto:glasklas at gmail.com] 
Sent: Monday, 21 May 2018 6:40 PM
To: David Cook <dcook at prosentient.com.au>
Cc: Koha-devel at lists.koha-community.org
Subject: Replace Catmandu indexing code with pure perl and eventually drop Catmandu as a Koha dependency

I'm referring to a book return / bookdrop machine, where patrons return books by putting them in. With Catmandu, there is a significant delay between returns.

"Compile step" was perhaps the wrong term to use. I haven't dug that deeply into what causes this, but I would guess that the overhead comes from parsing the Fix language and converting it to Perl code. I don't think Plack would help. It might be possible to cache the resulting Perl code if this is the culprit, but that would not significantly improve the time of a full reindex: biblios are indexed in batches, so the startup overhead only occurs once per batch. 

I actually used NYTProf when developing the patch, and the benchmarks can be found as attachments in the Bugzilla issue.

David

On Monday, 21 May 2018, David Cook <dcook at prosentient.com.au> wrote:

When you say “machine where you return books”, are you referring to a self-checkout machine or a computer where staff are checking in books manually?

When you say “Catmandu has some kind of compile step”, what do you mean by “compile”? If you’re using Plack, surely we should be pre-loading Catmandu and thus any compilation will already have happened? Admittedly I don’t use Plack with Koha, so I wouldn’t know if that’s how they’re doing it, but I use Plack with other systems and preload all the time-consuming modules to speed things up.  

If you want to see what’s the problem exactly, I’d suggest using https://wiki.koha-community.org/wiki/Profiling_with_Devel::NYTProf. That should show you where you are losing time. 

David Cook

From: David Gustafsson [mailto:glasklas at gmail.com] 
Sent: Friday, 18 May 2018 7:43 PM
To: David Cook <dcook at prosentient.com.au>
Cc: Koha-devel at lists.koha-community.org
Subject: Re: [Koha-devel] Replace Catmandu indexing code with pure perl and eventually drop Catmandu as a Koha dependency

“The book drop machine” = the machine where you return books. It doesn't matter whether you use Plack or not: Catmandu has some kind of compile step or similar with a startup time of a couple of seconds. So every time a book was returned (and the biblio was updated and reindexed), there was a delay of a couple of seconds, which is a major issue when returning multiple books.

Best Regards

David

2018-05-18 2:33 GMT+02:00 David Cook <dcook at prosentient.com.au>:

I don’t do anything with Elastic or Catmandu at the moment, so I won’t comment about that.

But you mention the overhead of Catmandu start-up. Can you speak more to that? What’s “the book drop machine”? Why isn’t Catmandu running in a persistent process?*

*I say as someone who still uses Koha using CGI rather than Plack…

David Cook

From: koha-devel-bounces at lists.koha-community.org [mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of David Gustafsson
Sent: Thursday, 17 May 2018 11:57 PM
To: Koha-devel at lists.koha-community.org
Subject: [Koha-devel] Replace Catmandu indexing code with pure perl and eventually drop Catmandu as a Koha dependency

Hi all!

I have been working on replacing the Catmandu-dependent indexing code with a simpler and faster Koha-specific implementation using the Search::Elasticsearch package (which Catmandu uses internally): https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19893

Some of the benefits would be:

1) Increased indexing performance (about twice as fast overall, and six times as fast when comparing only the time spent in update_index()), due to more efficient JSON conversion and fewer Elasticsearch requests.
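To illustrate the "fewer Elasticsearch requests" point, here is a rough sketch (in Python, not the Perl from the patch) of batching many documents into a single Elasticsearch `_bulk` request body instead of sending one request per record. The index name `koha_biblios` and the document shape are hypothetical; only the NDJSON layout (an action line followed by a source line, with a trailing newline) follows the documented `_bulk` format. How the patch on bug 19893 actually batches its requests is not shown here.

```python
"""Sketch: one Elasticsearch _bulk request instead of one request per record.

Assumptions: the index name "koha_biblios" and the document fields are
hypothetical; only the NDJSON layout follows the Elasticsearch _bulk format.
"""
import json


def build_bulk_body(index_name, docs):
    """Return an NDJSON _bulk body for an iterable of (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch what to do with the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the whole document serialized to JSON in one call.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("koha_biblios", [(1, {"title": "Dune"}), (2, {"title": "Emma"})])
```

The resulting string would be sent in a single POST to the `_bulk` endpoint, so indexing N records costs one HTTP round trip rather than N.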

2) With Catmandu, indexing speed decreases as more mappings are added; with the alternative algorithm, indexing speed stays more or less constant no matter how many mappings you add.

3) Negligible indexing start-up time, which is especially noticeable when indexing a single document. For example, we have an issue with the book drop machine: each return takes a couple of seconds because of the Catmandu start-up overhead (the same delay occurs when saving biblios in the staff client).

4) More transparent code and less complexity compared with Catmandu (admittedly a partly subjective statement) should lead to improved maintainability and increased stability.

5) No need for new developers to learn the Fix language.

6) Closer to the metal, so it is easier to perform further Koha-specific optimizations and customizations that might not be feasible with Catmandu in the way.

The proposed patch only addresses the indexing logic, but the remaining Catmandu-dependent code (mainly for searching) should be fairly trivial to replace with a Search::Elasticsearch implementation, which can be done as a next step.

It would be wonderful if this could be raised for discussion at the next developers meeting.

Best regards

David Gustafsson
