[Koha-devel] The many failings of background_jobs_worker.pl

Philippe Blouin philippe.blouin at inlibro.com
Wed Dec 21 15:04:16 CET 2022


Although, I should point out:

Cannot connect to broker Failed to connect: Error connecting to localhost:61613: Connection refused at /usr/share/perl5/Net/Stomp.pm line 27.; giving up at /usr/share/perl5/Net/Stomp.pm line 27.

So shutting down MQ has its own issues....

Philippe Blouin,
Director of Technology

Tel.: (833) 465-4276, ext. 230
philippe.blouin at inLibro.com

inLibro | pour esprit libre | www.inLibro.com <http://www.inLibro.com>
On 2022-12-21 08:58, Philippe Blouin wrote:
>
> Good evening, David,
>
> Thanks for the response.  Yours and David's and Michael's.  I feel 
> less alone...
>
> I checked, and yes, all the patches you refer to are in our pile.  And 
> until the problems arose, there were no customizations around that code.
>
> So yeah, even at 22.05.06, I get the JSON error and the race condition 
> (we use ES).  And the _abandoned_ children.  So I surmise, or dare I 
> say postulate, that those issues are not as resolved as some would 
> presume.
>
> I will revert background_jobs_worker.pl to its default, and shut down 
> MQ everywhere, for now.  :(
>
> On 2022-12-20 17:55, David Cook wrote:
>>
>> Salut Philippe,
>>
>> That first issue should’ve been resolved in 22.05.00 by 
>> https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=30172. I 
>> haven’t had any problems like that since applying that patch. Are you 
>> running Koha with or without customizations?
>>
>> As you say, bug 30654 discusses that second issue. And I obviously 
>> have my own opinion on that one 😉.
>>
>> That JSON issue should be fixed by Bug 31351 in Koha 22.05.06 as well 
>> I believe: 
>> https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=31351
>>
>> --
>>
>> The only issue I’ve had with the background jobs has been the one 
>> covered by Bug 30172. Otherwise, it’s been all fine for me, although 
>> I use Zebra rather than Elasticsearch. I think part of the reason I 
>> haven’t had issues is that I haven’t had many people using the 
>> background jobs either though.
>>
>> I’m actually planning on writing a background job system based on 
>> RabbitMQ for a different non-Koha system. The main difference is that 
>> I’ll reject or fail tasks where messages aren’t sent to RabbitMQ. I 
>> think that’ll make my system a bit more robust than Koha’s.
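
A minimal sketch of that "reject or fail when the message isn't sent" idea, in Python rather than Koha's Perl. JobStore, enqueue, PublishError and the publish callback are all hypothetical names made up for illustration; this is not David's design and not a RabbitMQ client API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class PublishError(Exception):
    """Raised when the job message could not be handed to the broker."""


@dataclass
class JobStore:
    """Hypothetical stand-in for a jobs table: job id -> status."""
    statuses: Dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def create(self) -> int:
        job_id = self.next_id
        self.next_id += 1
        self.statuses[job_id] = "new"
        return job_id


def enqueue(store: JobStore, publish: Callable[[int], None]) -> int:
    """Record a job, then publish it; mark it failed if publishing fails."""
    job_id = store.create()
    try:
        publish(job_id)  # assumption: raises on any broker/connection error
    except Exception as exc:
        # No database fallback: the job is failed right away and the caller is told.
        store.statuses[job_id] = "failed"
        raise PublishError(f"job {job_id} was not sent to the broker") from exc
    return job_id
```

The point of the design is that a job never sits in "new" waiting for a worker that was never notified: either the broker has the message, or the job is visibly failed.
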
>>
>> The problem with the background jobs at the moment is that we haven’t 
>> fully committed to RabbitMQ. We’re trying to do this weird hybrid 
>> with the database fallback which is not the right direction in my 
>> mind. We should do one or the other but not try to do both.
>>
>> But that’s just my 2 cents.
>>
>> David Cook
>>
>> Senior Software Engineer
>>
>> Prosentient Systems
>>
>> Suite 7.03
>>
>> 6a Glen St
>>
>> Milsons Point NSW 2061
>>
>> Australia
>>
>> Office: 02 9212 0899
>>
>> Online: 02 8005 0595
>>
>> From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On 
>> Behalf Of Philippe Blouin
>> Sent: Wednesday, 21 December 2022 6:13 AM
>> To: koha-devel at lists.koha-community.org
>> Subject: [Koha-devel] The many failings of background_jobs_worker.pl
>>
>> Howdy!
>>
>> Since moving a lot of our users to 22.05.06, we've installed the 
>> worker everywhere.  But the number of issues encountered is staggering.
>>
>> The first one was
>>
>> Can't call method "process" on an undefined value
>>
>> where the id received from MQ was not found in the DB, and the 
>> process is going straight to process_job and failing. Absolutely no 
>> idea how that occurs, seems completely counterintuitive (the ID comes 
>> from the DB after all), but here it is.  Hacked the code to add a 
>> "sleep 1" to fix most of that one.
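
For what it's worth, a bounded retry is a slightly less blunt version of the "sleep 1" hack. This is only a sketch in Python (not the worker's Perl), and it assumes the row simply becomes visible a moment later, for instance because the enqueuing transaction has not committed yet, which nothing in this thread confirms. fetch_with_retry and its lookup callback are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    lookup: Callable[[int], Optional[T]],
    job_id: int,
    attempts: int = 5,
    delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call lookup(job_id) until it returns a row or the attempts run out."""
    for attempt in range(attempts):
        job = lookup(job_id)
        if job is not None:
            return job
        if attempt < attempts - 1:
            sleep(delay)  # assumption: the row shows up shortly after the message
    # Still missing: the caller should log and skip rather than call a method on nothing.
    return None
```

Compared with an unconditional sleep, this costs nothing when the row is already there and gives a clean "not found" result instead of a crash when it never appears.
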
>>
>> Then came the fact that stored events were not checked when the 
>> connection to MQ was successful at startup.  Bug 30654 covers it.  
>> Hacked a little "$init" in there to clear that up at startup.
>>
>> Then came the
>>
>> malformed UTF-8 character in JSON string, at character offset 296 
>> (before "\x{e9}serv\x{e9} au ...")
>>
>> at decode_json that crashes the whole process.  And for some reason, 
>> it never gets over it, gets the same problem at every restart, like 
>> the event is never "eaten" from the queue.  Hacked an eval then a 
>> try-catch over it...
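
The shape of that eval/try-catch hack, sketched in Python rather than Perl: catch the decode failure and still acknowledge the message, so it does not come back at every restart. This assumes a client-acknowledged subscription where an unacknowledged message is redelivered; handle_frame and the process, ack and log callbacks are hypothetical names, not the worker's real functions.

```python
import json
from typing import Callable, Optional


def handle_frame(
    body: bytes,
    process: Callable[[dict], None],
    ack: Callable[[], None],
    log: Callable[[str], None] = print,
) -> Optional[dict]:
    """Decode one message body; acknowledge it even when it cannot be decoded."""
    try:
        args = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        log(f"discarding undecodable message: {exc}")
        # Assumption: client-ack mode, where an unacked message is redelivered.
        ack()
        return None
    process(args)
    ack()
    return args
```

Discarding is the simplest choice; parking the bad body somewhere for inspection would be kinder, but either way the worker keeps running.
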
>>
>> After coding a monitor to alert when a background_jobs row has been "new" 
>> for over 5 minutes in the DB, I was inundated by messages.  There's always 
>> one elasticsearch_update that escapes among the flurry, and they 
>> slowly add up.
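
The monitor's check is roughly the following, shown here as a Python sketch over plain dictionaries instead of a real query. The "status" and "enqueued_on" keys are assumptions about how the rows are represented, and stale_new_jobs is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Mapping


def stale_new_jobs(
    jobs: Iterable[Mapping],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),
) -> List[Mapping]:
    """Return jobs still in status 'new' that were enqueued more than max_age ago."""
    return [
        job for job in jobs
        if job["status"] == "new" and now - job["enqueued_on"] > max_age
    ]
```

Anything this returns is a job that was recorded but that no worker has picked up within the threshold.
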
>>
>> At this point, the only viable solution is to run the workers but 
>> disable RabbitMQ everywhere.  Are we really the only ones 
>> experiencing that?
>>
>> Regards,
>>
>> PS Our servers are well-above-average Debian 11 machines with lots of 
>> firepower (ram, cpu, i/o...).
>>