[Koha-devel] The many failings of background_jobs_worker.pl

Philippe Blouin philippe.blouin at inlibro.com
Wed Dec 21 14:58:40 CET 2022


Good evening, David,

Thanks for the response.  Yours and David's and Michael's.  I feel less 
alone...

I checked, and yes, all the patches you refer to are in our pile. And 
until the problems arose, there were no customizations around that code.

So yeah, even at 22.05.06, I get the JSON error and the race condition 
(we use ES).  And the _abandoned_ children.  So I surmise, or dare I 
say postulate, that those issues are not as resolved as some would presume.

I will revert background_jobs_worker.pl to its default, and shut down MQ 
everywhere, for now.  :(

Philippe Blouin,
Directeur de la technologie

Tél.  : (833) 465-4276, poste 230
philippe.blouin at inLibro.com

inLibro | pour esprit libre | www.inLibro.com
On 2022-12-20 17:55, David Cook wrote:
>
> Salut Philippe,
>
> That first issue should’ve been resolved in 22.05.00 by 
> https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=30172. I 
> haven’t had any problems like that since applying that patch. Are you 
> running Koha with or without customizations?
>
> As you say, bug 30654 discusses that second issue. And I obviously 
> have my own opinion on that one 😉.
>
> That JSON issue should be fixed by Bug 31351 in Koha 22.05.06 as well 
> I believe: https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=31351
>
> --
>
> The only issue I’ve had with the background jobs has been the one 
> covered by Bug 30172. Otherwise, it’s been all fine for me, although I 
> use Zebra rather than Elasticsearch. I think part of the reason I 
> haven’t had issues is that I haven’t had many people using the 
> background jobs either though.
>
> I’m actually planning on writing a background job system based on 
> RabbitMQ for a different non-Koha system. The main difference is that 
> I’ll reject or fail tasks where messages aren’t sent to RabbitMQ. I 
> think that’ll make my system a bit more robust than Koha’s.
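
To make the "reject or fail when the message isn't sent" idea concrete, here is a minimal sketch in Python (for illustration only; it is neither Koha code nor the planned system). The callables `save`, `publish` and `mark_failed` are hypothetical stand-ins for the database insert, the broker send and the status update.

```python
from typing import Callable


class EnqueueError(RuntimeError):
    """Raised when a job was stored but could not be handed to the broker."""


def enqueue(
    job: dict,
    save: Callable[[dict], int],             # hypothetical: stores the job, returns its id
    publish: Callable[[int], None],          # hypothetical: sends the id to the broker
    mark_failed: Callable[[int, str], None], # hypothetical: records the failure
) -> int:
    """Store a job, then publish it; fail loudly instead of falling back."""
    job_id = save(job)
    try:
        publish(job_id)
    except Exception as err:
        # No silent database fallback: the job is marked failed and the
        # caller is told immediately.
        mark_failed(job_id, str(err))
        raise EnqueueError(f"job {job_id} could not be published") from err
    return job_id
```

The point of the design is that a job is never left sitting in "new" waiting for a worker that was never notified.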
>
> The problem with the background jobs at the moment is that we haven’t 
> fully committed to RabbitMQ. We’re trying to do this weird hybrid with 
> the database fallback which is not the right direction in my mind. We 
> should do one or the other but not try to do both.
>
> But that’s just my 2 cents.
>
> David Cook
> Senior Software Engineer
> Prosentient Systems
> Suite 7.03
> 6a Glen St
> Milsons Point NSW 2061
> Australia
>
> Office: 02 9212 0899
> Online: 02 8005 0595
>
> From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On 
> Behalf Of Philippe Blouin
> Sent: Wednesday, 21 December 2022 6:13 AM
> To: koha-devel at lists.koha-community.org
> Subject: [Koha-devel] The many failings of background_jobs_worker.pl
>
> Howdy!
>
> Since moving a lot of our users to 22.05.06, we've installed the 
> worker everywhere.  But the number of issues encountered is staggering.
>
> The first one was
>
> Can't call method "process" on an undefined value
>
> where the id received from MQ was not found in the DB, so the worker 
> goes straight to process_job and fails. Absolutely no idea how 
> that occurs; it seems completely counterintuitive (the ID comes from the 
> DB after all), but here it is.  I hacked the code to add a "sleep 1", 
> which fixes most occurrences.
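
For illustration, a bounded retry is one alternative to a fixed "sleep 1". The sketch below is in Python rather than the worker's Perl, and `lookup` is a hypothetical callable that returns the job row, or None while the row is not yet visible. It assumes the row does appear shortly afterwards, which is what the sleep workaround suggests but does not prove.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    lookup: Callable[[], Optional[T]],
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call lookup() until it returns a row or the attempts run out."""
    for attempt in range(attempts):
        row = lookup()
        if row is not None:
            return row
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

A caller would still have to handle the None case explicitly (log and skip the message, for instance) rather than call a method on an undefined job.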
>
> Then came the fact that jobs already stored in the DB were not checked 
> when the connection to MQ succeeded at startup.  Bug 30654 covers it.  
> I hacked a little "$init" in there to clear that up at startup.
>
> Then came the
>
> malformed UTF-8 character in JSON string, at character offset 296 
> (before "\x{e9}serv\x{e9} au ...")
>
> at decode_json that crashes the whole process.  And for some reason, 
> it never gets over it, gets the same problem at every restart, like 
> the event is never "eaten" from the queue. Hacked an eval then a 
> try-catch over it...
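
As a sketch of the guard described above (Python for illustration, not the actual worker code): decode inside a try block and acknowledge the frame even when decoding fails, so a poison message is not delivered again at every restart. The `ack`, `process` and `log` callables are hypothetical, and acknowledging a bad frame is an assumption about the desired policy; a dead-letter queue would be another option.

```python
import json
from typing import Any, Callable


def handle_frame(
    body: bytes,
    ack: Callable[[], None],            # hypothetical: acknowledges the frame
    process: Callable[[Any], None],     # hypothetical: runs the decoded job
    log: Callable[[str], None] = print,
) -> bool:
    """Decode one message body; return True if it was processed."""
    try:
        args = json.loads(body.decode("utf-8"))
    except ValueError as err:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        log(f"dropping undecodable message: {err}")
        ack()  # consume it so it does not come back on the next start
        return False
    process(args)
    ack()
    return True
```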
>
> After coding a monitor to alert when a background job has been "new" 
> for over 5 minutes in the DB, I was inundated with messages.  There's 
> always one elasticsearch_update that escapes among the flurry, and they 
> slowly add up.
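
A monitor of that kind can be sketched as a single query. The example below uses Python with an SQLite connection as a stand-in for the real database. The table name background_jobs comes from this thread, but the column names (id, type, status, enqueued_on) and the 'YYYY-MM-DD HH:MM:SS' text timestamps are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import List, Tuple


def stale_new_jobs(
    conn: sqlite3.Connection,
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),
) -> List[Tuple[int, str]]:
    """Return (id, type) for jobs still 'new' after max_age."""
    # Timestamps are compared as text, so the cutoff must use the same
    # 'YYYY-MM-DD HH:MM:SS' layout as the stored values.
    cutoff = (now - max_age).strftime("%Y-%m-%d %H:%M:%S")
    rows = conn.execute(
        "SELECT id, type FROM background_jobs "
        "WHERE status = 'new' AND enqueued_on < ? ORDER BY id",
        (cutoff,),
    )
    return rows.fetchall()
```

Grouping the result by type would show quickly whether the stragglers are always the same kind of job.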
>
> At this point, the only viable solution is to run the workers but 
> disable RabbitMQ everywhere.  Are we really the only ones experiencing 
> that?
>
> Regards,
>
> PS Our servers are well-above-average Debian 11 machines with lots of 
> firepower (RAM, CPU, I/O...).
>
> -- 
>
> Philippe Blouin,
> Directeur de la technologie
>
> Tél.  : (833) 465-4276, poste 230
> philippe.blouin at inLibro.com
>
> inLibro | pour esprit libre | www.inLibro.com
>