[Koha-devel] The many failings of background_jobs_worker.pl
Philippe Blouin
philippe.blouin at inlibro.com
Wed Dec 21 15:04:16 CET 2022
Although, I should add:
Cannot connect to broker Failed to connect: Error connecting to localhost:61613: Connection refused at /usr/share/perl5/Net/Stomp.pm line 27.; giving up at /usr/share/perl5/Net/Stomp.pm line 27.
So shutting down MQ has its own issues....
Philippe Blouin,
Directeur de la technologie
Tél. : (833) 465-4276, poste 230
philippe.blouin at inLibro.com
inLibro | pour esprit libre | www.inLibro.com <http://www.inLibro.com>
On 2022-12-21 08:58, Philippe Blouin wrote:
>
> Good evening, David,
>
> Thanks for the response. Yours and David's and Michael's. I feel
> less alone...
>
> I checked, and yes, all the patches you refer to are in our pile. And
> until the problems arose, there were no customizations around that code.
>
> So yeah, even at 22.05.06, I get the JSON error and the race condition
> (we use ES). And the _abandoned_ children. So I surmise, or dare I
> say postulate, that those issues are not as resolved as some would
> presume.
>
> I will revert background_jobs_worker.pl to its default, and shut down
> MQ everywhere, for now. :(
>
> On 2022-12-20 17:55, David Cook wrote:
>>
>> Salut Philippe,
>>
>> That first issue should’ve been resolved in 22.05.00 by
>> https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=30172. I
>> haven’t had any problems like that since applying that patch. Are you
>> running Koha with or without customizations?
>>
>> As you say, bug 30654 discusses that second issue. And I obviously
>> have my own opinion on that one 😉.
>>
>> That JSON issue should be fixed by Bug 31351 in Koha 22.05.06 as well
>> I believe:
>> https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=31351
>>
>> --
>>
>> The only issue I’ve had with the background jobs has been the one
>> covered by Bug 30172. Otherwise, it’s been all fine for me, although
>> I use Zebra rather than Elasticsearch. I think part of the reason I
>> haven’t had issues is that I haven’t had many people using the
>> background jobs either though.
>>
>> I’m actually planning on writing a background job system based on
>> RabbitMQ for a different non-Koha system. The main difference is that
>> I’ll reject or fail tasks where messages aren’t sent to RabbitMQ. I
>> think that’ll make my system a bit more robust than Koha’s.
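A minimal sketch of the "reject or fail" idea described above, in Python for illustration only. It is not code from Koha or from David's planned system; enqueue, store, publish and EnqueueError are hypothetical names, and the broker failure is assumed to surface as a ConnectionError.

```python
class EnqueueError(Exception):
    """Raised when a job could not be handed to the broker."""


def enqueue(job, store, publish):
    """Record the job, then publish its id; fail the job if publishing fails.

    store is a hypothetical dict standing in for the jobs table and
    publish is a hypothetical callable that sends the id to the broker.
    """
    job["status"] = "new"
    store[job["id"]] = job
    try:
        publish(job["id"])
    except ConnectionError as err:
        # No silent database fallback: the job is marked failed and the
        # caller is told immediately.
        job["status"] = "failed"
        raise EnqueueError(f"job {job['id']} was not queued: {err}") from err
    return job


def broker_down(job_id):
    # Stand-in for a publish call against an unreachable broker.
    raise ConnectionError("connection refused")


store = {}
try:
    enqueue({"id": 1}, store, broker_down)
    outcome = "queued"
except EnqueueError:
    outcome = "rejected"
```

With this shape a job can never sit in "new" waiting for a message that was never sent, at the cost of refusing work while the broker is down.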
>>
>> The problem with the background jobs at the moment is that we haven’t
>> fully committed to RabbitMQ. We’re trying to do this weird hybrid
>> with the database fallback which is not the right direction in my
>> mind. We should do one or the other but not try to do both.
>>
>> But that’s just my 2 cents.
>>
>> David Cook
>> Senior Software Engineer
>> Prosentient Systems
>> Suite 7.03
>> 6a Glen St
>> Milsons Point NSW 2061
>> Australia
>>
>> Office: 02 9212 0899
>> Online: 02 8005 0595
>>
>> *From:*Koha-devel <koha-devel-bounces at lists.koha-community.org> *On
>> Behalf Of *Philippe Blouin
>> *Sent:* Wednesday, 21 December 2022 6:13 AM
>> *To:* koha-devel at lists.koha-community.org
>> *Subject:* [Koha-devel] The many failings of background_jobs_worker.pl
>>
>> Howdy!
>>
>> Since moving a lot of our users to 22.05.06, we've installed the
>> worker everywhere. But the number of issues encountered is staggering.
>>
>> The first one was
>>
>> Can't call method "process" on an undefined value
>>
>> where the id received from MQ was not found in the DB, and the
>> process is going straight to process_job and failing. Absolutely no
>> idea how that occurs, seems completely counterintuitive (the ID comes
>> from the DB after all), but here it is. Hacked the code to add a
>> "sleep 1" to fix most of that one.
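A minimal sketch of a bounded retry in place of a fixed "sleep 1", in Python for illustration only (the worker itself is Perl). It assumes the failure comes from the message arriving before the job row is visible to the worker, which this thread does not confirm. fetch_job is a hypothetical callable, not a Koha function.

```python
import time


def fetch_job_with_retry(fetch_job, job_id, attempts=5, delay=0.2, sleep=time.sleep):
    """Look the job up several times before giving up.

    fetch_job is a hypothetical callable returning the job, or None when
    the row is not (yet) visible.
    """
    for attempt in range(attempts):
        job = fetch_job(job_id)
        if job is not None:
            return job
        if attempt < attempts - 1:
            # Assumption: the row shows up shortly after the message does.
            sleep(delay)
    return None


# Fake lookup that only "sees" the row on the third call.
calls = {"n": 0}


def fake_fetch(job_id):
    calls["n"] += 1
    return {"id": job_id, "status": "new"} if calls["n"] >= 3 else None


job = fetch_job_with_retry(fake_fetch, 42, delay=0)
```

When every attempt fails the function returns None, so the caller can log and skip the message instead of calling a method on an undefined value.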
>>
>> Then came the fact that jobs already stored in the DB were not checked
>> when the connection to MQ succeeded at startup. Bug 30654 covers it.
>> Hacked a little "$init" in there to clear that up at startup.
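A rough sketch of the startup catch-up described above, in Python for illustration only. It is not the "$init" hack itself nor the patch on Bug 30654; connect, fetch_new_jobs and process_job are hypothetical callables, and a refused connection is assumed to raise ConnectionError.

```python
def start_worker(connect, fetch_new_jobs, process_job):
    """Drain jobs left in the database once, whether or not the broker is up.

    Returns the broker connection (or None) and the number of backlog
    jobs processed. All three callables are hypothetical.
    """
    try:
        conn = connect()
    except ConnectionError:
        conn = None
    # Jobs enqueued while no worker was listening are still "new" in the
    # database, so they are processed before any broker message is consumed.
    backlog = list(fetch_new_jobs())
    for job in backlog:
        process_job(job)
    return conn, len(backlog)


processed = []
conn, drained = start_worker(
    lambda: "conn",
    lambda: [{"id": 1}, {"id": 2}],
    processed.append,
)
```

A job drained this way may also still have a message waiting in the queue, so process_job would need to skip jobs that are no longer "new".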
>>
>> Then came the
>>
>> malformed UTF-8 character in JSON string, at character offset 296
>> (before "\x{e9}serv\x{e9} au ...")
>>
>> at decode_json that crashes the whole process. And for some reason,
>> it never gets over it, gets the same problem at every restart, like
>> the event is never "eaten" from the queue. Hacked an eval then a
>> try-catch over it...
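A minimal sketch of guarding the decode step so that one bad message cannot stop the worker, in Python for illustration only (the real call is Perl's decode_json). It assumes the broker redelivers any message that is never acknowledged, which would explain the same failure at every restart. handle_frame, process and ack are hypothetical names.

```python
import json


def handle_frame(body, process, ack, log=print):
    """Decode one message body; acknowledge it even when it cannot be decoded.

    process and ack are hypothetical callables supplied by the worker.
    """
    try:
        args = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as err:
        log(f"dropping undecodable message: {err}")
        ack()  # otherwise the same message would come back on restart
        return False
    process(args)
    ack()
    return True


acked = []
seen = []
ok = handle_frame(b'{"job_id": 7}', seen.append, lambda: acked.append("ok"))
# Latin-1 bytes for the accented letters, which are not valid UTF-8.
bad = handle_frame(
    b'{"note": "r\xe9serv\xe9"}',
    seen.append,
    lambda: acked.append("bad"),
    log=lambda message: None,
)
```

Dropping the message loses that job's arguments, so a real worker would more likely mark the matching job as failed before acknowledging.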
>>
>> After coding a monitor to alert when a background_jobs row has been "new"
>> for over 5 minutes in the DB, I was inundated by messages. There's always
>> one elasticsearch_update that escapes among the flurry, and they
>> slowly add up.
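A small sketch of the check such a monitor might run, in Python for illustration only; it is not the monitor mentioned above. The field names (status, enqueued_on, type) are assumptions about how the job rows are represented, and the rows are plain dicts here rather than database results.

```python
from datetime import datetime, timedelta


def stale_jobs(jobs, now, max_age=timedelta(minutes=5)):
    """Return the jobs that have been "new" for longer than max_age."""
    return [
        job for job in jobs
        if job["status"] == "new" and now - job["enqueued_on"] > max_age
    ]


now = datetime(2022, 12, 21, 12, 0)
jobs = [
    {"id": 1, "type": "elasticsearch_update", "status": "new",
     "enqueued_on": now - timedelta(minutes=12)},
    {"id": 2, "type": "elasticsearch_update", "status": "new",
     "enqueued_on": now - timedelta(minutes=1)},
    {"id": 3, "type": "elasticsearch_update", "status": "finished",
     "enqueued_on": now - timedelta(hours=1)},
]
stuck = stale_jobs(jobs, now)
```

Only job 1 is reported: job 2 is still within the five-minute window and job 3 has already finished.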
>>
>> At this point, the only viable solution is to run the workers but
>> disable RabbitMQ everywhere. Are we really the only ones
>> experiencing that?
>>
>> Regards,
>>
>> PS Our servers are well-above-average Debian 11 machines with a lot of
>> firepower (ram, cpu, i/o...).
>>