[Koha-devel] The many failings of background_jobs_worker.pl
Philippe Blouin
philippe.blouin at inlibro.com
Wed Dec 21 21:48:47 CET 2022
Hello Fridolin,
Yes, performance probably matters for MQ; it sometimes seems to eat a
disproportionate amount of resources. :)
But like I said, our server is a beast. In every aspect of its build,
it has the best components for 2022.
Let's see about the race condition:
* An update is done on a biblio
* An update_elastic_index job is created in the db.
* Its ID is pushed onto MQ
* background_jobs_worker.pl picks up the ID from MQ
o It goes to the DB and finds nothing with that ID, so we get what
amounts to a null-pointer error (yeah, I come from C).
o This is NOT an old forgotten floating job, since we can see the
job in the database when looking manually.
o The job stays there forever, with status 'new'.
* If I add a "sleep 1", this issue _mostly_ disappears.
No server performance issue could explain this. Maybe some DB caching,
or the job row not yet being committed when the message arrives?
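For what it's worth, a short retry loop with backoff might paper over the race more reliably than a blind "sleep 1". This is only a sketch, not the shipped worker code: fetch_job_with_retry and the coderef interface are hypothetical, standing in for however the worker looks up the background_jobs row.

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

# Hypothetical sketch: retry the DB lookup a few times with increasing
# backoff before giving up. $fetch is any coderef that returns the job
# row (e.g. a wrapper around the worker's DB lookup) or undef.
sub fetch_job_with_retry {
    my ( $fetch, $job_id, $max_tries ) = @_;
    $max_tries //= 5;
    for my $try ( 1 .. $max_tries ) {
        my $job = $fetch->($job_id);
        return $job if defined $job;      # the INSERT is visible now
        usleep( 100_000 * $try );         # 100 ms, 200 ms, ... backoff
    }
    return undef;                         # caller must skip (and ack) the message
}
```

The point of the backoff is that if the row really is arriving a beat late (e.g. not yet committed), a few short retries find it without stalling the worker a full second on every job.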
Philippe Blouin,
Directeur de la technologie
Tél. : (833) 465-4276, poste 230
philippe.blouin at inLibro.com
inLibro | pour esprit libre | www.inLibro.com <http://www.inLibro.com>
On 2022-12-21 13:06, Fridolin SOMERS wrote:
> Hi,
>
> I think network performance is really important for RabbitMQ.
> We at BibLibre run a RabbitMQ virtual machine on each physical server,
> shared between that server's virtual machines (one Koha per machine),
> to keep network performance good.
> It seems to work well, but we are still on 21.11.
>
> Best regards,
>
> On 20/12/2022 at 09:13, Philippe Blouin wrote:
>> Howdy!
>>
>> Since moving a lot of our users to 22.05.06, we've installed the
>> worker everywhere. But the number of issues encountered is staggering.
>>
>> The first one was
>>
>> Can't call method "process" on an undefined value
>>
>> where the ID received from MQ is not found in the DB, and the worker
>> goes straight to process_job and fails. Absolutely no idea how that
>> occurs; it seems completely counterintuitive (the ID comes from the
>> DB after all), but here it is. Hacked the code to add a "sleep 1" to
>> fix most of that one.
>>
>> Then came the fact that jobs already stored in the DB are never
>> checked when the connection to MQ succeeds at startup; Bug 30654
>> refers to it. Hacked a little "$init" in there to clear that up at
>> startup.
>>
>> Then came the
>>
>> malformed UTF-8 character in JSON string, at character offset 296
>> (before "\x{e9}serv\x{e9} au ...")
>>
>> at decode_json that crashes the whole process. And for some reason,
>> it never gets over it: it hits the same problem at every restart, as
>> if the event is never "eaten" from the queue. Hacked an eval, then a
>> try-catch, over it...
>>
>> After coding a monitor to alert when a background_jobs row has been
>> "new" for over 5 minutes in the DB, I was inundated with messages.
>> There's always one elasticsearch_update that escapes among the
>> flurry, and they slowly add up.
>>
>> At this point, the only viable solution is to run the workers but
>> disable RabbitMQ everywhere. Are we really the only ones
>> experiencing that?
>>
>> Regards,
>>
>> PS Our servers are well-above-average Debian 11 machines with lots of
>> firepower (RAM, CPU, I/O...).
>>
>> --
>> Philippe Blouin,
>> Directeur de la technologie
>>
>> Tél. : (833) 465-4276, poste 230
>> philippe.blouin at inLibro.com
>>
>> inLibro | pour esprit libre | www.inLibro.com <http://www.inLibro.com>
>>
>> _______________________________________________
>> Koha-devel mailing list
>> Koha-devel at lists.koha-community.org
>> https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>> website : https://www.koha-community.org/
>> git : https://git.koha-community.org/
>> bugs : https://bugs.koha-community.org/
>
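On the decode_json crash described in the quoted message: one option is a defensive decode that logs and skips the malformed frame, while still acking it so MQ does not redeliver it on every restart. A minimal sketch only; safe_decode is a hypothetical helper, not Koha's actual code, and it assumes the raw STOMP frame body is passed in as a byte string.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper, not Koha code: decode a frame body defensively so
# a single malformed payload cannot crash the worker loop. The caller
# should still ack the frame when this returns undef, otherwise MQ keeps
# redelivering it at every restart.
sub safe_decode {
    my ($body) = @_;
    my $args = eval { decode_json($body) };   # dies on malformed UTF-8/JSON
    if ($@) {
        warn "Skipping malformed frame: $@";
        return undef;
    }
    return $args;
}
```

Wrapping only the decode (rather than the whole loop iteration) keeps real processing errors visible while making the "poison message" survivable.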