[Koha-devel] The many failings of background_jobs_worker.pl

Philippe Blouin philippe.blouin at inlibro.com
Wed Dec 21 21:48:47 CET 2022


Hello Fridolin,

Yes, performance probably matters for MQ; it seems to eat a 
disproportionate amount of resources sometimes.  :)

But like I said, our server is a beast.  In every aspect of its build, 
it has the best components for 2022.

Let's see about the race condition:

  * An update is done on a biblio
  * An update_elastic_index job is created in the db.
  * Its ID is pushed onto MQ
  * background_jobs_worker.pl picks up the ID from MQ
      o it goes to the DB, and finds nothing with that ID. We get an
        undefined-value error (a pointer error to me; yeah, I come
        from C)
      o This is NOT an old forgotten floating job, since we can see the
        job in the database when looking manually.
      o The job stays there forever, with status 'new'.
  * If I add a "sleep 1", this issue _mostly_ disappear.

There's no server performance issue that could explain this.  Maybe 
some DB caching, or the job's INSERT not yet committed when the 
message is delivered?  A sketch of a retry-based workaround follows 
below.
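
A minimal sketch of that retry, sitting where the worker handles a
frame; Koha::BackgroundJobs->find and $args->{job_id} mirror the real
worker, the rest is purely illustrative:

    # If the enqueuing transaction has not committed yet when the
    # broker delivers the ID, the first find() returns undef, so
    # poll a few times before giving up.
    my $job;
    for ( 1 .. 10 ) {
        $job = Koha::BackgroundJobs->find( $args->{job_id} );
        last if defined $job;
        sleep 1;
    }
    unless ( defined $job ) {
        warn "Job " . $args->{job_id} . " still missing after 10s; skipping";
        next;    # back to the receive loop instead of crashing
    }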


Philippe Blouin,
Director of Technology

Tel.: (833) 465-4276, ext. 230
philippe.blouin at inLibro.com

inLibro | pour esprit libre | www.inLibro.com
On 2022-12-21 13:06, Fridolin SOMERS wrote:
> Hi,
>
> I think network performance is really important for RabbitMQ.
> We at Biblibre run the broker in a virtual machine on each physical 
> server, shared between that server's virtual machines (one Koha per 
> VM), so the RabbitMQ traffic stays on a fast local network.
> It seems to work well, but we are still on 21.11.
>
> Best regards,
>
> On 2022-12-20 at 09:13, Philippe Blouin wrote:
>> Howdy!
>>
>> Since moving a lot of our users to 22.05.06, we've installed the 
>> worker everywhere.  But the number of issues encountered is staggering.
>>
>> The first one was
>>
>> Can't call method "process" on an undefined value
>>
>> where the ID received from MQ is not found in the DB, yet the worker 
>> goes straight to process_job and fails. Absolutely no idea how that 
>> occurs; it seems completely counterintuitive (the ID comes from the 
>> DB in the first place), but here it is.  Hacked the code to add a 
>> "sleep 1" to fix most of that one.
>>
>> Then came the fact that events already stored in the database are 
>> not checked when the connection to MQ succeeds at startup: the 
>> worker only processes what the broker delivers from then on. Bug 
>> 30654 covers it.  Hacked a little "$init" pass in there to clear 
>> the backlog at startup.
>>
>> Then came the
>>
>> malformed UTF-8 character in JSON string, at character offset 296 
>> (before "\x{e9}serv\x{e9} au ...")
>>
>> at decode_json, which crashes the whole process.  And for some 
>> reason it never gets over it: the same failure recurs at every 
>> restart, as if the event is never "eaten" from the queue.  Hacked 
>> an eval, then a try-catch, over it...
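>>
>> The wrap, roughly (Try::Tiny here; the important part is acking the
>> frame even when the body won't parse, so the poison message is
>> finally consumed instead of redelivered at every restart):
>>
>>     use Try::Tiny;
>>     use JSON qw( decode_json );
>>
>>     my $args = try {
>>         decode_json( $frame->body );    # dies on malformed UTF-8
>>     } catch {
>>         warn "Cannot decode frame body: $_";
>>         undef;
>>     };
>>     unless ($args) {
>>         $conn->ack( { frame => $frame } );    # eat the bad event
>>         next;
>>     }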
>>
>> After coding a monitor to alert when a background_jobs row has been 
>> "new" for over 5 minutes, I was inundated with messages. There's 
>> always one elasticsearch_update that escapes among the flurry, and 
>> they slowly add up.
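>>
>> The monitor itself is nothing fancy; roughly this (the 5-minute
>> threshold is ours, connection details are placeholders):
>>
>>     use DBI;
>>     my $dbh = DBI->connect( 'DBI:mysql:database=koha', 'koha_user',
>>         'koha_pass', { RaiseError => 1 } );
>>     my ($stuck) = $dbh->selectrow_array(q{
>>         SELECT COUNT(*)
>>         FROM   background_jobs
>>         WHERE  status = 'new'
>>           AND  enqueued_on < NOW() - INTERVAL 5 MINUTE
>>     });
>>     warn "$stuck job(s) stuck in 'new' for over 5 minutes" if $stuck;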
>>
>> At this point, the only viable solution is to run the workers but 
>> disable RabbitMQ everywhere, letting them fall back to polling the 
>> database directly.  Are we really the only ones experiencing this?
>>
>> Regards,
>>
>> PS Our servers are well-above-average Debian 11 machines with lots 
>> of firepower (RAM, CPU, I/O...).
>>
>> -- 
>> Philippe Blouin,
>> Director of Technology
>>
>> Tel.: (833) 465-4276, ext. 230
>> philippe.blouin at inLibro.com
>>
>> inLibro | pour esprit libre | www.inLibro.com
>>
>> _______________________________________________
>> Koha-devel mailing list
>> Koha-devel at lists.koha-community.org
>> https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>> website : https://www.koha-community.org/
>> git : https://git.koha-community.org/
>> bugs : https://bugs.koha-community.org/
>