[Koha-devel] Background jobs quirks

Jonathan Druart jonathan.druart at bugs.koha-community.org
Thu Oct 19 08:41:20 CEST 2023


Hum weird, isn't DBIC supposed to deal with reconnection?
You could add the find in a try, then find again in the catch... (for debug
purpose!)
Or force a new DBMS connection for each job (C4::Context->dbh({new => 1})),
but that's not a long term solution.

Le jeu. 19 oct. 2023 à 08:04, David Cook <dcook at prosentient.com.au> a
écrit :

> Just one last one….
>
>
>
> I put a Carp on to the SIGPIPE and sure enough it is the
> Koha::BackgroundJobs->find() causing the SIGPIPE to be raised.
>
>
>
> Of course, ignoring the SIGPIPE isn’t going to help because the job won’t
> be fetched.
>
>
>
> I’m not sure what the answer here is… and unfortunately I’ve ran out of
> time today. With my work task, I’ll probably abandon using the background
> jobs for the moment as I need to do something more reliable. I’ll think
> about this one later…
>
>
>
> at /kohadevbox/koha/misc/background_jobs_worker.pl line 62.
>
>     main::__ANON__("PIPE") called at
> /usr/share/perl5/DBIx/Class/Storage/DBI.pm line 932
>
>     eval {...} called at /usr/share/perl5/DBIx/Class/Storage/DBI.pm line
> 932
>
>
> DBIx::Class::Storage::DBI::connected(DBIx::Class::Storage::DBI::mysql=HASH(0x55b81d683d78))
> called at /usr/share/perl5/DBIx/Class/Storage/DBI.pm line 850
>
>
> DBIx::Class::Storage::DBI::__ANON__(DBIx::Class::Storage::BlockRunner=HASH(0x55b81f9d8d28))
> called at /usr/share/perl5/DBIx/Class/Storage/BlockRunner.pm line 190
>
>     DBIx::Class::Storage::BlockRunner::__ANON__() called at
> /usr/share/perl5/Context/Preserve.pm line 38
>
>     Context::Preserve::preserve_context(CODE(0x55b81fa01620), "replace",
> CODE(0x55b81fa016c8)) called at
> /usr/share/perl5/DBIx/Class/Storage/BlockRunner.pm line 213
>
>
> DBIx::Class::Storage::BlockRunner::_run(DBIx::Class::Storage::BlockRunner=HASH(0x55b81f9d8d28),
> CODE(0x55b81f9d8ce0)) called at
> /usr/share/perl5/DBIx/Class/Storage/BlockRunner.pm line 105
>
>
> DBIx::Class::Storage::BlockRunner::run(DBIx::Class::Storage::BlockRunner=HASH(0x55b81f9d8d28),
> CODE(0x55b81f9d8ce0)) called at /usr/share/perl5/DBIx/Class/Storage/DBI.pm
> line 856
>
>     DBIx::Class::Storage::DBI::dbh_do(undef, undef, "SELECT `me`.`id`,
> `me`.`status`, `me`.`progress`, `me`.`size`"..., ARRAY(0x55b81f453d30),
> ARRAY(0x55b81f81d380)) called at /usr/share/perl5/DBIx/Class/Storage/DBI.pm
> line 1939
>
>
> DBIx::Class::Storage::DBI::_execute(DBIx::Class::Storage::DBI::mysql=HASH(0x55b81d683d78),
> "select", ARRAY(0x55b81f9d3278), ARRAY(0x55b81d7db5a8),
> HASH(0x55b81f16bd08), HASH(0x55b81f6cc0b8)) called at
> /usr/share/perl5/DBIx/Class/Stor
>
> age/DBI.pm line 2584
>
>
> DBIx::Class::Storage::DBI::_select(DBIx::Class::Storage::DBI::mysql=HASH(0x55b81d683d78),
> ARRAY(0x55b81f9d3278), ARRAY(0x55b81d7db5a8), HASH(0x55b81f16bd08),
> HASH(0x55b81f4ce6c0)) called at /usr/share/perl5/DBIx/Class/Storage/DBI.pm
>
> line 2761
>
>
> DBIx::Class::Storage::DBI::select_single(DBIx::Class::Storage::DBI::mysql=HASH(0x55b81d683d78),
> ARRAY(0x55b81f9d3278), ARRAY(0x55b81d7db5a8), HASH(0x55b81f16bd08),
> HASH(0x55b81f4ce6c0)) called at /usr/share/perl5/DBIx/Class/ResultSet
>
> .pm line 1101
>
>
> DBIx::Class::ResultSet::single(DBIx::Class::ResultSet=HASH(0x55b81d89b180))
> called at /usr/share/perl5/DBIx/Class/ResultSet.pm line 910
>
>
> DBIx::Class::ResultSet::find(DBIx::Class::ResultSet=HASH(0x55b81f9d6b68),
> 12) called at /kohadevbox/koha/Koha/Objects.pm line 96
>
>     Koha::Objects::find("Koha::BackgroundJobs", 12) called at
> /kohadevbox/koha/misc/background_jobs_worker.pl line 132
>
>
>
> David Cook
>
> Senior Software Engineer
>
> Prosentient Systems
>
> Suite 7.03
>
> 6a Glen St
>
> Milsons Point NSW 2061
>
> Australia
>
>
>
> Office: 02 9212 0899
>
> Online: 02 8005 0595
>
>
>
> *From:* David Cook <dcook at prosentient.com.au>
> *Sent:* Thursday, October 19, 2023 4:26 PM
> *To:* 'Koha Devel' <koha-devel at lists.koha-community.org>
> *Cc:* 'Jonathan Druart' <jonathan.druart at bugs.koha-community.org>; 'Tomas
> Cohen Arazi' <tomascohen at gmail.com>
> *Subject:* RE: [Koha-devel] Background jobs quirks
>
>
>
> Success!
>
>
>
> I created the persistent database connection between the background job
> worker and the database, and then I restarted the database.
>
>
>
> When I tried to enqueue a background job, it’s now stuck in a “New” status
> with a Progress of “null/1”, and /var/log/koha/kohadev/worker-error.log
> says the following:
>
> 20231019 05:02:14 kohadev-koha-worker-long_tasks: client (pid 23247)
> killed by signal 13, respawning
>
>
>
> Now Starman seems to have managed to re-establish a connection without any
> errors. It’s not clear why that might be. I suppose Starman/Plack might be
> suppressing SIGPIPE errors. We might have some subtle issues under the hood
> without even realizing it.
>
>
>
> (That suddenly reminds me of that MYSQL_OPT_RECONNECT issue from August.
> MySQL 8.0.34/8.1.0 deprecate automatic reconnection. I would hope that
> DBD::MySQL would take care of that, but as we might recall that module
> isn’t really maintained…)
>
>
>
> I am testing on 22.11 so maybe it’s a bit different in master… but it’s
> interesting.
>
>
>
> David Cook
>
> Senior Software Engineer
>
> Prosentient Systems
>
> Suite 7.03
>
> 6a Glen St
>
> Milsons Point NSW 2061
>
> Australia
>
>
>
> Office: 02 9212 0899
>
> Online: 02 8005 0595
>
>
>
> *From:* David Cook <dcook at prosentient.com.au>
> *Sent:* Thursday, October 19, 2023 3:55 PM
> *To:* 'David Cook' <dcook at prosentient.com.au>; 'Koha Devel' <
> koha-devel at lists.koha-community.org>
> *Subject:* RE: [Koha-devel] Background jobs quirks
>
>
>
> On a standard Debian package install of Koha, it looks like there are 3
> persistent connections to the database: 2x Starman workers and 1x Zebra
> Indexer.
>
>
>
> Then if you use the background jobs, the background job worker for that
> queue will create a persistent connection as well.
>
>
>
> I had to work out the MySQL/MariaDB ID through trial and error since there
> wasn’t enough information in the processlist in the database, but I got the
> ID, so I’ll check the logs in the morning to see if that’s the problem…
>
>
>
> --
>
>
>
> I’m losing some confidence in this theory as I’m looking at the
> koha-testing-docker database instance and I’m seeing similar timeouts, but
> it hasn’t triggered the “killed by signal 13, respawning”.
>
>
>
> 2023-10-19  3:06:09 44 [Warning] Aborted connection 44 to db:
> 'koha_kohadev' user: 'koha_kohadev' host: '172.21.0.4' (Got an error
> reading communication packets)
>
> 2023-10-19  3:06:09 32 [Warning] Aborted connection 32 to db:
> 'koha_kohadev' user: 'koha_kohadev' host: '172.21.0.4' (Got an error
> reading communication packets)
>
>
>
> --
>
>
>
> I’m thinking the best thing to do is just ignore SIGPIPE and let our
> regular error handling take care of it, as we should then get a more
> nuanced error message somewhere…
>
>
>
> Looking at https://metacpan.org/dist/Net-Stomp/source/lib/Net/Stomp.pm
> and it ignores SIGPIPE when it’s sending frames… so it’s probably not the
> $conn->ack() doing it.
>
>
>
> I do wonder if it’s “my $job =
> Koha::BackgroundJobs->find($args->{job_id});”
>
>
>
> I suppose we’ll see in the morning…
>
>
>
> David Cook
>
> Senior Software Engineer
>
> Prosentient Systems
>
> Suite 7.03
>
> 6a Glen St
>
> Milsons Point NSW 2061
>
> Australia
>
>
>
> Office: 02 9212 0899
>
> Online: 02 8005 0595
>
>
>
> *From:* Koha-devel <koha-devel-bounces at lists.koha-community.org> *On
> Behalf Of *David Cook via Koha-devel
> *Sent:* Thursday, October 19, 2023 2:41 PM
> *To:* 'Koha Devel' <koha-devel at lists.koha-community.org>
> *Subject:* [Koha-devel] Background jobs quirks
>
>
>
> Hi all,
>
>
>
> Have you seen something like the following in your worker-error.log?
>
>
>
> 20231018 07:44:06 instance-koha-worker-long_tasks: client (pid 3949888)
> killed by signal 13, respawning
>
> 20231019 07:21:26 instance-koha-worker-long_tasks: client (pid 4082852)
> killed by signal 13, respawning
>
>
>
> I found some discussion in a thread
> https://www.mail-archive.com/koha@lists.katipo.co.nz/msg30046.html
>
>
>
> Signal 13 would be a SIGPIPE I believe. At first, I assumed it was the
> connection between the background worker and RabbitMQ, but based off the
> RabbitMQ logs[1] that seemed unlikely.
>
>
>
> MariaDB is another possibility. Looking at its logs [2] and I see some
> timed out connections.
>
>
>
> I might monitor the connections between the background workers and the
> database tonight and see if it’s related. It might also explain why it
> happens on my MariaDB system but not my MySQL system…
>
>
>
> Anyway, just putting out some feelers…
>
>
>
> [1]
>
> 2023-10-18 07:44:06.492400+11:00 [info] <0.30487.131> accepting STOMP
> connection <0.30487.131> (127.0.0.1:59120 -> 127.0.0.1:61613)
>
> 2023-10-18 07:44:06.500534+11:00 [info] <0.30487.131> closing STOMP
> connection <0.30487.131> (127.0.0.1:59120 -> 127.0.0.1:61613)
>
> 2023-10-18 07:44:06.510426+11:00 [info] <0.20712.130> closing STOMP
> connection <0.20712.130> (127.0.0.1:57508 -> 127.0.0.1:61613)
>
> 2023-10-18 07:44:08.703380+11:00 [info] <0.30518.131> accepting STOMP
> connection <0.30518.131> (127.0.0.1:59882 -> 127.0.0.1:61613)
>
>
>
> 2023-10-19 07:21:26.254065+11:00 [info] <0.27344.133> accepting STOMP
> connection <0.27344.133> (127.0.0.1:54116 -> 127.0.0.1:61613)
>
> 2023-10-19 07:21:26.259129+11:00 [info] <0.27344.133> closing STOMP
> connection <0.27344.133> (127.0.0.1:54116 -> 127.0.0.1:61613)
>
> 2023-10-19 07:21:26.269652+11:00 [info] <0.30518.131> closing STOMP
> connection <0.30518.131> (127.0.0.1:59882 -> 127.0.0.1:61613)
>
> 2023-10-19 07:21:26.660604+11:00 [info] <0.27372.133> accepting STOMP
> connection <0.27372.133> (127.0.0.1:46178 -> 127.0.0.1:61613)
>
>
>
> [2]
>
> Oct 18 02:08:28 awesome-host mariadbd[959]: 2023-10-18  2:08:28 6308
> [Warning] Aborted connection 6308 to db: 'koha_instance' user:
> 'koha_instance' host: 'localhost' (Got timeout reading communication
> packets)
>
> Oct 18 02:15:58 awesome-host mariadbd[959]: 2023-10-18  2:15:58 6304
> [Warning] Aborted connection 6304 to db: 'koha_instance' user:
> 'koha_instance' host: 'localhost' (Got timeout reading communication
> packets)
>
> Oct 18 02:16:06 awesome-host mariadbd[959]: 2023-10-18  2:16:06 6303
> [Warning] Aborted connection 6303 to db: 'koha_instance' user:
> 'koha_instance' host: 'localhost' (Got timeout reading communication
> packets)
>
> Oct 18 21:10:08 awesome-host mariadbd[959]: 2023-10-18 21:10:08 6380
> [Warning] Aborted connection 6380 to db: 'koha_instance' user:
> 'koha_instance' host: 'localhost' (Got timeout reading communication
> packets)
>
> Oct 19 02:21:17 awesome-host mariadbd[959]: 2023-10-19  2:21:17 6431
> [Warning] Aborted connection 6431 to db: 'koha_instance' user:
> 'koha_instance' host: 'localhost' (Got timeout reading communication
> packets)
>
> Oct 19 02:24:02 awesome-host mariadbd[959]: 2023-10-19  2:24:02 6432
> [Warning] Aborted connection 6432 to db: 'koha_instance' user:
> 'koha_instance' host: 'localhost' (Got timeout reading communication
> packets)
>
>
>
> David Cook
>
> Senior Software Engineer
>
> Prosentient Systems
>
> Suite 7.03
>
> 6a Glen St
>
> Milsons Point NSW 2061
>
> Australia
>
>
>
> Office: 02 9212 0899
>
> Online: 02 8005 0595
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.koha-community.org/pipermail/koha-devel/attachments/20231019/7f3fe863/attachment-0001.htm>


More information about the Koha-devel mailing list