[Koha-devel] Task schedulers and message queues for Koha

David Cook dcook at prosentient.com.au
Thu May 4 01:42:16 CEST 2017


Hi Jonathan,

 

I have scrapped the work that I was doing on a generic task scheduler. Instead, I’ve developed a daemon dedicated to OAI-PMH harvesting. Feel free to go ahead with whatever you’re planning, and I’m happy to contribute ideas.

 

I was concerned that I was being overly ambitious, and that the project would never be accepted into Koha. A third-party message queue like RabbitMQ would add another dependency to Koha, which would further complicate installation and maintenance, although I think it might still be the best way forward. Another option is to build our own queue on top of an established library like ZeroMQ. I had also written my own message queue in Perl, which was fairly easy to do, so that’s always an alternative. For the task scheduler, I used POE as the event framework and used timers to schedule tasks. Initially I didn’t use a message queue, but I think using one would be a better design. You fire up some workers, register them with the message queue, and then they consume messages as the message queue assigns them. The task scheduler would then only be responsible for putting messages into the queue for the workers to consume.
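
As a rough illustration of that worker model, here is a minimal sketch in Python using only the standard library. It is not the POE or Perl code described above; the names run_scheduler and worker, and the use of threads rather than separate processes, are assumptions made for the example.

```python
import queue
import threading


def worker(name, jobs, results):
    """Consume messages from the shared queue until a None sentinel arrives."""
    while True:
        message = jobs.get()
        try:
            if message is None:
                return  # sentinel: no more work for this worker
            # Stand-in for real work (e.g. running a harvest task).
            results.append((name, message))
        finally:
            jobs.task_done()


def run_scheduler(messages, worker_count=3):
    """Start workers, enqueue the messages, and wait for them to be consumed."""
    jobs = queue.Queue()
    results = []  # list.append is atomic in CPython, so no extra lock is used
    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}", jobs, results))
        for i in range(worker_count)
    ]
    for thread in workers:
        thread.start()
    # The scheduler's only job is to put messages on the queue.
    for message in messages:
        jobs.put(message)
    # One sentinel per worker so that each of them shuts down cleanly.
    for _ in workers:
        jobs.put(None)
    jobs.join()
    for thread in workers:
        thread.join()
    return results
```

Calling run_scheduler(["harvest A", "harvest B"]) returns a list of (worker name, message) pairs. Which worker picks up which message is up to the queue, which is the point of the design: the scheduler never talks to a worker directly.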

 

With the OAI-PMH daemon, which I’d like to post ASAP to bug 10662, I’m still using POE for the event framework, but I’m using POE::Component::JobQueue to handle the queues. I have one queue for downloading and one for importing. Each queue has X workers which run in parallel. At the moment I’m forking the workers, since that was the easiest thing to do, but it is a little heavy. The overhead of forking itself is negligible, but each forked worker gets a copy of the harvester, so the resource usage adds up. For now, my design is for a single Koha system, or one with a lot of resources. The Koha web interface connects to the OAI-PMH harvester daemon over a UNIX socket. In koha-conf.xml, I have a line pointing to a configuration file, which contains the socket address. The interface uses a very simple protocol, serialised as JSON in null-terminated lines, to submit/list/start/stop/delete jobs in the harvester. The harvester downloads records to the file system and adds a pointer to them in the database; the importer job queue then assigns a database entry to each of its workers, which import the records into Koha.
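
To make the framing concrete, here is a minimal sketch in Python of null-terminated JSON messages. It is not the daemon's actual code: the field names "command" and "job_id" and the helper names encode_message and MessageBuffer are hypothetical, and the real message layout may differ.

```python
import json

TERMINATOR = b"\x00"  # each JSON message ends with a single null byte


def encode_message(command, **params):
    """Serialise one request, e.g. a start or list command, as a framed message."""
    payload = {"command": command, **params}
    # json.dumps escapes control characters, so the body never contains a raw null.
    return json.dumps(payload).encode("utf-8") + TERMINATOR


class MessageBuffer:
    """Reassemble complete messages from arbitrary chunks read off a socket."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Add received bytes and return every message completed so far."""
        self._pending += chunk
        # Everything before the last terminator is complete; the rest is kept.
        *frames, self._pending = self._pending.split(TERMINATOR)
        return [json.loads(frame.decode("utf-8")) for frame in frames if frame]
```

A client would send encode_message("start", job_id=7) over the socket, and the receiving side would pass whatever recv() returns to MessageBuffer.feed(). Because a stream socket can split or merge writes, the buffer is what guarantees that only whole messages reach the JSON parser.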

 

I was thinking that even with a task scheduler and message queue, I’d probably still implement the OAI-PMH harvester as I have. Maybe I could replace the UNIX socket connection with the message queue, so the harvester consumes messages from the queue rather than the client, but it’s a bit academic. The harvester needs to have direct control over its workers rather than the queue sending messages to the workers, so that it can control the jobs directly. I’m not a huge fan of how the Python-based Celery scheduler manages cancelled jobs, although I found Celery to be a neat piece of work. 

 

Anyway, long story short, no real news. I’ve abandoned making a generic task scheduler and message queue, and just made a purpose-built daemon which implements its own internal queue management for the sake of simplicity and efficacy.

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St

Ultimo, NSW 2007

Australia

 

Office: 02 9212 0899

Direct: 02 8005 0595

 

From: koha-devel-bounces at lists.koha-community.org On Behalf Of Jonathan Druart
Sent: Wednesday, 26 April 2017 2:21 AM
To: koha-devel at lists.koha-community.org
Subject: Re: [Koha-devel] Task schedulers and message queues for Koha

 

On Thu, 23 Feb 2017 at 00:51 David Cook <dcook at prosentient.com.au> wrote:

I’m planning to post the code for what I have already in early March.

 

Any news here? 

We really need to replace the way our background jobs are implemented so that they work under Plack.

I'd like to avoid duplication of work...

 

From: Tomas Cohen Arazi <tomascohen at gmail.com>
Sent: Thursday, 23 February 2017 2:16 PM
To: David Cook <dcook at prosentient.com.au>; Tajoli Zeno <z.tajoli at cineca.it>; koha-devel at lists.koha-community.org
Subject: Re: [Koha-devel] Task schedulers and message queues for Koha

 

Share it :-)

 

On Wed, 22 Feb 2017 at 9:57 PM, David Cook <dcook at prosentient.com.au> wrote:

Hi Zeno,

I have a number of concerns about Celery. One of those is that it would add
numerous external dependencies and complexity to Koha implementations.

Your suggestion of Celery + RabbitMQ + AnyEvent::RabbitMQ sounds ok,
although it would involve work too. While Celery clients exist for PHP and
Node.js, we'd need to create a Perl implementation of the Celery protocol
using AnyEvent::RabbitMQ (or Net::RabbitFoot). Not that I'm necessarily
opposed to that.

We'd also still need to write the tasks in Python (or use web hooks which
would have the overhead of HTTP plus you'd have to worry about your web
server being up). I'm not sure how keen the community at large is to support
more server-side languages. I like writing Python, so I don't mind porting
over my OAI-PMH code from Perl to Python. I've abandoned the HTTP::OAI
module anyway for a few reasons.

RabbitMQ is a pretty heavy duty product as well which comes with its own
requirements: https://www.rabbitmq.com/production-checklist.html. While we
currently help people with Apache, MySQL, Zebra, and ElasticSearch, we'd
also all need to become experts with RabbitMQ.

I've already put together a Perl-based scheduler using POE which forks its
own workers. And I've already put together a basic Perl-based message queue
which sends events to pre-existing workers (like Celery). Celery with
RabbitMQ is more mature and complex, but my Perl programs do the trick.

Looking at DSpace's OAI-PMH harvester, it works very much like my first
design. It's a Java scheduler which uses threads rather than child processes
to do its work.

Due to the lack of engagement overall, I think I'll probably just keep my
existing design, since it works and works quite well.



> -----Original Message-----
> From: Tajoli Zeno <z.tajoli at cineca.it>
> Sent: Wednesday, 22 February 2017 7:49 PM
> To: David Cook <dcook at prosentient.com.au>; koha-devel at lists.koha-community.org
> Subject: Re: [Koha-devel] Task schedulers and message queues for Koha
>
> Hi David and all,
>
> On 21/02/2017 23:29, David Cook wrote:
> >. Two, they wanted to
> > execute OAI-PMH requests every 2-3 seconds and cron has 1 minute as
> > its finest granularity. Three, even if you set up a cronjob to run
> > every minute, long-running tasks could get duplicated (although you
> > could mitigate that with locks, which would be a pain). Plus, you want
> > to run tasks in parallel, so you're going to want to use multiple
> > processes, which cron isn't really set up to achieve.
>
> Ok, if you need those features cron isn't enough.
> But why did you drop the option of Celery + RabbitMQ + AnyEvent::RabbitMQ?
>
> They have official Debian packages:
> https://packages.debian.org/jessie/python-celery
> https://packages.debian.org/jessie/rabbitmq-server
> https://packages.debian.org/jessie/libanyevent-rabbitmq-perl
>
> We already use one of their dependencies for similar tasks
> (libanyevent-perl, "event loop framework with multiple implementations").
>
> Python is already present on our Debian/Ubuntu systems; it is a
> prerequisite of the distributions.
>
> Redoing such a complex stack in Perl would, I think, be very difficult.
>
> Bye
> Zeno Tajoli
>
>
>
> --
> Zeno Tajoli
> /SVILUPPO PRODOTTI CINECA/ - Automazione Biblioteche
> Email: z.tajoli at cineca.it  Fax: 051/6132198
> *CINECA* Consorzio Interuniversitario - Sede operativa di Segrate (MI)


_______________________________________________
Koha-devel mailing list
Koha-devel at lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

-- 

Tomás Cohen Arazi

Theke Solutions (https://theke.io)
✆ +54 9351 3513384
GPG: B2F3C15F
