[Koha-bugs] [Bug 1993] Task Scheduler Needs Re-write

Tue May 5 04:12:31 CEST 2020

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=1993

--- Comment #52 from David Cook <dcook at prosentient.com.au> ---
I've been thinking more about this recently.

How to be all things to all people?

For instance, the original Koha task scheduler for Bug 1993 was meant to
schedule a report to run once at an absolute time and email the result to
someone. 

Bug 25245 wants plugins to be able to run every night. That is, a periodic task
that runs at the same time every day/night (while not being too fussed about
exact start/end times).

Bug 10662 wants to schedule tasks to run every X seconds. That is, a periodic
task that runs X seconds after it last ran. 

Those are arguably 3 different types of schedules. 
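
To make the distinction concrete, here is a rough sketch of the three schedule
types as data. This is not Koha code; the class names and the next_run()
method are made up for illustration, and "X seconds after it last ran" is
assumed to mean X seconds after the previous run finished.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class OneOff:
    """Run once at an absolute time (the Bug 1993 report case)."""
    run_at: datetime

    def next_run(self, now: datetime, last_finished: Optional[datetime]):
        # Once the task has run there is nothing left to schedule.
        return None if last_finished is not None else self.run_at


@dataclass
class Daily:
    """Run at the same wall-clock time every day (the Bug 25245 case)."""
    at: time

    def next_run(self, now: datetime, last_finished: Optional[datetime]):
        candidate = datetime.combine(now.date(), self.at)
        # If today's slot has already passed, use tomorrow's.
        return candidate if candidate > now else candidate + timedelta(days=1)


@dataclass
class AfterLastRun:
    """Run X seconds after the previous run finished (the Bug 10662 case)."""
    seconds: float

    def next_run(self, now: datetime, last_finished: Optional[datetime]):
        if last_finished is None:
            return now  # never ran before: due immediately (an assumption)
        return last_finished + timedelta(seconds=self.seconds)
```

Only the third type needs to know when the previous run ended, which is why it
is the awkward one for a scheduler that does not track execution.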

The Celery project would handle that using ETA and Periodic Tasks:
https://docs.celeryproject.org/en/stable/userguide/calling.html#eta-and-countdown
https://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html

Looking at cron, we can see it just loops infinitely, sleeping for 1 minute at
a time and then checking which jobs need to run:
https://github.com/cronie-crond/cronie/blob/master/src/cron.c#L358
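
The polling approach can be sketched in a few lines. This is only an
illustration of the idea, not a port of cronie; the function names, the shape
of the jobs dict, and the max_ticks parameter are all assumptions.

```python
import time
from datetime import datetime


def due_jobs(jobs, now):
    """Return the names of jobs whose next run time has arrived."""
    return [name for name, next_run in jobs.items() if next_run <= now]


def poll_forever(jobs, run, clock=datetime.now, sleep=time.sleep, max_ticks=None):
    """Wake up once a minute and run whatever is due.

    jobs maps a job name to its next run time. max_ticks exists only so the
    loop can be stopped in a demo; a real daemon would loop indefinitely.
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        now = clock()
        for name in due_jobs(jobs, now):
            run(name)
            # One-shot for brevity; a periodic job would be given a new
            # next run time here instead of being removed.
            del jobs[name]
        sleep(60)
        ticks += 1
```

The trade-off is that nothing can start with finer than one-minute precision,
which is fine for nightly jobs but not for "every X seconds".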

In the past, I've used timers using Perl's POE or Golang's Timer to alert me
when a job needs to start. That's fine so long as your process runs forever.
You do need to persist the job data, so that if you restart you can re-create
your timers. 
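
As a minimal sketch of the persist-and-restore idea, using Python's
threading.Timer as a stand-in for POE or Go timers: the JSON file layout
(job name mapped to a due time in epoch seconds) and the function names are
assumptions for illustration only.

```python
import json
import threading
import time


def save_jobs(path, jobs):
    """Persist {job_name: epoch_seconds_due} so it survives a restart."""
    with open(path, "w") as fh:
        json.dump(jobs, fh)


def restore_timers(path, fire, now=time.time):
    """Re-create one timer per persisted job after a restart."""
    with open(path) as fh:
        jobs = json.load(fh)
    timers = {}
    for name, due in jobs.items():
        # Jobs that came due while the process was down fire straight away.
        delay = max(0.0, due - now())
        timer = threading.Timer(delay, fire, args=(name,))
        timer.daemon = True  # do not keep the process alive just for timers
        timer.start()
        timers[name] = timer
    return timers
```

Whether an overdue job should fire immediately on restart or be skipped is a
policy decision; the sketch simply picks the first option.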

In terms of the actual implementation... lately I'm thinking of a daemon with
an HTTP API that lets you create, retrieve, update, and delete tasks, and start
or stop their timers. When a timer does fire, the daemon should add the task to
a queue. (This is also what cron does:
https://github.com/cronie-crond/cronie/blob/master/src/cron.c#L589).

Now that queue could be a RabbitMQ message queue. It could be an in-memory
queue in the task scheduler daemon, which could then run tasks itself by
forking a child process, using a thread (if not Perl), or using a connected
worker. It could be a database table.

On that note, it becomes apparent to me that we don't need to depend on Bug
22417 to move forward here. The task scheduler can be a black box with an API
for the front end, and then pluggable backends for different queue systems. 
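
A rough sketch of what "pluggable backends" could look like, with an in-memory
queue and a database table behind the same interface. Everything here
(QueueBackend, the task_queue table, on_timer_fired) is hypothetical, and
SQLite just stands in for whatever database would actually be used.

```python
import json
import queue
import sqlite3
from abc import ABC, abstractmethod


class QueueBackend(ABC):
    """What the scheduler needs from any queue system."""

    @abstractmethod
    def enqueue(self, task: dict) -> None:
        """Add a task to the end of the queue."""

    @abstractmethod
    def dequeue(self):
        """Return the oldest task, or None if the queue is empty."""


class InMemoryBackend(QueueBackend):
    def __init__(self):
        self._q = queue.Queue()

    def enqueue(self, task):
        self._q.put(task)

    def dequeue(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


class SqliteBackend(QueueBackend):
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS task_queue "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )

    def enqueue(self, task):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO task_queue (payload) VALUES (?)", (json.dumps(task),)
            )

    def dequeue(self):
        with self._db:
            row = self._db.execute(
                "SELECT id, payload FROM task_queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self._db.execute("DELETE FROM task_queue WHERE id = ?", (row[0],))
            return json.loads(row[1])


def on_timer_fired(backend: QueueBackend, task: dict) -> None:
    """The scheduler only ever talks to the interface, not a concrete queue."""
    backend.enqueue(task)
```

A RabbitMQ backend would be a third subclass; the scheduler side would not
need to change.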

Although, as I say that, I realise I haven't thought it through enough.

With a FIFO queue, we run into cron's limitation of not tracking execution of a
task. As is, the Bug 10662 scenario is impossible, because we don't have the
result of the task to update the task with new data (i.e. a new "from" date)
and a new time to schedule.

With RabbitMQ, we can use message acknowledgements to determine when the task
is completed. Celery also has a result store. 

But Bug 10662 would still be unique in that it's not really a periodic task but
rather an absolute task that gets rescheduled. It looks like Celery does
actually have callbacks that can be run on the result at the end of a task.
That's interesting. That would be a good thing to develop. Arguably the Report
Email functionality could make good use of that. The task would be to generate
the report and store the result. The callback would then email it. In the case
of Bug 10662, the callback could perhaps update the original task or enqueue a
new one. I'd have to think more about that mechanism...
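
One possible shape for that mechanism, as a plain-Python sketch rather than
Celery code: a task is scheduled together with an optional callback, and the
callback gets the result and may enqueue the next run. The Scheduler class,
harvest(), reschedule(), and the 30-second interval are all invented for
illustration.

```python
import heapq
from datetime import datetime, timedelta


class Scheduler:
    def __init__(self):
        self._heap = []    # entries ordered by run time
        self._counter = 0  # tie-breaker so functions are never compared

    def schedule(self, run_at, func, params, on_result=None):
        heapq.heappush(self._heap, (run_at, self._counter, func, params, on_result))
        self._counter += 1

    def run_due(self, now):
        """Run every task due at or before now, then hand off its result."""
        while self._heap and self._heap[0][0] <= now:
            _, _, func, params, on_result = heapq.heappop(self._heap)
            result = func(**params)
            if on_result is not None:
                on_result(self, now, func, params, result)


def harvest(from_date):
    """Stand-in task: pretend to fetch records changed since from_date."""
    return {"from_date": from_date, "until": from_date + timedelta(seconds=30)}


def reschedule(scheduler, finished_at, func, params, result,
               interval=timedelta(seconds=30)):
    """Callback: enqueue a new absolute task with an updated "from" date."""
    new_params = dict(params, from_date=result["until"])
    scheduler.schedule(finished_at + interval, func, new_params, on_result=reschedule)
```

The report case would use the same hook with a callback that emails the stored
result instead of rescheduling.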
