[Koha-bugs] [Bug 15032] [Plack] Scripts that fork (like stage-marc-import.pl) don't work as expected

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Thu Oct 25 01:46:39 CEST 2018


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15032

--- Comment #31 from David Cook <dcook at prosentient.com.au> ---
Actually, when working on #10662, I thought about creating a standalone "import
daemon", as importing records into Koha is the hardest part of OAI-PMH
harvesting. (I already have an extremely fast download worker, but the
bottleneck is importing into Koha *sadface*.)

For #15032, "Stage MARC records for import" could simply be a script that
allows a user to upload MARC records from their browser to the Koha
server, along with some configuration options to accompany those records.
We could store all of that on the file system, in the database, or wherever
makes the most sense. If we wanted to have fun, we could even keep them
in memory and let an import daemon pull them from there instead of from
disk, which would save a lot of I/O.

Once the records are uploaded to the Koha server and stored <wherever> with the
job details, the PSGI/CGI script alerts the "import daemon" that it would like
to start the job. 

Now the "import daemon" should probably have a master listener process that
does very little work itself, keeping it highly responsive to queries from the
Koha web client. That master process can talk to worker processes that do the
actual work. For imports, it would be good if the master process could query
the workers for progress updates (and probably also cancel imports, which
might be useful for very large batches).

(Oh, another thing... this import daemon wouldn't just be usable from a web
interface. It could also be used by command-line tools. That gives us one
point of entry for getting records into Koha, and the daemon can act as
gatekeeper.)

We could default the import daemon to a single worker but make the count
configurable, so that large Koha instances can commit more resources to it.

(If we were smart, we'd probably have the import daemon use TCP for
communications and allow workers to be distributed. Although to do that we'd
need to package Koha libraries for working with records and distribute those
too. Plus we'd need database configuration information distributed. I mean we
could have an API to handle all the actual imports, but that would be slow...)

Just some ideas ^_^.

-- 
You are receiving this mail because:
You are watching all bug changes.

