[Koha-devel] [Koha] OAI-PMH harvester

BOUIS Sonia sonia.bouis at univ-lyon3.fr
Tue Nov 22 15:12:44 CET 2022


Hi,
Thanks to David, Tomas, Michal and Michael for your replies.

So we have decided to evaluate several external OAI-PMH clients that could be used by Koha, and to choose one at the end of January.
There is a lot to do after that. We discussed the background jobs, and cronjobs seem to be appropriate. We thought that the settings in the Koha intranet should only define URLs, sets, or XSLT stylesheets (for example, to transform DC XML into MARCXML).

We are only at the beginning of the process 😊

Kind regards,
Sonia

------------------------------

Message: 2
Date: Wed, 26 Oct 2022 10:37:49 +1100
From: "David Cook" <dcook at prosentient.com.au>
To: "'Tomas Cohen Arazi'" <tomascohen at gmail.com>, "'BOUIS Sonia'"
	<sonia.bouis at univ-lyon3.fr>
Cc: "'koha'" <koha at lists.katipo.co.nz>, "'koha-devel'"
	<koha-devel at lists.koha-community.org>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
Message-ID: <07af01d8e8ca$dfbddef0$9f399cd0$@prosentient.com.au>
Content-Type: text/plain; charset="utf-8"

Hi Sonia,

 	

I’m excited to hear that KohaLA would like to finance an OAI-PMH client in Koha! This functionality has been brewing in the back of my mind ever since I first raised bug 10662 back in 2013.

 

As Tomas says, I think that the background jobs are a key component for processing incoming OAI-PMH records. 

 

However, the ***missing component right now is the scheduling of the OAI-PMH harvesting tasks***, and I think this is where opinions get divided. Below, I’ll provide some history and opinions on Koha OAI-PMH.

 

--

 

With 10662, the sponsored goal was for Koha library staff to schedule OAI-PMH harvests through the Web UI. However, Fridolin from BibLibre raised a point with me at Kohacon18 about how letting library staff control the timing of harvesting tasks could be a problem for support vendors. If too many libraries using the same public IP address tried to harvest from the same OAI-PMH repository, they could be rate limited or blocked. There could also be server load concerns. So there probably needs to be a balance between user configuration and system configuration. If I recall correctly, this is how DSpace’s OAI-PMH harvester works. Users set up targets and can start/stop harvests, but things like frequency and concurrency are handled by the system configuration.
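
To make the split between user configuration and system configuration concrete, here is a minimal sketch in Python. It is not taken from Koha or DSpace: the names `SystemLimits`, `HarvestTarget`, `effective_interval` and `may_start`, and the default values, are all hypothetical. The idea is only that staff define targets and may ask for a frequency, while the sysadmin-owned limits always win.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SystemLimits:
    """Set by the sysadmin or support vendor (hypothetical defaults)."""
    min_interval_seconds: int = 3600
    max_concurrent_harvests: int = 2


@dataclass(frozen=True)
class HarvestTarget:
    """Set by library staff through the staff interface."""
    base_url: str
    set_spec: Optional[str] = None
    enabled: bool = True
    requested_interval_seconds: Optional[int] = None


def effective_interval(target: HarvestTarget, limits: SystemLimits) -> int:
    """Return the interval actually used: never shorter than the system minimum."""
    requested = target.requested_interval_seconds
    if requested is None:
        return limits.min_interval_seconds
    return max(requested, limits.min_interval_seconds)


def may_start(target: HarvestTarget, running_count: int, limits: SystemLimits) -> bool:
    """Staff can enable or disable a target, but concurrency is capped by the system."""
    return target.enabled and running_count < limits.max_concurrent_harvests
```

With limits like these, a target that asks for a harvest every 3 seconds would still only run once per hour, which is the kind of protection a support vendor sharing one public IP address would want.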

 

Based on my experience working on OAI-PMH on and off for nearly 10 years and as a Koha support vendor, I think my preference would be for sysadmins to handle most of the OAI-PMH harvesting details. 

 

The sponsorship for 10662 had certain requirements that many other libraries might not have, which is what made me think that it might be better to have an external client that connects to Koha. I thought maybe I could get the ordinary requirements pushed into Koha, and then handle extraordinary requirements externally. However, an external harvester won’t perform as fast as an internal harvester. (The compromise would be to write the harvester in such a way that people could provide different OAI-PMH harvester Perl modules that all stage records using the same core Koha modules.)

 

Even then… the scheduling would depend on a library’s needs. Back in 2013, I had a Koha OAI-PMH harvester which worked as a cronjob. It would run each night. However, some libraries want to run OAI-PMH harvests as frequently as every 3 seconds. A cronjob’s shortest interval is 60 seconds, so that wouldn’t work for that requirement. 

 

If a cronjob isn’t suitable, then I think you’d need a daemon created by a new command like “koha-oai --start <instance_name>”. It could read a configuration file and handle scheduling accordingly. With 10662, I used the POE module, because I knew it well and it has some timer tools for scheduling tasks. If I were to work on it again, I’d probably use Mojo::IOLoop instead these days, since Mojolicious is already part of Koha while POE is not. (That said, using modules like Mojo and POE is difficult, because they’re hard to test using automation. That was one of the stumbling blocks with 10662. While the 10662 harvester worked very well, it was difficult to unit test. In hindsight, I should’ve written it in a way that was easier to unit test, but it had a lot of event-driven code which made things more difficult.)
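
As an illustration of the daemon idea, here is a minimal Python sketch of a fixed-interval loop that can run more often than once per minute. It is not the 10662 code and does not use POE or Mojo::IOLoop; `run_every` and its parameters are hypothetical names. The clock and sleep functions are passed in as arguments, which is one possible way to keep timer-driven code unit-testable.

```python
import time
from typing import Callable, Optional


def run_every(interval_seconds: float,
              task: Callable[[], None],
              max_runs: Optional[int] = None,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call task on a fixed interval and return how many times it ran.

    max_runs=None loops forever, as a daemon would; tests pass a small number.
    """
    runs = 0
    next_due = clock()
    while max_runs is None or runs < max_runs:
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
        task()
        runs += 1
        # Schedule from the previous due time, not from "now",
        # so the time spent inside task() does not accumulate as drift.
        next_due += interval_seconds
    return runs
```

A real daemon would call something like `run_every(3, harvest_once)` with the defaults, while a unit test can substitute a fake clock and a fake sleep so that nothing actually waits.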

 

Another option would be to create a generic daemon for task scheduling in general (e.g. “koha-schedule”). Koha could use this for many things, but it’s a project in itself. 

 

--

 

Downloading OAI-PMH records and importing MARCXML into Koha is actually fairly straightforward. The difficulty is the scheduling and management of tasks (and unit testing). 
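
For reference, the download side can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not Koha code: the function names are hypothetical, and `oai_dc` is used as the default metadata prefix only because every OAI-PMH repository is required to support it (a repository that offers MARCXML would advertise its own prefix). The sketch builds a `ListRecords` request and reads the `resumptionToken` that tells the client whether another page follows.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Optional

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a ListRecords response, or None on the last page."""
    root = ET.fromstring(response_xml)
    element = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()
```

A harvester would fetch the first URL, stage the records, and keep requesting with the returned token until `next_token` returns `None`. The fetching, error handling and staging into Koha are deliberately left out here.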

 

I don’t know the answer that will make everyone happy. There are lots of different ways of managing and scheduling the tasks. Based on my experience, I’d suggest targeting the simplest approach first, because complexity will make the project less likely to succeed. 

 

On that note, I’d be happy to test/QA any OAI-PMH harvester put forward. When I was writing OAI-PMH harvester patches, I found it really hard to get QA, so I’m happy to be that resource for someone else. I’ve spent a lot of time thinking about this topic, so happy to provide advice, warnings, emotional support 😉. 

 

David Cook

Senior Software Engineer

Prosentient Systems

Suite 7.03

6a Glen St

Milsons Point NSW 2061

Australia

 

Office: 02 9212 0899

Online: 02 8005 0595

 

From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On Behalf Of Tomas Cohen Arazi
Sent: Wednesday, 26 October 2022 3:46 AM
To: BOUIS Sonia <sonia.bouis at univ-lyon3.fr>
Cc: koha <koha at lists.katipo.co.nz>; koha-devel <koha-devel at lists.koha-community.org>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester

 

I think with background jobs we have most of the framework that is needed to deal with this within Koha.

 

Best regards

 

On Tue, 25 Oct 2022 at 7:08, BOUIS Sonia <sonia.bouis at univ-lyon3.fr> wrote:

Hi,
KohaLA would like to finance an OAI-PMH client in Koha, but we have questions that we want to raise with the community.
There have already been attempts to propose an OAI-PMH client:
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662 : it's an old project that doesn't seem compatible with the current version of Koha
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25905 : the scope is more to use an external OAI-PMH client and to connect it to Koha

Our main question is how to handle this. Do you think it is better to use external software or a Perl routine and find a way to connect it to Koha? Or would it be better to write a new module in Koha from scratch, so that Koha has its own OAI-PMH client?

Please let us hear your thoughts about this project.

Kind regards

Sonia

Sonia BOUIS
------------------------------------------------------
Head of the Library IT Service
Département d'Appui à la Recherche et aux Projets (DARP)
Bibliothèques universitaires, Université Jean Moulin Lyon 3
Street address: Manufacture des Tabacs | 6 cours Albert Thomas | LYON 8e
Postal address: Bibliothèque de la Manufacture | 1C avenue des Frères Lumière | CS 78242 - 69372 LYON CEDEX 08

Direct line: 33 (0)4 78 78 79 03

http://bu.univ-lyon3.fr | Follow us: Facebook <https://www.facebook.com/bulyon3/> | Twitter <https://twitter.com/bulyon3> | Instagram <https://www.instagram.com/bu.lyon3/?hl=fr>

_______________________________________________

Koha mailing list  http://koha-community.org Koha at lists.katipo.co.nz <mailto:Koha at lists.katipo.co.nz>
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha

------------------------------

Subject: Digest Footer

_______________________________________________
Koha-devel mailing list
Koha-devel at lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/ git : https://git.koha-community.org/ bugs : https://bugs.koha-community.org/


------------------------------

End of Koha-devel Digest, Vol 203, Issue 15
*******************************************