[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Tue Dec 8 12:16:37 CET 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #60 from Andreas Hedström Mace <andreas.hedstrom.mace at sub.su.se> ---
(In reply to David Cook from comment #52)
> (In reply to Leif Andersson from comment #51)
> > (In reply to David Cook from comment #50)
> > > Honestly though, I would ask that people think further about the frequency
> > > of harvests. Is every 2-3 seconds really necessary? Do we really need it to
> > > be able to perform that quickly?
> > > 
> > 
> > Well, the use case envisioned by Stockholm UL would in practice ideally
> > involve fetching ONE record every 10 minutes or so!
> > The cataloger will be creating/modifying a bib record and a mfhd record in
> > our union catalog.
> > Next, the cataloger will turn to our local catalog, Koha, expecting to find
> > this record already imported.
> > If there is a way for the harvester to decide which ONE record to
> > get...maybe even with some intervention by the cataloger...?
> > So when this ONE record is asked for, we want it to be a quick process
> > getting it from the source and into Koha.
> > 

To confuse things a little, I will have to contradict my colleague at Stockholm
Univ. Library by saying that I don't see why we would want to replicate
functionality already offered by LIBRIS (the Swedish union catalogue, for those
lurking in this thread), where you can download records individually and then
run batch exports at night, rather than creating something better and faster.

For me it is preferable to have the catalogue as up-to-date as possible, since
LIBRIS will be the "master" (or source of truth, as David calls it) for our
data. I would rather have the harvester run every ten seconds or so (or as
often as we can get it to), to get all updates made to "our" records. The only
drawback I can see with such an approach would be an increased load on the
servers, which is of course not a trivial thing, but something that should be
manageable. (LIBRIS might have a bigger problem if a lot of Swedish libraries
start using OAI-PMH harvesting this way, but they have recommended this use
themselves and will have to handle it accordingly.)

Also, I would prefer the process of harvesting records to be as automated as
possible, without any extra steps on the catalogers' part. We want to make
their cataloging easier, not more complex!

> So you're saying that the union catalogue will only have updates about once
> every 10 minutes? Or that the cataloguer will only be accessing a record in
> the union catalogue and Koha once every 10 minutes?

Changes to the records that we handle, i.e. adding/updating either the
bibliographic record or the holdings record (or both), probably only happen
around every 5-10 minutes, as Leif says. But changes made by other Swedish
libraries to bibliographic records that have our holdings attached probably
happen much more often. I will try to look at this in the coming days,
harvesting manually at short intervals (I'm thinking of trying both a 3-second
and a 10-second interval) to see how many records are downloaded with each
harvest.
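For an interval test like this, the number of records a harvest downloads can
be read straight from the OAI-PMH ListRecords responses. Below is a minimal
sketch, assuming standard OAI-PMH 2.0 XML; the function name is hypothetical
and not part of Koha or of the harvester being built in this bug. A harvest
may span several pages, so the resumptionToken is returned too.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace used by every OAI-PMH 2.0 response.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def summarize_list_records(xml_text: str) -> Tuple[int, Optional[str]]:
    """Return (records on this page, resumptionToken or None) for a ListRecords response."""
    root = ET.fromstring(xml_text)

    error = root.find("oai:error", OAI_NS)
    if error is not None:
        code = error.get("code")
        # noRecordsMatch just means nothing changed since the `from` datestamp.
        if code == "noRecordsMatch":
            return 0, None
        raise RuntimeError(f"OAI-PMH error {code}: {error.text}")

    records = root.findall("oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return len(records), token or None
```

Summing the first value over all pages of one harvest gives the count for that
harvest, which is the figure to compare between a 3-second and a 10-second
interval.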

> I am including a mechanism for fetching "one" record, so long as the user
> knows the OAI-PMH identifier they're after. They'll be able to add a task
> for that. Perhaps a future development could be done to provide an interface
> in the cataloguing module for adding/updating a single record. In place of
> that interface, they'll be able to add a task in the same area as the other
> tasks in order to get the one record...
> 
> > Then nightly more massive harvests could be performed to catch up with other
> > modifications to the union catalog.
> > 
> 
> Those nightly harvests would certainly be possible with the current design. 

As I mentioned above, ideally I would want the harvester to run repeatedly at
short intervals. But other libraries who are interested in using the harvester
might have other ideas about which set-up would be best for them. So having the
flexibility to run it either way (individual harvests plus larger nightly
harvests, or a repeated harvest every 10 seconds or so) would be wonderful!
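Both set-ups correspond to standard OAI-PMH requests: GetRecord fetches a
single record whose OAI-PMH identifier is known, and ListRecords with a `from`
datestamp returns everything changed since the previous harvest. A minimal
sketch of building the two request URLs follows. The base URL, the helper
names and the `marcxml` metadata prefix are assumptions for illustration only
(the prefixes a repository actually offers are listed by its
ListMetadataFormats response).

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "marcxml") -> str:
    """Single-record mode: one record whose OAI-PMH identifier is known (hypothetical helper)."""
    params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def list_records_url(base_url: str, since: datetime, metadata_prefix: str = "marcxml",
                     oai_set: Optional[str] = None) -> str:
    """Incremental mode: every record changed since `since` (hypothetical helper)."""
    if since.tzinfo is None:
        # OAI-PMH datestamps are UTC; refuse naive datetimes rather than guess a zone.
        raise ValueError("since must be timezone-aware")
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # Seconds granularity, as needed for harvesting every few seconds.
        "from": since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if oai_set is not None:
        params["set"] = oai_set
    return f"{base_url}?{urlencode(params)}"
```

Short-interval polling depends on the repository's Identify response declaring
seconds granularity (YYYY-MM-DDThh:mm:ssZ). A repository that only supports day
granularity (YYYY-MM-DD) answers a seconds-level `from` with a badArgument
error, and a day-level `from` re-fetches everything changed that day.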

I think David's idea of a plug-in approach, together with the harvest tasks,
will work well here!
