[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Tue Nov 17 05:36:18 CET 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #30 from David Cook <dcook at prosentient.com.au> ---
I've been thinking more about matching, and how complex or even impossible it is.

Consider that you have 2 different OAI-PMH servers with 2 matching records and
also 1 matching record locally on Koha.

Which is the source of truth? 

You might argue that the harvested records have a higher priority than the
local record... so you can overwrite the local record.

However, what about the 2 harvested records? Which one takes precedence? 

What if they have holdings records? The National Library of Sweden
requires that holdings records be partially merged into bibliographic
records... that becomes difficult in this scenario. Every time a holdings
record is updated, you would need to re-create the bibliographic record
from the last harvested bibliographic record (otherwise the
holdings-bibliographic merge would quite quickly produce duplicated or
otherwise incorrect fields).
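
To illustrate why the rebuild has to start from the last harvested
bibliographic record, here is a rough Python sketch. It is not Koha code:
fields are simplified to (tag, content) pairs, and the choice of 852 as the
tag that gets embedded is my assumption for the example, not the National
Library of Sweden's actual rule.

```python
from typing import List, Tuple

# Simplified stand-in for a MARC field: (tag, content). Hypothetical shape.
Field = Tuple[str, str]

# Assumption for illustration: only 852 is copied from holdings into the bib.
EMBEDDED_TAGS = {"852"}


def rebuild_bib(pristine_bib: List[Field],
                holdings: List[List[Field]]) -> List[Field]:
    """Build the merged record from the untouched harvested bib every time.

    Starting from the pristine copy (never from the previous merge result)
    means re-running after a holdings update cannot accumulate duplicates.
    """
    merged = list(pristine_bib)
    for record in holdings:
        merged.extend(f for f in record if f[0] in EMBEDDED_TAGS)
    # Stable sort keeps fields in tag order.
    return sorted(merged, key=lambda f: f[0])
```

Calling rebuild_bib() twice with the same inputs gives the same output,
whereas merging holdings into the previously merged record would add a
second copy of each embedded field.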

I suppose you could choose the most recent bibliographic record as the highest
priority, and you could blindly merge holdings into that bibliographic record
on each update...
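
As a rough sketch of the "most recent wins" idea, assuming each harvested
record keeps the datestamp from its OAI-PMH header (the class and field
names below are made up for the example, not part of Koha):

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class HarvestedRecord:
    # All field names are hypothetical, not Koha's schema.
    repository: str   # base URL of the OAI-PMH server
    identifier: str   # OAI-PMH record identifier
    datestamp: str    # datestamp from the OAI-PMH header (UTC)
    marcxml: str


def parse_datestamp(value: str) -> datetime:
    """Parse an OAI-PMH datestamp in day or seconds granularity."""
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in value else "%Y-%m-%d"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def pick_winner(duplicates):
    """Return the most recently updated record among matching duplicates.

    Ties are broken on repository and identifier so the choice is stable.
    """
    return max(
        duplicates,
        key=lambda r: (parse_datestamp(r.datestamp), r.repository, r.identifier),
    )
```

This only settles which record wins; it says nothing about whether the
newest record is actually the better one.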

You'd have to set up a relationship somewhere between the holdings and the
bibliographic record though and this gets tough because the holdings from one
OAI-PMH server aren't going to map to that bibliographic record using the
004/001 mechanism.

That is... Holdings A 004 refers to Bibliographic A 001, so there is a link
there. However, Holdings B 004 refers to Bibliographic B 001, which we're
discarding as it's a "duplicate".

So we need to have a linkage somewhere between Holdings B and Bibliographic A
001 or preferably Bibliographic A 999$c.
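
One way to picture that linkage, as a hedged sketch only: keep a lookup from
(OAI-PMH server, harvested bibliographic 001) to the Koha biblionumber
(999$c) that the record ended up in, and register the discarded duplicate
against the surviving record. The class and method names are hypothetical.

```python
class HoldingsLinker:
    """Maps a harvested bib's 001, per repository, to a Koha biblionumber."""

    def __init__(self):
        # (repository, bibliographic 001) -> Koha biblionumber (999$c)
        self._links = {}

    def register_bib(self, repository, control_number, biblionumber):
        """Record which Koha bib a harvested bib was stored in or merged into."""
        self._links[(repository, control_number)] = biblionumber

    def resolve(self, repository, holdings_004):
        """Find the Koha bib for a holdings record via its 004, or None."""
        return self._links.get((repository, holdings_004))
```

With Bibliographic A stored as biblionumber 42 and Bibliographic B
discarded as its duplicate, both are registered against 42, so Holdings B
004 still resolves to the surviving record.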

I think that might be possible, but certainly not with Koha's existing import
mechanisms.

--

Importing holdings is going to have other issues as well, such as how to
enforce barcode uniqueness... and how to manage values in records that don't
exist in Koha.

I also need to use my special OAI import system for managing holdings imports,
because there will be no reference to the OAI-PMH unique identifier in the Koha
item MARCXML, so there's no way to use the existing import system to check if
that item already exists.
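
A minimal sketch of the kind of tracking this implies, assuming a separate
mapping table keyed on the OAI-PMH server and identifier. The table and
function names are invented for the example, and SQLite is used only to keep
it self-contained.

```python
import sqlite3


def open_tracking_db(path=":memory:"):
    """Open (or create) a hypothetical OAI identifier -> item mapping table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS oai_item_map (
               repository TEXT NOT NULL,
               identifier TEXT NOT NULL,
               itemnumber INTEGER NOT NULL,
               PRIMARY KEY (repository, identifier)
           )"""
    )
    return conn


def lookup_item(conn, repository, identifier):
    """Return the item number already imported for this OAI record, or None."""
    row = conn.execute(
        "SELECT itemnumber FROM oai_item_map"
        " WHERE repository = ? AND identifier = ?",
        (repository, identifier),
    ).fetchone()
    return row[0] if row else None


def remember_item(conn, repository, identifier, itemnumber):
    """Store or update the link between an OAI record and a Koha item."""
    conn.execute(
        "INSERT OR REPLACE INTO oai_item_map"
        " (repository, identifier, itemnumber) VALUES (?, ?, ?)",
        (repository, identifier, itemnumber),
    )
    conn.commit()
```

If lookup_item() returns None the harvested holdings record would be added
as a new item; otherwise the existing item would be updated.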

I also don't think there's any way to replace an item record using this system
unless it shares the same 952$9. I might be wrong, as I haven't investigated
that issue thoroughly, but I bet I'm right, as it's a tough one.

It's also worth reviewing the section marked "Embedded Holdings Information" in
http://www.loc.gov/marc/holdings/hd852.html or
http://www.loc.gov/marc/bibliographic/bd852.html. 

Part of the difficulty with the holdings is the fact that Koha doesn't support
MARC21 Format for Holdings Data (MFHD). It would be much easier if it did,
although there would still be the problems with the source of truth when
merging bibliographic records. 

Merging records and de-duplicating is one thing when your system is relatively
static or updated semi-manually, but when you're importing and auto-merging
records at a speed of X records every 2 seconds, you're probably going to run
into problems.
