[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Tue Nov 17 06:20:03 CET 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #31 from David Cook <dcook at prosentient.com.au> ---
_HOLDINGS_

Diagramming this now...

OAI ID -> Koha ID -> Original 001 -> Parent 001
oai::1 -> bib1 -> 1a -> null
oai::2 -> bib1 -> 1b -> null 
oai::3 -> item 3 -> 3a -> 1a
oai::4 -> item 4 -> 4a -> 1b

So here we've downloaded oai::1 and added it as bib1.

We've downloaded oai::2 and determined that it is a duplicate of bib1. We can
either overwrite bib1 or we can simply link to it.

We've downloaded oai::3. Its original parent 001 is 1a, so we can link oai::3
to bib1.

We've downloaded oai::4. Its original parent 001 is 1b, so we can link oai::4 to
the entry for oai::2, which is bib1.

In this case, the only problems we have are determining what makes a match, and
determining whether oai::1 or oai::2 should provide the metadata for bib1. We 
probably need another field to say which OAI record is the source of truth for
bib1.
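The linkage in the diagram above can be sketched as a simple lookup: each
harvested bib remembers its original 001, and a holdings record's parent 001 is
resolved through that mapping. This is just an illustrative Python sketch with
in-memory dicts (all names here are hypothetical, not Koha code):

```python
# OAI ID -> (Koha ID, original 001), as in the diagram above.
# oai::2 was detected as a duplicate and linked to bib1.
bib_links = {
    "oai::1": ("bib1", "1a"),
    "oai::2": ("bib1", "1b"),
}

# Invert to: original 001 -> Koha bib ID.
original_001_to_bib = {orig: koha for (koha, orig) in bib_links.values()}

def resolve_holdings_parent(parent_001):
    """Return the Koha bib a holdings record should attach to,
    or None if the parent hasn't been harvested yet."""
    return original_001_to_bib.get(parent_001)

# oai::3 (parent 1a) and oai::4 (parent 1b) both resolve to bib1.
```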

I might be able to use the existing C4::Matcher() for this... 

It's worth noting that the downloaded metadata will need to be used every time
there's an item update, because the National Library of Sweden requires that
item-level data be merged into the host bibliographic record... and the only
way to do that cleanly is to start with a virgin bibliographic record each
time. (Otherwise, when you change an 863 in a holdings record, you won't be
updating the existing 863 in the bib record; you'll be adding a new one, and
the old one will stay there incorrectly.) Every holdings record will need to
be merged in as well, which could cause load problems for records with many
holdings...
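The "virgin record" rebuild described above could look something like this: start
from a fresh copy of the pristine downloaded bib and append the relevant fields
from every current holdings record, so a changed 863 replaces the old one instead
of piling up next to it. A minimal sketch, modeling records as lists of
(tag, value) tuples rather than Koha's actual MARC API:

```python
from copy import deepcopy

# Hypothetical pristine bib as originally downloaded, never edited in place.
pristine_bib = [("001", "1a"), ("245", "Some title")]

# Hypothetical holdings records; only 852/863 get merged into the bib.
holdings = [
    [("852", "Location A"), ("863", "v.1-v.10")],
    [("852", "Location B"), ("863", "v.11-v.20")],
]

def rebuild_bib(pristine, holdings_records, merge_tags=("852", "863")):
    """Rebuild the bib from scratch: fresh copy of the pristine record,
    then merge in the selected fields from *all* holdings records."""
    bib = deepcopy(pristine)
    for rec in holdings_records:
        bib.extend(field for field in rec if field[0] in merge_tags)
    return bib

merged = rebuild_bib(pristine_bib, holdings)
```

Note this is where the load concern comes from: every holdings update re-reads
and re-merges every holdings record attached to the bib.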

Actually, I'm not sure how that's even possible now that I think about it...
since you're harvesting from the holdings endpoint every 2 seconds... 

I suppose you could queue updates to a bibliographic record from the holdings
records... but simple queueing wouldn't help much, since every updated holdings
record would trigger a full rebuild of the bibliographic record from all of its
holdings records... so every update would still need to be processed.

I suppose you could coalesce updates instead: if a holdings-bibliographic merge
is already in progress for a record, don't start another one, as things will
explode... it still seems like a potentially intensive operation.
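One way to sketch that coalescing idea: if a merge for a bib is already in
flight, just mark the bib dirty and do one more full rebuild when the current
pass finishes, rather than queueing a rebuild per holdings update. All names
below are hypothetical; this only illustrates the locking pattern:

```python
import threading

class MergeCoalescer:
    """Ensure at most one holdings->bib merge runs per bib; updates
    arriving mid-merge are coalesced into a single follow-up pass."""

    def __init__(self, merge_fn):
        self.merge_fn = merge_fn      # does one full rebuild of a bib
        self.lock = threading.Lock()
        self.running = set()          # bib IDs with a merge in flight
        self.dirty = set()            # bibs updated while merging

    def request_merge(self, bib_id):
        with self.lock:
            if bib_id in self.running:
                self.dirty.add(bib_id)   # coalesce with in-flight merge
                return
            self.running.add(bib_id)
        while True:
            self.merge_fn(bib_id)
            with self.lock:
                if bib_id in self.dirty:
                    self.dirty.discard(bib_id)
                    continue             # one extra pass picks up changes
                self.running.discard(bib_id)
                return
```

Ten holdings updates arriving during one merge would then cost one extra
rebuild, not ten.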

--

Problems to consider and solve (anyone can chime in here):

1) Precedence of bibliographic-bibliographic merges
2) Merging holdings records into bibliographic records (e.g. 852 and 863 into
the bibliographic record... not 952 into bibliographic record)
3) Any local changes to a record will be erased by future downloaded updates

-- 
You are receiving this mail because:
You are watching all bug changes.
