[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Fri Dec 4 01:52:22 CET 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #56 from David Cook <dcook at prosentient.com.au> ---
Just realized that I forgot to respond to this comment...

(In reply to Leif Andersson from comment #49)
> (In reply to David Cook from comment #47)
> > Leif and Andreas:
> > 
> > If I understand correctly, your main use case for matching would be to make
> > sure that records previously imported from the union catalogue aren't
> > duplicated when you start using OAI-PMH, yes?
> > 
> > In that case, would matching on the 001 be suitable? 
> > 
> 
> If we are only importing from one source, 001 would be fine.
> But if we will be using several sources for our imports, then relying on 001
> would sooner or later result in a "false matching" where we end up having
> one record overwritten by a totally different one.
> 

Agreed. I think that would be a very real risk. 

> How reliable would it be to add in 003?
> In MARC21 003 is the alphanumeric "MARC code for the organization whose
> control number is contained in field 001".
> I don't know how this fits UNIMARC, though.
> 

It should be trivial to merge the 001 and 003 together to form a 035 like
"(OCoLC)814782" (http://www.loc.gov/marc/bibliographic/bd035.html). That would
certainly help eliminate that risk of "false matching" mentioned above.
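
As a rough sketch of that merge (Python is used purely for illustration, and
`build_035a` is a hypothetical helper, not anything that exists in Koha), the
003 goes in parentheses in front of the 001. The sketch assumes that a record
missing either field simply gets no 035:

```python
def build_035a(control_number, org_code):
    """Combine a 001 and a 003 into a 035$a value such as "(OCoLC)814782".

    Hypothetical helper: returns None when either part is missing or blank,
    because a 035 without an organization code would not remove the
    ambiguity between sources.
    """
    number = (control_number or "").strip()
    org = (org_code or "").strip()
    if not number or not org:
        return None
    return "({}){}".format(org, number)


print(build_035a("814782", "OCoLC"))  # (OCoLC)814782
```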

However, in the LIBRIS data that I've seen, I haven't seen any examples of a
003. 

I'm not sure how this fits UNIMARC either. Hopefully some of the French people
lurking on this bug can provide some insight there.

> David suggested in a mail (to Koha devel list) to move field 001 to 035.
> Or even 001 + 003 to 035
> In doing so some refinements could be done to this matching point (e.g.
> normalization, if 003 is empty then add what we know about the exporting
> catalog etc)

Hmm, yeah, that would probably be achievable. However, that wouldn't really
help too much, because your local Koha records will be missing that
003/additional exporting catalog information. So the matching will still fail.

In terms of your local Koha catalogue, you could add a 003 to all records
before starting to use the OAI-PMH harvester. Then either update records in
LIBRIS to have a 003 as well, OR have the harvester inject a 003 into incoming
records which will match the 003 in your local Koha catalogue. That's probably
the way to go...
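
A minimal sketch of the "inject a 003" option, assuming a record is
represented as a plain dict of control-field tag to value. The `inject_003`
function and the "EXAMPLE-ORG" code are placeholders for illustration only,
not a real harvester API or a real organization code:

```python
def inject_003(record, default_org_code):
    """Return a copy of `record` that is guaranteed to carry a 003.

    An existing non-blank 003 is left untouched; otherwise the code
    configured for the upstream source is filled in.
    """
    updated = dict(record)
    if not (updated.get("003") or "").strip():
        updated["003"] = default_org_code
    return updated


incoming = {"001": "12345"}
harvested = inject_003(incoming, "EXAMPLE-ORG")
print(harvested["003"])  # EXAMPLE-ORG
```

The same function could be run once over the local catalogue, so that both
sides end up with the same 003 before harvesting starts.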

Actually, let me think about that for a second...

What do your existing Koha records have for 001 and 003? If they've been
previously imported from LIBRIS, do they have the LIBRIS 001 in the Koha 001
field?

--

In terms of matching, we basically need to make sure that incoming data can
map/match to existing data. If Koha records have 001 but no 003, then we have
to be able to match using just the 001 from the incoming record. If Koha
records have 001 and 003, then we need to match the 001 and 003 of the incoming
record against those... 

Failing that, we need to match a 035 on the incoming record with a 035 on an
existing Koha record. I imagine that none of the records have a 035 field.
Perhaps it would be worthwhile to create one of those as well as I described
above... that would probably be best (at least for bibliographic records and
authority records).
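
The matching order described above could be sketched roughly as follows. This
is an illustration under stated assumptions, not Koha's matching code: records
are plain dicts, "035" holds a list of 035$a strings, and `find_match` is a
hypothetical name.

```python
def find_match(incoming, existing_records):
    """Return the existing record that matches `incoming`, or None.

    Order of preference:
      1. 001 and 003 both present on both sides and equal
      2. 001 equal, where the existing record has no 003 at all
      3. any shared 035 value
    """
    in_001 = incoming.get("001")
    in_003 = incoming.get("003")

    if in_001 and in_003:
        for existing in existing_records:
            if existing.get("001") == in_001 and existing.get("003") == in_003:
                return existing

    if in_001:
        for existing in existing_records:
            if existing.get("001") == in_001 and not existing.get("003"):
                return existing

    in_035 = set(incoming.get("035", []))
    for existing in existing_records:
        if in_035 & set(existing.get("035", [])):
            return existing

    return None
```

Step 2 is the risky one: it is the case where a 001 from a different source
could produce a false match, which is the argument for adding a 003 or a 035
to the local records first.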

As I noted in my other comment, matching item records is going to be tricky, as
we won't have the 035 mechanism available, unless we cheat a bit and put it in
the 952$i (inventory number) or something like that... 

--

I admit that I'm starting to think a bit about how to add support for MARC
holdings into Koha. While long-term we do want to get rid of MARC, I wonder if
it could be useful having a "holdings" database in Zebra as well. In terms of
library systems, I imagine there will always be a separation of abstract
entities and print/digital holdings. 

Right now, we're really limited by using the 952 field for items in Koha... but we
do it that way instead of supporting MFHD because there was no other way of
searching Zebra using both "bibliographic" and "holdings" data at the same time
if that data was in separate records.

I suppose that brings me back to the idea that Koha should really have an
intermediary extensible metadata format which is indexed. MARC bibliographic
and MARC holdings records could be held separately and then processed into a
single intermediary record which is indexed and used for search. Display... we
could either use the intermediary record or use the internal system numbers to
fetch the original MARC metadata for display. 

Of course, that would require a significant and separate development effort to
achieve...

Plus... even if we could match a MARC holdings record to a MARC holdings
record... each holdings record can have X items specified within it... so you
still need a unique identifier at the item-level in order to do full matching.

--

An added problem with using OAI-PMH and items is that items can be on loan or
otherwise in a "process" which cannot be affected by changes upstream at the
OAI-PMH server. 

Actually, now that I think about it, there is so much data stored in the 952
item record that cannot be overwritten by an upstream change... 
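
One way to picture the constraint is a merge in which upstream data wins
except for a set of locally owned subfields. This is only a sketch: items are
modelled as dicts of subfield code to value, `merge_item` is a hypothetical
name, and the protected codes used here are arbitrary placeholders rather than
a statement about which 952 subfields Koha would actually need to protect.

```python
def merge_item(local_item, upstream_item, protected):
    """Merge an upstream item into a local one.

    Upstream values replace local ones, except for subfields listed in
    `protected`, which always keep their local state (including being
    absent locally).
    """
    merged = dict(upstream_item)
    for code in protected:
        if code in local_item:
            merged[code] = local_item[code]
        else:
            # Upstream must not introduce a value for a locally owned subfield.
            merged.pop(code, None)
    return merged


local = {"p": "B1", "q": "2015-12-20", "c": "OLD"}
upstream = {"p": "B1", "q": "ignored", "c": "NEW"}
print(merge_item(local, upstream, {"q", "l"}))
```

Note that in this sketch any unprotected subfield that exists only locally is
dropped, so the protected set would have to be chosen carefully.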

What is the ideal scenario for harvesting MFHD records via OAI-PMH?
