[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Mon Nov 30 00:22:03 CET 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #46 from David Cook <dcook at prosentient.com.au> ---
(In reply to Leif Andersson from comment #45)
> Have you ever considered "exporting" some of the marc fields to a separate
> mysql table? What I am thinking of is those fields that most likely will be
> used for duplicate detection: 001, 003, 020, 022, 035
> We could then do the necessary matching without involving zebra. In most
> cases I'd imagine.
> 

Yes, I've thought about this a bit. The 020 and 022 can already be found in
biblioitems.isbn and biblioitems.issn respectively. Unfortunately, those columns
store multiple values in the same field, which is suboptimal in this case,
although not impossible to use...
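
As a rough illustration of working with those multi-valued columns, here is a
minimal sketch. It assumes the values are joined with a " | " delimiter and
that a qualifier such as "(pbk.)" may follow the number. Both are assumptions
to check against real data, and the helper names are hypothetical, not
existing Koha functions.

```python
import re


def split_multivalue(field, delimiter="|"):
    """Split a multi-valued column (e.g. biblioitems.isbn) into clean values.

    The "|" delimiter is an assumption; adjust it to match the actual data.
    """
    if not field:
        return []
    return [part.strip() for part in field.split(delimiter) if part.strip()]


def normalize_isbn(value):
    """Reduce an ISBN string to digits (and X) so values can be compared."""
    # Keep only the first whitespace-separated token, dropping qualifiers
    # such as "(pbk.)", then remove hyphens and other punctuation.
    token = value.strip().split(" ", 1)[0]
    return re.sub(r"[^0-9Xx]", "", token).upper()


raw = "9780306406157 | 0-306-40615-2 (pbk.)"
print([normalize_isbn(v) for v in split_multivalue(raw)])
```

Matching against a column like this still means splitting every row at query
time, which is the part that makes it suboptimal.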

For a while, I've been thinking that it would be nice to store the 001
somewhere and perhaps the 035. 

I think the problem with that is we're in a place where we actually want to be
moving away from MARC... not entrenching it further. So I don't think we should
really add to the biblio or biblioitems tables. 

Of course, there could be a way around that by making a generic "metadata"
table. Something like...

metadata.id, metadata.record_id, metadata.scheme, metadata.qualifier,
metadata.value. 

So that would look like:

1, 1, marc21, 001, 123456789

I think that's actually very similar to what they do in DSpace, and I've seen
other library systems store their MARC records in a similar way.
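
A minimal sketch of such a table, using Python's sqlite3 module as a stand-in
for MySQL so it can be run on its own. The column types, the index, and the
helper names (open_store, find_records) are assumptions for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE metadata (
    id        INTEGER PRIMARY KEY,
    record_id INTEGER NOT NULL,
    scheme    TEXT NOT NULL,
    qualifier TEXT NOT NULL,
    value     TEXT NOT NULL
);
-- Index chosen so that matching on a value does not need a full table scan.
CREATE INDEX metadata_lookup ON metadata (scheme, qualifier, value);
"""


def open_store():
    """Create an in-memory database holding the hypothetical metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def find_records(conn, scheme, qualifier, value):
    """Return the ids of records carrying the given scheme/qualifier/value."""
    rows = conn.execute(
        "SELECT DISTINCT record_id FROM metadata "
        "WHERE scheme = ? AND qualifier = ? AND value = ? "
        "ORDER BY record_id",
        (scheme, qualifier, value),
    )
    return [row[0] for row in rows]


conn = open_store()
# The example row from above: 1, 1, marc21, 001, 123456789
conn.execute("INSERT INTO metadata VALUES (1, 1, 'marc21', '001', '123456789')")
print(find_records(conn, "marc21", "001", "123456789"))
```

With a lookup like this, duplicate detection on the 001 becomes a plain SQL
query rather than a Zebra search.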

> To be useful this would have to be applied on all imports, not only OAI.
> 

Well, it would actually need to be applied to _all records_ rather than _all
imports_. You'd need that data filled for all records if you were going to
match properly.

> Such a table could also be used to save the information of the origin of a
> record - if that is desired.

Actually, that's a good point. We could do something like:

metadata.id, metadata.record_id, metadata.scheme, metadata.qualifier,
metadata.value.

2, 1, koha, record_origin, oai-pmh

Actually, in retrospect, it would be wise to add another field like
"metadata.record_type" for biblio, authority, and item.

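To show how a "record_type" column and a "record_origin" row might fit
together, here is a small sketch, again using sqlite3 in place of MySQL. The
CHECK constraint and the helper names (add_metadata, record_origin) are
assumptions, not existing Koha code.

```python
import sqlite3


def open_store():
    """Create the hypothetical metadata table with a record_type column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE metadata (
            id          INTEGER PRIMARY KEY,
            record_id   INTEGER NOT NULL,
            record_type TEXT NOT NULL
                CHECK (record_type IN ('biblio', 'authority', 'item')),
            scheme      TEXT NOT NULL,
            qualifier   TEXT NOT NULL,
            value       TEXT NOT NULL
        )
        """
    )
    return conn


def add_metadata(conn, record_id, record_type, scheme, qualifier, value):
    """Insert one metadata row; the id is assigned by the database."""
    conn.execute(
        "INSERT INTO metadata (record_id, record_type, scheme, qualifier, value) "
        "VALUES (?, ?, ?, ?, ?)",
        (record_id, record_type, scheme, qualifier, value),
    )


def record_origin(conn, record_type, record_id):
    """Return the stored origin of a record, or None if none was recorded."""
    row = conn.execute(
        "SELECT value FROM metadata "
        "WHERE record_type = ? AND record_id = ? "
        "AND scheme = 'koha' AND qualifier = 'record_origin'",
        (record_type, record_id),
    ).fetchone()
    return row[0] if row else None


conn = open_store()
add_metadata(conn, 1, "biblio", "marc21", "001", "123456789")
add_metadata(conn, 1, "biblio", "koha", "record_origin", "oai-pmh")
print(record_origin(conn, "biblio", 1))
```

Without record_type, biblio 1 and authority 1 would be indistinguishable in
this table, which is why the extra column seems wise.
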
--

I think there are some potential obstacles to this approach though:

1) Ideally, it would be discussed with the Koha community and Release Manager
to see if this table could be used by other existing parts of Koha and by new
features.
2) It would need to be added to the existing record Add/Mod/Del functions. This
isn't necessarily a huge obstacle...
3) The table would need to be populated initially... for databases with
millions of records, this would be a very time-consuming process. Since it
would be so intensive, I think it would need to be run at the discretion of a
system administrator. Since we'd be updating the Add/Mod/Del functions, I think
running the "touch_all_biblios.pl" script would actually take care of
populating the "metadata" table.
4) How do we decide which fields should be "exported" into this table? While we
could provide configuration for this, any configuration change would require
"touch_all_biblios.pl" to be run again for the "metadata" rows to be
generated correctly. Perhaps a backend configuration file would be best in
this case, as the person editing it would also be someone who could
re-generate the "metadata" table.
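
Points 3) and 4) can be sketched together: a small configuration lists the
fields to export, and the whole table is regenerated from the records whenever
that configuration changes. Everything here is hypothetical: the config shape,
the "020$a" style of qualifier for subfields, the function names, and the
simplified record representation (a list of (tag, data) pairs standing in for
a real MARC record).

```python
# Fields to export, taken from the list suggested in comment #45.
# subfield=None marks a control field whose whole value is exported.
EXPORT_CONFIG = [
    {"tag": "001", "subfield": None},
    {"tag": "003", "subfield": None},
    {"tag": "020", "subfield": "a"},
    {"tag": "022", "subfield": "a"},
    {"tag": "035", "subfield": "a"},
]


def extract_rows(record_id, fields, config=EXPORT_CONFIG, scheme="marc21"):
    """Build (record_id, scheme, qualifier, value) rows for one record.

    fields is a list of (tag, data) pairs; data is a string for control
    fields, or a list of (subfield_code, value) pairs for data fields.
    """
    wanted = {entry["tag"]: entry["subfield"] for entry in config}
    rows = []
    for tag, data in fields:
        if tag not in wanted:
            continue
        subfield = wanted[tag]
        if isinstance(data, str):
            # Control field: export it only if no subfield was requested.
            if subfield is None:
                rows.append((record_id, scheme, tag, data))
            continue
        for code, value in data:
            if code == subfield:
                rows.append((record_id, scheme, tag + "$" + subfield, value))
    return rows


def rebuild_metadata(records, config=EXPORT_CONFIG):
    """Regenerate every row from scratch, as needed after a config change."""
    rows = []
    for record_id, fields in records:
        rows.extend(extract_rows(record_id, fields, config))
    return rows


records = [
    (1, [("001", "123456789"),
         ("020", [("a", "9780306406157"), ("q", "pbk.")]),
         ("245", [("a", "An example title")])]),
]
for row in rebuild_metadata(records):
    print(row)
```

The full rebuild is the expensive step on a large database, which is the
argument for keeping the configuration in the hands of whoever can run it.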
