[Koha-bugs] [Bug 10662] New: Build OAI-PMH Harvesting Client
bugzilla-daemon at bugs.koha-community.org
Tue Jul 30 09:39:02 CEST 2013
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662
Bug ID: 10662
Summary: Build OAI-PMH Harvesting Client
Change sponsored?: ---
Product: Koha
Version: master
Hardware: All
OS: All
Status: NEW
Severity: new feature
Priority: P5 - low
Component: Web services
Assignee: koha-bugs at lists.koha-community.org
Reporter: dcook at prosentient.com.au
QA Contact: testopia at bugs.koha-community.org
Currently, Koha only acts as an OAI-PMH server. I propose adding a harvesting
client as well (likely using the HTTP::OAI::Harvester module), so that Koha can
ingest records from other data sources (such as digital repositories like
DSpace).
I've only just started reading about OAI-PMH, but despite initial reservations
about resumption tokens, I think the hardest part will be not retrieving the
records but converting them into MARC.
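
For anyone unfamiliar with resumption tokens, here is a minimal Python sketch of the paging loop, assuming a plain ListRecords harvest. It is not Koha code and does not use HTTP::OAI::Harvester; the function names and the injected `fetch` callable are hypothetical, chosen only to make the flow visible. The one protocol detail it relies on is that `resumptionToken` is an exclusive argument, so follow-up requests carry only the verb and the token.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords URL (hypothetical helper, not part of any library)."""
    if token:
        # resumptionToken is exclusive: no metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (records, token); token is None when the list is complete."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{OAI_NS}record")
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = None
    if token_el is not None and token_el.text:
        token = token_el.text.strip() or None
    return records, token


def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every record, following resumption tokens until none is returned.

    `fetch` is any callable that takes a URL and returns the response body.
    """
    token = None
    while True:
        page = fetch(build_url(base_url, metadata_prefix, token))
        records, token = parse_page(page)
        yield from records
        if not token:
            break
```

In real use `fetch` could be as simple as `lambda url: urllib.request.urlopen(url).read()`; injecting it keeps the loop testable without a live repository.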
The Library of Congress does provide some crosswalks
(http://www.loc.gov/standards/marcxml/) for converting other metadata formats
into MARC21. However, the DC to MARC crosswalk (which is the obvious choice for
DSpace) does not produce high-quality records. So, as part of this new
feature, I will also be working on a more complete DC to MARC crosswalk. Their
DCMI terms
(http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=elements#type) have
been mapped reasonably well to MARC21 Leader positions 06 and 07, which
improves the quality of the record and helps to produce a "best guess" 008.
I'll be looking at adding more datafields and providing better fixed field
transformation.
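
To illustrate the idea of deriving the Leader from dc:type, here is a rough Python sketch. The mapping table is an assumption made for illustration only; it is not the LoC crosswalk and not the crosswalk proposed here. The Leader/06 and Leader/07 codes themselves (a, g, i, k, m, r and m, c) are standard MARC21 values. The real work would live in an XSLT, so treat this purely as a statement of the logic.

```python
# Illustrative mapping only (an assumption): DCMI Type term -> Leader/06 code.
TYPE_TO_LEADER06 = {
    "text": "a",                 # language material
    "movingimage": "g",          # projected medium
    "sound": "i",                # nonmusical sound recording
    "stillimage": "k",           # two-dimensional nonprojectable graphic
    "image": "k",
    "software": "m",             # computer file
    "dataset": "m",
    "interactiveresource": "m",
    "physicalobject": "r",       # three-dimensional artifact
}


def leader_codes(dc_types):
    """Return (Leader/06, Leader/07) guessed from a list of dc:type values."""
    normalized = [t.replace(" ", "").lower() for t in dc_types]
    # Leader/07: 'c' for a collection, otherwise 'm' (monograph/item).
    level = "c" if "collection" in normalized else "m"
    # Leader/06: first recognised type wins; default to 'a' (language material).
    rec_type = "a"
    for t in normalized:
        if t in TYPE_TO_LEADER06:
            rec_type = TYPE_TO_LEADER06[t]
            break
    return rec_type, level


def build_leader(dc_types):
    """Assemble a 24-character Leader with placeholder length and base address."""
    rec_type, level = leader_codes(dc_types)
    # 00-04 length, 05 status 'n', 06 type, 07 level, 08 blank, 09 'a' (Unicode),
    # 10-11 '22', 12-16 base address, 17-19 left blank here, 20-23 '4500'.
    return "00000" + "n" + rec_type + level + " a22" + "00000" + "   " + "4500"
```

For example, `build_leader(["Collection", "StillImage"])` puts `k` in position 06 and `c` in position 07. The same two values can then feed a "best guess" 008, since the 008 layout depends on the record type.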
Of course, some OAI-PMH repositories serve MARC so this crosswalk might not
always be necessary. However, I've looked at DSpace's DC=>MARC crosswalk and
it's similar to the LoC one, so I think this will be a valuable addition (both
to Koha and to anyone wanting to transform DC to MARC21).
As I mentioned, I'm just starting out with OAI-PMH, but I imagine having Koha
as an OAI-PMH harvester might be useful in union catalogue situations where
other servers might send quality MARC records.
--
My plan:
1) Set up a script that is able to continuously harvest records from an OAI-PMH
server (likely DSpace or Koha itself for my trials).
2) Set up a database table to handle harvester configuration (such as baseurl,
sets, possibly dates, metadata format, and any pointers to XSLTs).
3) Set up a solid (yet likely basic) DC => MARC XSLT.
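
As a starting point for step 2, here is a hedged sketch of what the configuration table might hold and how a row would turn into OAI-PMH request arguments. The table name, column names, example URL, and XSLT filename are all hypothetical, and SQLite stands in for Koha's actual database so the example runs on its own. The argument names (`metadataPrefix`, `set`, `from`, `until`) are the ones defined by OAI-PMH.

```python
import sqlite3

# Hypothetical schema: one row per repository to harvest.
SCHEMA = """
CREATE TABLE oai_harvest_targets (
    id              INTEGER PRIMARY KEY,
    base_url        TEXT NOT NULL,
    set_spec        TEXT,
    from_date       TEXT,
    until_date      TEXT,
    metadata_prefix TEXT NOT NULL DEFAULT 'oai_dc',
    xslt_path       TEXT
)
"""


def request_params(row):
    """Turn a configuration row into ListRecords arguments, skipping unset ones."""
    params = {"verb": "ListRecords", "metadataPrefix": row["metadata_prefix"]}
    optional = {
        "set": row["set_spec"],
        "from": row["from_date"],
        "until": row["until_date"],
    }
    params.update({k: v for k, v in optional.items() if v})
    return params


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO oai_harvest_targets (base_url, set_spec, from_date, xslt_path) "
    "VALUES (?, ?, ?, ?)",
    ("http://repository.example.org/oai/request", "theses", "2013-01-01", "dc2marc.xsl"),
)
row = conn.execute("SELECT * FROM oai_harvest_targets").fetchone()
print(row["base_url"], request_params(row))
```

Storing the last successful harvest date in `from_date` would let the script from step 1 pick up only new or changed records on each run.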
If anyone has comments or advice, I'd love to hear it. Hopefully, I'll be able
to focus on this over the next little while...
--
You are receiving this mail because:
You are the assignee for the bug.
You are watching all bug changes.