[Koha-bugs] [Bug 10662] New: Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Tue Jul 30 09:39:02 CEST 2013


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

            Bug ID: 10662
           Summary: Build OAI-PMH Harvesting Client
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: new feature
          Priority: P5 - low
         Component: Web services
          Assignee: koha-bugs at lists.koha-community.org
          Reporter: dcook at prosentient.com.au
        QA Contact: testopia at bugs.koha-community.org

Currently, Koha only acts as an OAI-PMH server. I propose adding a harvesting
client as well (likely using the HTTP::OAI::Harvester module), so that Koha can
ingest records from other data sources (such as digital repositories like
DSpace).

I've only started reading about it, but despite initial reservations about
resumption tokens, I think the hardest part will be not the retrieval of
records but the parsing of those records into MARC.
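
As a rough illustration of the retrieval side, here is a minimal sketch of a
ListRecords loop that follows resumption tokens. It is written in Python
purely for illustration (the client proposed here would more likely use the
Perl module mentioned above). The function names `harvest` and `fetch_url`
and the injectable `fetch` parameter are assumptions made for this example;
the request arguments and the XML namespace come from the OAI-PMH 2.0
protocol.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Default fetcher (hypothetical helper): HTTP GET, returns the body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", set_spec=None, fetch=fetch_url):
    """Yield each <record> element of a ListRecords result, page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        error = root.find(OAI_NS + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result set, not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            return
        for record in list_records.findall(OAI_NS + "record"):
            yield record
        token = list_records.find(OAI_NS + "resumptionToken")
        # A missing or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing the fetcher in as a parameter keeps the paging logic testable without
a live repository; a real harvester would also want retries and some handling
of HTTP 503 responses.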

The Library of Congress does provide some crosswalks
(http://www.loc.gov/standards/marcxml/) for converting other metadata formats
into MARC21. However, the DC to MARC crosswalk (which is the obvious choice for
DSpace) does not produce high-quality records. So, as part of this new
feature, I will also be working on a more complete DC to MARC crosswalk. The
DCMI type terms
(http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=elements#type) have
been mapped reasonably well to the MARC21 LEADER 6th and 7th positions, which
improves the quality of the record and helps to produce a "best guess" 008.
I'll be looking at adding more datafields and providing better fixed field
transformation.
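
To make the type mapping concrete, here is a small sketch in Python of how
dc:type values might drive the leader. It assumes that the "6th and 7th
positions" are Leader/06 (type of record) and Leader/07 (bibliographic
level). The mapping table is illustrative only and is not the mapping used by
the LoC crosswalk or by this work. The names `LEADER_CODES`, `leader_codes`
and `build_leader` are hypothetical, as are the filler values chosen for the
remaining leader positions.

```python
# Illustrative mapping only (an assumption, not the actual crosswalk):
# normalised DCMI type term -> (Leader/06 type of record, Leader/07 bib level)
LEADER_CODES = {
    "text": ("a", "m"),                 # language material, monograph
    "collection": ("p", "c"),           # mixed materials, collection
    "dataset": ("m", "m"),              # computer file
    "software": ("m", "m"),
    "interactiveresource": ("m", "m"),
    "image": ("k", "m"),                # two-dimensional nonprojectable graphic
    "stillimage": ("k", "m"),
    "movingimage": ("g", "m"),          # projected medium
    "sound": ("i", "m"),                # nonmusical sound recording
    "physicalobject": ("r", "m"),       # three-dimensional artifact
}
DEFAULT_CODES = ("a", "m")  # fall back to language material, monograph


def leader_codes(dc_types):
    """Return the codes for the first recognised dc:type value."""
    for value in dc_types:
        key = "".join(value.split()).lower()  # "Moving Image" -> "movingimage"
        if key in LEADER_CODES:
            return LEADER_CODES[key]
    return DEFAULT_CODES


def build_leader(dc_types):
    """Build a 24-character MARC21 leader with placeholder length fields."""
    record_type, bib_level = leader_codes(dc_types)
    # 00-04 length, 05 status "n", 06 type, 07 level, 08 blank, 09 "a" (Unicode),
    # 10-11 "22", 12-16 base address, 17-18 "u" (unknown), 19 blank, 20-23 "4500"
    return "00000n" + record_type + bib_level + " a22" + "00000" + "uu " + "4500"
```

For example, `build_leader(["Sound"])` gives `00000nim a2200000uu 4500`. The
same pair of codes could then steer which 008 layout a "best guess" fixed
field should follow.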

Of course, some OAI-PMH repositories serve MARC so this crosswalk might not
always be necessary. However, I've looked at DSpace's DC=>MARC crosswalk and
it's similar to the LoC one, so I think this will be a valuable addition (both
to Koha and to anyone wanting to transform DC to MARC21).

As I mentioned, I'm just starting out with OAI-PMH, but I imagine having Koha
as an OAI-PMH harvester might be useful in union catalogue situations where
other servers might send quality MARC records.

--

My plan:

1) Set up a script that is able to continuously harvest records from an OAI-PMH
server (likely DSpace or Koha itself for my trials).
2) Set up a database table to handle harvester configuration (such as baseurl,
sets, possibly dates, metadata format, and any pointers to XSLTs).
3) Set up a solid (yet likely basic) DC => MARC XSLT.
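
As a sketch of what step 2 might look like, the following Python example
creates a configuration table and turns one row into ListRecords request
arguments. SQLite is used only so the example is self-contained; the table
name `oai_harvest_targets`, its column names, and the helper functions are all
hypothetical and do not describe an existing Koha schema.

```python
import sqlite3

# Hypothetical schema covering the settings listed in step 2.
SCHEMA = """
CREATE TABLE IF NOT EXISTS oai_harvest_targets (
    id              INTEGER PRIMARY KEY,
    base_url        TEXT NOT NULL,
    set_spec        TEXT,
    from_date       TEXT,
    until_date      TEXT,
    metadata_prefix TEXT NOT NULL DEFAULT 'oai_dc',
    xslt_path       TEXT
)
"""


def open_config(path=":memory:"):
    """Open (or create) the configuration database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allows access by column name
    conn.execute(SCHEMA)
    return conn


def request_params(row):
    """Build the OAI-PMH ListRecords arguments for one configured target."""
    params = {"verb": "ListRecords", "metadataPrefix": row["metadata_prefix"]}
    optional = {
        "set": row["set_spec"],
        "from": row["from_date"],
        "until": row["until_date"],
    }
    # Only send the optional arguments that are actually configured.
    params.update({k: v for k, v in optional.items() if v})
    return params
```

The `xslt_path` column would point at the stylesheet from step 3, so each
target can carry its own transformation (or none, if the repository already
serves MARC).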

If anyone has comments or advice, I'd love to hear it. Hopefully, I'll be able
to focus on this over the next little while...
