[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Tue Sep 8 05:13:32 CEST 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #16 from David Cook <dcook at prosentient.com.au> ---
Created attachment 42446
  -->
http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=42446&action=edit
Bug 10662 - Build OAI-PMH Harvesting Client

This patch set adds an OAI-PMH harvesting client to Koha.

It provides a user interface (UI) for defining external
servers from which to harvest records using the OAI-PMH
protocol.

After it downloads records, it checks the harvest database
to see if it needs to add a new record, update an
existing record, or delete a record in Koha.
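The add/update/delete check described above can be sketched roughly as
follows. This is only an illustration, not the code in the patch set: the
function name, the dictionary keys, and the extra "skip" outcome are all
assumptions made for the example.

```python
from typing import Optional


def decide_action(harvested: dict, known: Optional[dict]) -> str:
    """Return 'add', 'update', 'delete', or 'skip' for one harvested record.

    `harvested` is the record just downloaded; `known` is the matching row
    from the harvest database, or None if the identifier has not been seen.
    Both shapes are hypothetical and exist only for this sketch.
    """
    if harvested.get("deleted"):
        # A deletion notice only matters if the record was imported earlier.
        return "delete" if known is not None else "skip"
    if known is None:
        return "add"
    # UTC datestamps of the same granularity sort correctly as strings.
    if harvested["datestamp"] > known["datestamp"]:
        return "update"
    return "skip"
```

For example, a record whose identifier is not yet in the harvest database
yields "add", while a newer datestamp for a known identifier yields "update".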

_TEST PLAN_

1) Apply all patches
2) Run updatedatabase.pl (to apply the atomic update)

3) Go to Administration > OAI-PMH servers
4) Click "New OAI-PMH server target"
5) At a minimum, include a valid "Base URL" and a valid "Metadata prefix".
6) Click "Test HTTP and OAI-PMH parameters"
7) If successful, continue with this plan. If unsuccessful, address
the warning messages displayed in red before testing the parameters
again.

8) At this point, you might want to choose a preferred granularity.
According to the spec, all OAI-PMH servers must support YYYY-MM-DD,
but in practice this isn't always the case, so you may need to choose
the finer YYYY-MM-DDThh:mm:ssZ granularity (note that granularity
support isn't tested by the "Test" button).
9) You may also want to choose a "From" and "Until" range, at least
for the purposes of testing, so that you don't accidentally try
downloading thousands or millions of records. (You may also
want to download by "Set".)

10) You must change the "Active" parameter from "Inactive" to "Active"
for the harvester to work on this server target.
11) Optionally, you may provide a path to an XSLT to transform
the incoming data. A parameter called "identifier" is passed
to the XSLT engine; it contains the unique OAI-PMH identifier
for a record. You may wish to add this to the MARC, especially
for the sake of provenance. (You may also want to strip 952, 942,
and 999 fields, as well as $9 subfields, from incoming records.
You may also try the magic "default" keyword here, which uses
an XSLT I've already written and linked in the backend.)

12) Choose the MARC framework you would like to use (although
Default is fine as well).

13) Optionally, you may wish to include an "Original system field".
At this time, this has no real purpose. However, in the future,
it may be used for linking downloaded holdings records with
their original parent record (e.g. the 004 of a holdings record
would link to the 001 of the bibliographic record). This field
takes a value such as 001 or 999$c, with the dollar sign as the
subfield delimiter.

14) Click Save
15) You will now see a table containing your entry; click "View".
16) All the numbers on the following screen should be 0.
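
To make the settings from steps 5, 8, 9 and 13 more concrete, here is a
rough sketch of how such values map onto an OAI-PMH ListRecords request
and onto a MARC field reference. The function names and the example base
URL are made up for illustration; only the query parameter names (verb,
metadataPrefix, from, until, set) come from the OAI-PMH protocol. This is
not how the patch itself builds its requests.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple
from urllib.parse import urlencode


def format_datestamp(moment: datetime, granularity: str) -> str:
    """Render a UTC datetime in one of the two OAI-PMH granularities."""
    if granularity == "YYYY-MM-DD":
        return moment.strftime("%Y-%m-%d")
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def list_records_url(
    base_url: str,
    metadata_prefix: str,
    from_: Optional[str] = None,
    until: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build a ListRecords URL; optional arguments are left out when unset."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_:
        params["from"] = from_
    if until:
        params["until"] = until
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_field_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split '001' or '999$c' into (tag, subfield code or None)."""
    tag, _, subfield = spec.partition("$")
    return tag, (subfield or None)
```

A narrow "From"/"Until" window, as suggested in step 9, keeps a test
harvest small, e.g. list_records_url("http://example.org/oai", "marcxml",
from_="2015-09-01", until="2015-09-08").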

--

17) Set your environment variables for KOHA_CONF and PERL5LIB
18) Run "perl /misc/cronjobs/oai/oai_harvester.pl -d -v" to download
your records (NOTE: This downloader will run as long as it needs to,
so try to only download a few records. Ctrl+C will stop the harvest
if it gets out of control.)
19) Revisit the web app as per step #15
20) It should now say "Harvested records waiting to be imported: X"
with X being higher than 0.
21) Run "perl /misc/cronjobs/oai/oai_harvester.pl -i -v" to import
these records into Koha.
22) The terminal output should indicate the result of the import.
This should also be reflected by the web app as per step #15
(e.g. "Koha records created from harvested records: X").

--

Now, there are a few different scenarios to try:

If you control the OAI-PMH repository, try editing a record you've
downloaded, then download records again and check whether your Koha
record is updated. (You might need to change your "From" entry first,
as it should be auto-updated after each harvest.) If your repository
also supports deleted records, try deleting a record that you've
already imported into Koha. Koha should get a deletion notice
and delete the record from Koha (unless it has items attached).
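
For reference, OAI-PMH signals a deletion with status="deleted" on the
record header. The sketch below shows how such a notice could be
recognised and how the "unless it has items attached" rule might be
applied. The helper names and the item_count argument are assumptions
for this example, not the harvester's actual interface.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Sample record for illustration only; the identifier is made up.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:example.org:123</identifier>
    <datestamp>2015-09-08</datestamp>
  </header>
</record>
"""


def parse_header(record_xml: str) -> dict:
    """Extract identifier, datestamp, and deletion flag from one <record>."""
    record = ET.fromstring(record_xml.strip())
    header = record.find("oai:header", OAI_NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
        "deleted": header.get("status") == "deleted",
    }


def may_delete(header: dict, item_count: int) -> bool:
    """Act on a deletion notice only when no items are attached."""
    return header["deleted"] and item_count == 0
```

With the sample above, parse_header() reports the record as deleted, and
may_delete() allows the deletion only when item_count is 0.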

If you delete a record from Koha, but that record still exists
upstream, you'll still download updates for that record,
but an error will be generated when trying to import into Koha.
Each record in this "error" state will be recorded in the "View"
section of the UI as "Harvested records in an error state: X".
(In the future, I might make it so that the record gets re-added,
or add more configuration options to control this behaviour.)
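
The counters on the "View" screen, including the error count mentioned
above, could be derived from a per-record status along these lines. The
status keys ("waiting", "created", "error") are invented for this sketch;
only the label text is taken from the UI messages quoted in this test plan.

```python
from collections import Counter

# Hypothetical status keys mapped to the labels shown in the UI.
LABELS = {
    "waiting": "Harvested records waiting to be imported",
    "created": "Koha records created from harvested records",
    "error": "Harvested records in an error state",
}


def summarise(statuses):
    """Return one 'label: count' line per known status, in a fixed order."""
    counts = Counter(statuses)
    # Counter returns 0 for statuses that never occurred.
    return [f"{label}: {counts[key]}" for key, label in LABELS.items()]
```

Before any harvest, summarise([]) gives 0 for every counter, which matches
the expectation in step 16.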

If you want to reset the repository harvest (i.e. delete all
your existing harvested records and re-harvest a repository),
click "Reset repository harvest" in the "View" screen of
the OAI-PMH server target. If errors are encountered while
deleting an existing harvest, it should display hyperlinks
to the problem records for manual intervention.
