[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Mon Nov 23 02:11:05 CET 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #41 from David Cook <dcook at prosentient.com.au> ---
(In reply to Katrin Fischer from comment #33)
> I was told recently that 2-3 seconds is quite standard for OAI-PMH harvests.
> 
> I think a problem could occur if Zebra is involved in matching as you have
> to make sure the indexes have caught up before you can reliably match. Say a
> record is changed at the source twice in a very short timeframe... or added
> and then changed again, included in 2 harvests... but not yet indexed when
> the second runs, etc.

I agree once again with Katrin. I think I've said before (either here or via
email) that using Zebra for matching can be very unreliable, for the reason she
describes: a record may not be indexed yet when the next harvest runs.

Currently, I use the unique OAI-PMH identifiers to handle all harvested
records, and that's quite robust, since that identifier should be persistent.
However, that obviously doesn't help with matching OAI-PMH harvested records
against local records created via other methods.
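
As a minimal sketch of identifier-based handling (Python for illustration only;
the class and field names are hypothetical and this is not Koha's actual schema
or code), harvested records could be keyed on the repository plus the OAI-PMH
identifier, so a repeat harvest updates the same record without consulting the
search index. It assumes datestamps from one repository share a granularity, so
that comparing them as strings is safe.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical key: (repository base URL, OAI-PMH identifier from the header).
HarvestKey = Tuple[str, str]


@dataclass
class HarvestedRecord:
    biblionumber: int  # local record id (illustrative)
    datestamp: str     # OAI-PMH header datestamp, UTC ISO 8601
    metadata: str      # raw metadata payload


class HarvestStore:
    """In-memory stand-in for a lookup table keyed on the OAI-PMH identifier."""

    def __init__(self) -> None:
        self.by_key: Dict[HarvestKey, HarvestedRecord] = {}
        self._next_id = 1

    def upsert(self, repo: str, identifier: str, datestamp: str, metadata: str) -> int:
        key = (repo, identifier)
        existing = self.by_key.get(key)
        if existing is None:
            # First time this identifier is seen for this repository.
            record = HarvestedRecord(self._next_id, datestamp, metadata)
            self._next_id += 1
            self.by_key[key] = record
            return record.biblionumber
        # Seen before: update in place, ignoring stale deliveries.
        if datestamp >= existing.datestamp:
            existing.datestamp = datestamp
            existing.metadata = metadata
        return existing.biblionumber
```

Two harvests a couple of seconds apart that both contain the same identifier
then resolve to one local record, whether or not indexing has caught up.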

In the short term, perhaps merging bibliographic records would have to occur
manually. Or maybe a deduplication tool could be created to semi-automate that
task... although I think that tool would have to prevent any deletion of
OAI-PMH harvested records.

Actually, this hearkens back to my previous comment. It would be good if each
record had a simple way of identifying its origin. So you couldn't delete a
record obtained via OAI-PMH unless its parent repository was deleted from Koha
or unless you used an OAI-PMH management tool to delete records for that
repository. 

I think providing this "source" or "origin" would need to be done consistently
or rather... extensibly. I wouldn't want it to be OAI-PMH specific as that
would be short-sighted. 
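
A rough sketch of a source-agnostic origin marker with a deletion guard might
look like the following. Everything here is an assumption for illustration: the
RecordOrigin type, its fields, the source labels and the can_delete rule are
hypothetical and do not exist in Koha.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordOrigin:
    source_type: str           # illustrative labels: "oai-pmh", "svc-import", "staff-client"
    source_id: Optional[str]   # e.g. a repository id; None for locally catalogued records
    protected: bool            # whether ordinary deletion is blocked for this origin


def can_delete(origin: RecordOrigin,
               via_source_tool: bool = False,
               source_removed: bool = False) -> bool:
    """Allow deletion of protected records only through their own source.

    via_source_tool: the request comes from the management tool for that source.
    source_removed:  the parent source (e.g. the repository) was deleted.
    """
    if not origin.protected:
        return True
    return via_source_tool or source_removed
```

Because the rule looks only at the protected flag and not at the source type,
the same check would cover OAI-PMH repositories and any other import source.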

At the moment, everything that goes through svc/import_bib uses a webservices
import_batch... but that doesn't tell you much about where a record came from.
It would be interesting to have unique identifiers for import sources. So you
might use svc/import_bib with connexion_import_daemon.pl, or with MARCEdit, or
your home-grown script, or whatever. It would be interesting to distinguish
those separately... and maybe prevent writes/deletions for records that are
entered via connexion_import_daemon.pl and home-grown script XYZ, while leaving
ones imported via MARCEdit to be managed freely, since in that case you just
exported some original records and re-imported them after making some changes.

One way of doing that would actually be to use developer keys... so a developer
would need to get a key from Koha before using the web service and then the
Koha sysadmin could handle the interaction between that service and Koha's
internals using that key (e.g. if records are imported via Webservice A,
prevent Koha users from doing anything with them).

I suppose that's a bit tougher to do with OAI-PMH... but not necessarily. When
a new OAI-PMH repository is added, the system could generate a key for it, and
use that key for handling the permissions for Koha users...
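
To make the key idea concrete, here is a small sketch under stated assumptions:
SourceRegistry, ImportSource and the allow_staff_edits policy flag are all
hypothetical names, and nothing like this exists in Koha today. A key is issued
when a source (a web service client or an OAI-PMH repository) is registered,
and later looked up to decide what Koha users may do with its records.

```python
import secrets
from dataclasses import dataclass
from typing import Dict


@dataclass
class ImportSource:
    name: str                # human-readable label for the sysadmin
    allow_staff_edits: bool  # policy chosen by the sysadmin for this source


class SourceRegistry:
    """Issues one random key per import source and stores its policy."""

    def __init__(self) -> None:
        self._sources: Dict[str, ImportSource] = {}

    def register(self, name: str, allow_staff_edits: bool) -> str:
        key = secrets.token_hex(16)  # 32 hex characters
        self._sources[key] = ImportSource(name, allow_staff_edits)
        return key

    def staff_may_modify(self, key: str) -> bool:
        source = self._sources.get(key)
        # Unknown keys are treated as locked (a conservative assumption).
        return source is not None and source.allow_staff_edits
```

Each imported record would then carry the key of the source it arrived through,
and the staff interface would consult the registry before allowing an edit or
deletion.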

I think that element of the discussion would relate a lot to
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14957
