[Koha-bugs] [Bug 10662] Build OAI-PMH Harvesting Client

bugzilla-daemon at bugs.koha-community.org
Tue Nov 29 05:54:28 CET 2016


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662

--- Comment #106 from David Cook <dcook at prosentient.com.au> ---
(In reply to Galen Charlton from comment #1)
> (In reply to David Cook from comment #0)
> > Currently, Koha only acts as an OAI-PMH server. I propose to add a harvesting
> > client as well (likely using the HTTP::OAI::Harvester module), so that Koha
> > can ingest records from other data sources (such as digital repositories
> > like DSpace).
> 
> Interesting idea.
> 
> > I've only started reading about it but despite initial reservations about
> > resumption tokens, I think the hardest part will not be with the retrieval
> > of records so much as the parsing of those records into MARC.
> 
> This may be less of a problem in the long run with my plans to allow Koha to
> support multiple metadata formats (although even once that's available, you
> may still want the harvester to be able to convert the source metadata into
> something else).
>  
> One thing I'd suggest is that the harvester keep a copy of the original
> metadata record in a database table; that would be more flexible than
> immediately converting it to MARC and discarding the source data.

Recently, I've been wondering if we really need an API just for OAI-PMH
records, but that thought always brings me back to Galen's comment from 2013. 

I figure it's worthwhile having this API because it stores the entire OAI-PMH
container record. If your metadata transformation is faulty, you won't get a
usable MARCXML record in Koha, but you'll be able to retry the transformation
because the OAI-PMH container record is still stored in the database. It also
gives you an import history for each record over time, keyed on the OAI-PMH
identifier.

I would like to link OAI-PMH identifiers more closely to MARCXML records, but
it's a bit problematic. At the moment, I store the identifier in 024$a, but
that's a fairly generic field. It would be nice to store it in the database
instead, but then it might be lost during a record merge or if biblionumbers
change in some other way. In the long run, I'm planning to store the OAI-PMH
identifier in RDF, but that's still some way off.
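
As a rough illustration of the 024$a approach and why it's awkward, here's a Python sketch using `xml.etree.ElementTree` on MARCXML. The helper names are hypothetical, the first indicator value of `8` ("unspecified type") is my assumption rather than what Koha necessarily uses, and picking out the OAI-PMH identifier by its `oai:` prefix is only a heuristic, since 024 can hold other standard identifiers (ISRC, UPC, and so on) as well.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

def add_oai_identifier(record, oai_identifier):
    """Append a 024 field with the OAI-PMH identifier in $a.

    Assumption: ind1="8" (unspecified type of standard number or code).
    """
    field = ET.SubElement(
        record,
        f"{{{MARC_NS}}}datafield",
        {"tag": "024", "ind1": "8", "ind2": " "},
    )
    sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": "a"})
    sub.text = oai_identifier
    return record

def oai_identifiers(record):
    """Return 024$a values that look like OAI-PMH identifiers.

    Heuristic only: 024 is generic, so we filter on the common "oai:" prefix,
    which the OAI-PMH identifier scheme uses but does not force on every
    repository.
    """
    ns = {"marc": MARC_NS}
    values = []
    for field in record.findall("marc:datafield[@tag='024']", ns):
        for sub in field.findall("marc:subfield[@code='a']", ns):
            if sub.text and sub.text.startswith("oai:"):
                values.append(sub.text)
    return values
```

Having to filter on a prefix like this is exactly the "fairly generic field" problem: nothing in the MARC structure itself marks the value as an OAI-PMH identifier.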

Of course, OAI-PMH identifiers aren't foolproof either. In theory, they should
be unique within a repository, but in practice there's no guarantee.

Anyway, I think it can't hurt to store OAI-PMH records in Koha. There's a
cleanup_database script that clears out old records so the database table
doesn't grow too large. That means you could eventually lose the original
OAI-PMH record, but Koha should keep the data long enough for you to fix
transformation problems and the like.
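
An age-based purge like that could look roughly like the following Python/SQLite sketch. The table and the `purge_old_records` function are hypothetical and this doesn't reflect how cleanup_database is actually implemented or what options it takes; it just shows the trade-off of deleting stored container records once they pass a retention window.

```python
import sqlite3

def create_table(conn):
    # Hypothetical table for illustration only; not Koha's real schema.
    conn.execute("""CREATE TABLE IF NOT EXISTS oai_harvester_records (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        oai_identifier TEXT NOT NULL,
        raw_xml TEXT NOT NULL,
        imported_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)""")

def purge_old_records(conn, days):
    """Delete stored OAI-PMH records imported more than `days` days ago.

    Returns the number of rows removed. Anything purged here can no longer
    be re-transformed, so `days` should cover the time needed to notice and
    fix a bad transformation.
    """
    cur = conn.execute(
        "DELETE FROM oai_harvester_records "
        "WHERE imported_at < datetime('now', ?)",
        (f"-{int(days)} days",),
    )
    conn.commit()
    return cur.rowcount
```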
