[Koha-zebra] Behind the Scenes Updating

Mike Taylor mike at miketaylor.org.uk
Wed Jan 4 21:34:00 CET 2006


> Date: Wed, 04 Jan 2006 14:13:48 -0500
> From: Sebastian Hammer <quinn at indexdata.com>
>
>>>> So for _new_ cataloging, we're going to have to generate the
>>>> 090$c field for every MARC record we plan to import, then we'll
>>>> 'update' the Zebra index, and if we're using shadow registers,
>>>> this process will be fault-tolerant.
>>> 
>>> Yep.
>> 
>> Pedantry note: the operation that needs to be done to resync with
>> the shadow registers is called "commit", not "update".  "Update" is
>> what you do to add or change records, and "commit" is what you do
>> to make those changes permanent.
> 
> Uh, pedantry-pedantry note. You are suggesting that 'update' works
> on records, whereas commit does something else.

Uh-uh, false-accusation note.  I am suggesting no such thing.

> In fact, 'update' modifies both the record and the index files.

Indeed.  But if you're using shadow files, those modifications (to
both the record store and the index) are not made permanent until you
do a "commit".  Which is what I said, isn't it?  Or am I still
misunderstanding something?
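
To make the update/commit distinction concrete, here is a toy Python
model. It is a sketch only: the class and method names are invented
for illustration and say nothing about how Zebra actually lays out
its shadow files. It also assumes that lookups see only committed
data, which is a simplification.

```python
class ShadowedStore:
    """Toy model of a store with a shadow area (hypothetical, not Zebra's design)."""

    def __init__(self):
        self.live = {}    # committed records
        self.shadow = {}  # changes staged by update(), not yet permanent

    def update(self, rec_id, record):
        # "update": add or change a record, but only in the shadow area
        self.shadow[rec_id] = record

    def commit(self):
        # "commit": fold every staged change into the live store
        self.live.update(self.shadow)
        self.shadow.clear()

    def committed_get(self, rec_id):
        # Look only at committed data (a simplifying assumption)
        return self.live.get(rec_id)


store = ShadowedStore()
store.update("rec1", "title: Dune")
before = store.committed_get("rec1")  # staged only, so nothing committed yet
store.commit()
after = store.committed_get("rec1")   # now permanent
```

In this model a crash between update() and commit() loses only the
staged changes and leaves the live data as it was, which is the
fault-tolerance property mentioned earlier in the thread.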

> Date: Wed, 04 Jan 2006 14:10:36 -0500
> From: Sebastian Hammer <quinn at indexdata.com>
> 
>>> One somewhat obvious way to approach this would be using
>>> OAI-PMH. The LoC is presently contemplating awarding us a little
>>> money to support an OAI server function in Zebra.
>> 
>> 	rec.lastModificationDate >= 2005-12-15
> 
> I'm not always hip to the latest changes to Zebra, but last I
> looked, it didn't support searching by record update timestamp.

Nor will it magically start to do so if we implement OAI!  :-)

Anyway, this is very easily solved at the application level.
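
One possible application-level approach, sketched in Python: the
application keeps its own modification log next to the index and
answers "changed since" queries from that log. The table name, column
names and helper functions are all assumptions made for this example.
SQLite stands in for whatever database the application already uses.

```python
import sqlite3


def make_log():
    # In-memory database for the sketch; a real application would persist this
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE modlog (rec_id TEXT PRIMARY KEY, modified TEXT)")
    return db


def touch(db, rec_id, iso_date):
    # Call this whenever the application adds or changes a record
    db.execute("INSERT OR REPLACE INTO modlog VALUES (?, ?)", (rec_id, iso_date))


def modified_since(db, iso_date):
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings
    rows = db.execute(
        "SELECT rec_id FROM modlog WHERE modified >= ? ORDER BY rec_id",
        (iso_date,),
    )
    return [row[0] for row in rows]


log = make_log()
touch(log, "rec1", "2005-12-01")
touch(log, "rec2", "2005-12-20")
touch(log, "rec1", "2006-01-03")  # rec1 is modified again later
recent = modified_since(log, "2005-12-15")
```

An alternative would be for the application to write the date into
each record as an ordinary indexed field before sending it to the
indexer.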

> The other thing missing is the ability to retrieve information about
> records that have been deleted. At present, these just disappear
> without a trace. There needs to be some mechanism to retrieve
> information (at least a sysno) about records deleted since a given
> date.

True.
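
A minimal sketch of such a mechanism, in Python: keep a "tombstone"
entry for every deleted record so that a harvester can ask what has
gone away since a given date. The class and method names are
hypothetical, and nothing here is an existing Zebra feature.

```python
from datetime import date


class DeletionLog:
    """Remembers the sysno and deletion date of each deleted record."""

    def __init__(self):
        self._tombstones = []  # list of (sysno, deletion date) pairs

    def record_deletion(self, sysno, when):
        # Call this at the moment a record is removed from the database
        self._tombstones.append((sysno, when))

    def deleted_since(self, since):
        # Sysnos of records deleted on or after the given date
        return [sysno for sysno, when in self._tombstones if when >= since]


dlog = DeletionLog()
dlog.record_deletion(101, date(2005, 11, 30))
dlog.record_deletion(205, date(2005, 12, 20))
gone = dlog.deleted_since(date(2005, 12, 15))
```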

>> Another question that immediately occurs is: _what_ speed issues?
>> Have you actually seen any?  Do you have any numbers?
> 
> I'd like to hear the answer to this too. But my sense is that
> updating a single record in a multimillion record database does take
> some significant period of time -- much more than updating a
> single row in an RDBMS, for sure. It matters if you're scaling to a
> major library with multiple circulation desks.

Interesting.  For fixed-size fields such as timestamps, there are some
obvious hacks that can make record updating super-fast.  (We used some
of them in Index+).  If Zebra doesn't already do this, it ought to.
Maybe it's time for another sponsor-hunt ...
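
As an illustration of the general idea (not a description of what
Index+ or Zebra actually does), here is one generic trick in Python:
if a field has a fixed size and sits at a known offset, it can be
overwritten in place without rewriting or reindexing the rest of the
record. The slot layout below is invented for the example.

```python
import io
import struct

# Hypothetical slot layout: 4-byte record id, then 8-byte timestamp
SLOT = struct.Struct("<Iq")
TS_OFFSET = 4  # the timestamp starts right after the 4-byte id


def write_slot(f, slot, rec_id, timestamp):
    # Write a whole slot at its fixed position
    f.seek(slot * SLOT.size)
    f.write(SLOT.pack(rec_id, timestamp))


def set_timestamp(f, slot, timestamp):
    # Overwrite only the 8 timestamp bytes; nothing else is touched
    f.seek(slot * SLOT.size + TS_OFFSET)
    f.write(struct.pack("<q", timestamp))


def read_slot(f, slot):
    f.seek(slot * SLOT.size)
    return SLOT.unpack(f.read(SLOT.size))


f = io.BytesIO()  # stands in for a file on disk
write_slot(f, 0, 10, 100)
write_slot(f, 1, 11, 100)
set_timestamp(f, 1, 999)
```

The cost of set_timestamp() is one seek and one 8-byte write, whatever
the size of the database.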

 _/|_	 ___________________________________________________________________
/o ) \/  Mike Taylor  <mike at miketaylor.org.uk>  http://www.miketaylor.org.uk
)_v__/\  Live fast, Die old.






