[Koha-zebra] A few Zebra Questions

Mike Rylander mrylander at gmail.com
Wed Jan 4 23:54:33 CET 2006


On 1/4/06, Sebastian Hammer <quinn at indexdata.com> wrote:

[big ol' snip]

> >Another question that immediately occurs is: _what_ speed issues?
> >Have you actually seen any?  Do you have any numbers?
> >
> >
> I'd like to hear the answer to this too. But my sense is that updating a
> single record in a multimillion-record database does take some
> significant period of time -- much more than updating a single row in
> an RDBMS, for sure. It matters if you're scaling to a major library with
> multiple circulation desks.

Warning: rant follows. :)

This is exactly the concern, unless I misunderstand the OP.  With a
centralized system running, say, 250+ libraries and more than 1,500
circulation and reference desk clients, it would be one of the primary
speed-related concerns.

I believe the desire here is for Koha both to scale to large
installations and to offer advanced search/filter options.  Keeping
the item status close to the item and record identifiers obviously
increases the flexibility of searches and filters, but it imposes a
much greater maintenance cost.  So, knowing that it would be slower
than in an RDBMS, the question becomes "how much slower, and where is
the tipping point?"

Part of that depends on what the most important filter would be.
IMHO, the most important status/state-related item information is the
set of variables that affect item visibility to the patron, so I'll
use that as an example.

If you don't want items that are LOST or MISSING, or records whose
only items are in those states, to show up in the OPAC (because the
patron, by definition, cannot use them), then that can be condensed
into a "patron visibility" flag on the record.  It may be worth the
cost of updating that flag when its calculated value changes, and not
otherwise.  This gives you the functionality for the specific use case
above, but it limits the flexibility of the system: staff can't search
directly for items that are LOST or MISSING, only for records that
wouldn't show up because their constituent items are all in those
states.
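
To make the trade-off concrete, here is a minimal sketch (Python,
purely illustrative -- the state names, the visibility rule, and the
reindex_record() hook are assumptions of mine, not Koha or Zebra API)
of deriving such a flag and only paying the Zebra update cost when the
derived value actually flips:

    # Hypothetical sketch: derive a record-level "patron visibility"
    # flag from the states of the record's items, and push a Zebra
    # update only when the derived value actually changes.

    HIDDEN_STATES = {"LOST", "MISSING"}   # assumed set of hidden item states

    def patron_visible(item_states):
        """A record stays visible if at least one item is usable."""
        return any(state not in HIDDEN_STATES for state in item_states)

    def on_item_update(record, item_states, reindex_record):
        """Recalculate the flag; reindex only if it changed.

        'record' is a plain dict caching the current flag, and
        'reindex_record' stands in for whatever actually rewrites
        the record in Zebra (the expensive part).
        """
        new_flag = patron_visible(item_states)
        if new_flag != record.get("visible"):
            record["visible"] = new_flag
            reindex_record(record)
        # otherwise: this status change costs nothing in Zebra

    # Marking the last usable copy LOST flips the flag and triggers
    # one reindex; further LOST/MISSING updates trigger none.
    record = {"id": 42, "visible": True}
    log = lambda r: print("reindex record", r["id"])
    on_item_update(record, ["LOST", "CHECKED_OUT"], log)   # no reindex
    on_item_update(record, ["LOST", "LOST"], log)          # reindex
    on_item_update(record, ["LOST", "MISSING"], log)       # no reindex

The point is just that the write load on Zebra is bounded by how often
the derived flag flips, not by how often raw item status changes.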

The thing to watch out for when denormalizing data to increase speed
is that you'll end up doing it over and over again.  Using the example
above, there are probably 20 flags one could invent to solve specific
issues like that, but then you've got to check, recalculate, and
possibly update all of those flags on every item update.  At some
point the denormalization costs too much in the application layer, and
you might as well just move the raw data into the records, updating
them at every change.

So, the first step is probably to design some use cases.  If they
seem comprehensive and the required data are easily identified, then
tests can be run and a decision made as to whether any of this is
worth the update cost inside Zebra, and which plan is "better".


>
> --Sebastian
>


--
Mike Rylander
mrylander at gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org




