[Koha-devel] Proposed "metadata" table for Koha

Tue Dec 8 19:41:25 CET 2015

2015-12-01 1:44 GMT-03:00 David Cook <dcook at prosentient.com.au>:
>
> My main concern about Zebra is with it not being fast enough. Tomas
> mentioned that Zebra runs updates every 5 seconds, but it looks to me like
> rebuild_zebra.pl (via /etc/cron.d/koha-common) is only run every 5 minutes
> on a Debian package install. At least that was the case in Koha 3.20.
> That’s a huge gap when it comes to importing records. Say you import record
> A at 5:01pm… and then you try to import record A again at 5:03 using a
> matching rule. The update from 5:01pm hasn’t been processed yet, so you
> wind up with 2 copies of record A in your Koha catalogue.

You are right that the default setup installs a cron job that checks the
queue every 5 minutes. I planned to change that default behaviour by setting
USE_INDEXER_DAEMON=yes in /etc/default/koha-common and having the cron line
commented out or made conditional on USE_INDEXER_DAEMON=no. But I got
distracted during the last couple of weeks before the release and forgot to
post a patch for that. Anyway, rebuild_zebra.pl -daemon should be run by
default.

> We run openSUSE and we define our own Zebra indexer, which does run every
> 5 seconds. I haven’t stress tested it yet, but 5 seconds might be a bit
> long even under ideal circumstances, if an import job is running every 2-3
> seconds. Sure, that 2-3 seconds might be a bit optimistic… maybe it will
> also be every 5 seconds. But what happens if the Zebra queue backs up?
> Someone runs “touch_all_biblios.pl” or something like that and fills up the
> zebra queue, while you’re importing records. Zebra is going to be out of
> date.

True

> There needs to be a source of truth, and that’s the metadata record in
> MySQL. Zebra is an indexed cache, and while usually it doesn’t matter too
> much if that cache is a little bit stale, it can matter when you’re
> importing records.

I agree the importing step should rely on the source of truth.
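
To make the difference concrete, here is a minimal toy sketch of the
5:01pm/5:03pm scenario described above. It is not Koha code: the Catalogue
class, its fields, and the control number "A" are all hypothetical stand-ins
(`records` for the metadata in MySQL, `index` for Zebra, `queue` for the
pending index updates), and matching is reduced to a simple membership test.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Toy model: `records` stands in for MySQL, `index` for Zebra."""
    records: list = field(default_factory=list)  # source of truth
    index: set = field(default_factory=set)      # stale until reindexed
    queue: list = field(default_factory=list)    # pending index updates

    def reindex(self):
        """Process the pending queue, as the indexer would on its next run."""
        self.index.update(self.queue)
        self.queue.clear()

    def import_record(self, control_number, match_against_index):
        """Import a record unless the matcher finds an existing copy."""
        if match_against_index:
            found = control_number in self.index
        else:
            found = control_number in self.records
        if found:
            return False  # matched: no new record is created
        self.records.append(control_number)
        self.queue.append(control_number)
        return True


def duplicates_after_double_import(match_against_index):
    """Import record A twice before the indexer runs; count the copies."""
    cat = Catalogue()
    cat.import_record("A", match_against_index)  # 5:01pm
    cat.import_record("A", match_against_index)  # 5:03pm, index still stale
    cat.reindex()                                # indexer finally runs
    return cat.records.count("A")
```

In this model, matching against the lagging index leaves two copies of
record A, while matching against the source of truth leaves one.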

> Another scenario… what happens if Zebra is down? You’re going to get
> duplicate records because the matcher won’t work. That said, we could
> mitigate that by double-checking that Zebra is actually alive
> programmatically before commencing an import. There could be other
> heuristics as well… like not running an import (which uses matching rules)
> unless the zebraqueue only has X number of waiting updates. Ideally, it
> would be 0, but that’s unlikely in large systems. It also is impossible if
> you’re importing at a rate of more than one every 5 seconds (which would be
> absurdly slow).

I wouldn't build a workaround for Zebra being down only so that matching
existing records keeps working... I would just show a red box saying the
tool is not available.
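
As a rough illustration of the checks discussed here (is Zebra alive, and
are at most X updates waiting in the queue), a gate in front of the import
tool could look like the sketch below. The function name, its parameters,
and the messages are hypothetical; how the liveness flag and the pending
count are actually obtained is deliberately left out.

```python
def import_gate(zebra_alive, pending_updates, max_pending=0):
    """Return (allowed, message) for an import that uses matching rules.

    zebra_alive:     result of some liveness check (assumed, not shown)
    pending_updates: number of index updates still waiting (assumed input)
    max_pending:     the "X" threshold; 0 means the queue must be empty
    """
    if not zebra_alive:
        return False, "Import tool not available: Zebra is not responding."
    if pending_updates > max_pending:
        return False, (
            f"Import tool not available: {pending_updates} index updates "
            "are still pending."
        )
    return True, ""
```

The message returned with a refusal is what the red box would display; with
the default max_pending=0 the tool is only offered when the queue is empty.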

> I am curious as to whether the default Zebra update time for Debian
> packages is 5 minutes or 5 seconds. While it doesn’t affect me too much as
> I don’t use Debian, it matters for regular users of Koha.

OK, I'll provide the patch I mentioned :-P

--
Tomás Cohen Arazi
Theke Solutions (http://theke.io)
✆ +54 9351 3513384
GPG: B76C 6E7C 2D80 551A C765  E225 0A27 2EA1 B2F3 C15F