[Koha-devel] zebraidx with 1 large MARC file rather than many small MARC files

Mon Feb 4 07:10:25 CET 2019

Hi all,

I haven't looked into it too deeply, but I'm curious whether Zebra indexes
faster with one large MARC file than with many small MARC files.

At the moment, we generate 1 huge MARC file and then pass that to zebraidx
as an argument. 

Is that something we've always done or was it done as a performance
enhancement? 

I haven't looked at the Zebra internals to see whether it reads the entire
file into memory and then processes it, or whether it parses the XML using a
stream reader. zebraidx can also take a list of files from stdin*, but if
you had tonnes of small files, that could be troublesome.
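
To make the two possibilities concrete, here is a minimal Python sketch of the
difference between loading a whole MARCXML file and streaming it record by
record. It says nothing about what Zebra actually does internally; the sample
data and both function names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# MARCXML namespace; the sample collection below is invented for illustration.
MARC_NS = "http://www.loc.gov/MARC21/slim"

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record><controlfield tag="001">1</controlfield></record>
  <record><controlfield tag="001">2</controlfield></record>
  <record><controlfield tag="001">3</controlfield></record>
</collection>
"""


def count_records_whole_file(source):
    """Build the full tree in memory first, then count the records."""
    tree = ET.parse(source)
    return len(tree.getroot().findall(f"{{{MARC_NS}}}record"))


def count_records_streaming(source):
    """Handle each record as soon as its end tag is seen."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{MARC_NS}}}record":
            count += 1
            # Drop the record's children so they can be freed. The emptied
            # element itself stays attached to the root.
            elem.clear()
    return count


whole = count_records_whole_file(io.BytesIO(SAMPLE))
streamed = count_records_streaming(io.BytesIO(SAMPLE))
print(whole, streamed)
```

With the first approach, memory use grows with the size of the file; with the
second, it stays much closer to the size of a single record, which is why the
answer matters for one huge file.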

I suppose it doesn't matter too much as we march on to Elasticsearch, but I
figure lots of people are still using Zebra and probably will for a long
time, so it's perhaps worth thinking about.

* https://software.indexdata.com/zebra/doc/zebraidx.html

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia

Office: 02 9212 0899
Direct: 02 8005 0595
