[Koha-devel] zebraidx with 1 large MARC file rather than many small MARC files

David Cook dcook at prosentient.com.au
Mon Feb 4 07:10:25 CET 2019


Hi all,

 

I haven't looked into it too deeply, but I was curious if Zebra would have
better performance indexing with 1 large MARC file versus many small MARC
files.

 

At the moment, we generate 1 huge MARC file and then pass that to zebraidx
as an argument. 

 

Is that something we've always done or was it done as a performance
enhancement? 

 

I haven't looked at the Zebra internals to see whether it reads the entire
file into memory before processing it or parses the XML with a stream
reader. zebraidx can also take a list of files from stdin*, but that could
get troublesome if you had tonnes of small files.
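
To make the "whole file in memory" versus "stream reader" distinction concrete, here is a minimal Python sketch of the streaming approach. It says nothing about what Zebra actually does internally; it only illustrates the technique, assuming the input is MARCXML in the standard MARC21 slim namespace. The function name iter_records and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Standard MARCXML namespace (assumes the export is MARCXML, not ISO 2709).
MARC_NS = "http://www.loc.gov/MARC21/slim"


def iter_records(source):
    """Yield one <record> element at a time instead of parsing the whole tree first."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{MARC_NS}}}record":
            yield elem
            # Drop the record's children once the caller has finished with it,
            # so memory use does not grow with the full content of the file.
            elem.clear()


# Tiny made-up collection standing in for one large export file.
sample = io.BytesIO(
    b'<collection xmlns="http://www.loc.gov/MARC21/slim">'
    b"<record><leader>00000nam a2200000 a 4500</leader></record>"
    b"<record><leader>00000nam a2200000 a 4500</leader></record>"
    b"</collection>"
)

count = sum(1 for _ in iter_records(sample))
print(count)  # 2
```

With a reader like this, one large file costs roughly the same memory as many small ones, so any difference would come from per-file overhead rather than from file size.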

 

I suppose it doesn't matter too much as we march on to Elasticsearch, but I
figure lots of people are still using Zebra and probably will be for a long
time, so it's perhaps worth thinking about.

 

* https://software.indexdata.com/zebra/doc/zebraidx.html
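
The stdin mode described on that page (update with no directory argument reads a list of files from stdin) could be driven along these lines. This is only a sketch under assumptions: the helper names, the "*.xml" pattern and the config path are hypothetical, and run_zebraidx is defined but not called here because it needs a working Zebra installation.

```python
import subprocess
import tempfile
from pathlib import Path


def build_file_list(directory, pattern="*.xml"):
    """Return the matching file paths, one per line, in sorted order."""
    paths = sorted(str(p) for p in Path(directory).glob(pattern))
    return "".join(p + "\n" for p in paths)


def run_zebraidx(config, file_list):
    """Feed the file list to 'zebraidx update' on stdin (hypothetical invocation).

    'config' is the path to a Zebra config file; adjust the options to match
    your own setup before relying on this.
    """
    return subprocess.run(
        ["zebraidx", "-c", config, "update"],
        input=file_list,
        text=True,
        check=True,
    )


# Demonstrate the list building on throwaway files; no zebraidx call is made.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.xml", "a.xml", "notes.txt"):
        Path(tmp, name).write_text("")
    demo_list = build_file_list(tmp)
    demo_names = [Path(line).name for line in demo_list.splitlines()]

print(demo_names)  # ['a.xml', 'b.xml']
```

Timing this against a single large file on the same record set would be one way to answer the performance question empirically.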

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St

Ultimo, NSW 2007

Australia

 

Office: 02 9212 0899

Direct: 02 8005 0595

 
