[Koha-devel] zebraidx with 1 large MARC file rather than many small MARC files
David Cook
dcook at prosentient.com.au
Mon Feb 4 07:10:25 CET 2019
Hi all,
I haven't looked into it too deeply, but I'm curious whether Zebra indexes
faster when given 1 large MARC file or when given many small MARC files.
At the moment, we generate 1 huge MARC file and then pass that to zebraidx
as an argument.
Is that something we've always done or was it done as a performance
enhancement?
I haven't looked at the Zebra internals to see whether it reads the entire
file into memory before processing it or parses the XML with a stream
reader. zebraidx can also take a list of files from stdin*, but that could
be troublesome if you had tonnes of small files.
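
For illustration only, here is a minimal Python sketch of the two reading
strategies mentioned above: building the whole tree in memory versus
handling records one at a time as a stream. It does not describe what Zebra
actually does internally, which remains unverified here. The function names
and the two sample records are hypothetical; only the MARCXML namespace and
the standard-library calls are real.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"
RECORD_TAG = f"{{{MARC_NS}}}record"


def count_records_in_memory(source):
    """Parse the whole document into a tree first, then walk it.

    Memory use grows with the size of the file, because every record
    is held in the tree at the same time.
    """
    tree = ET.parse(source)
    return sum(1 for _ in tree.getroot().iter(RECORD_TAG))


def count_records_streaming(source):
    """Handle each record when its closing tag arrives, then empty it."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == RECORD_TAG:
            count += 1
            # Drop the record's fields; the emptied element itself
            # stays attached to the root.
            elem.clear()
    return count


# Hypothetical two-record collection, standing in for a real export.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">1</controlfield>
  </record>
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">2</controlfield>
  </record>
</collection>
"""

if __name__ == "__main__":
    print(count_records_in_memory(io.BytesIO(SAMPLE)))
    print(count_records_streaming(io.BytesIO(SAMPLE)))
```

Both functions return the same count. The difference is that the streaming
version never needs the full contents of every record in memory at once,
which is the property that would matter for 1 huge file.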
I suppose it doesn't matter too much as we march on to Elasticsearch, but I
figure lots of people are still using Zebra and probably will be for a long
time, so it's perhaps worth thinking about.
* https://software.indexdata.com/zebra/doc/zebraidx.html
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia
Office: 02 9212 0899
Direct: 02 8005 0595