[Koha-devel] zebraidx with 1 large MARC file rather than many small MARC files
David Cook
dcook at prosentient.com.au
Mon Feb 4 07:24:52 CET 2019
To answer my own question:
I have zebraidx running on a 461MB file (about 77,000 records) and it's only
using 2% of memory on a 4GB system, which is roughly 80MB and far less than
the size of the file. So I'm thinking it is using a stream reader and updating
the shadow files on disk as it goes through the massive MARC file.
In that case, while it might be slow to export records to that file,
zebraidx probably does read 1 large file much faster than many small files.
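
For illustration, here is roughly what stream reading looks like, as opposed
to loading the whole file first. This is only a minimal Python sketch of the
general technique, not Zebra's actual code (its internals have not been
checked here). It assumes the export is MARCXML wrapped in a top-level
collection element, and count_records is a made-up name.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"
RECORD_TAG = f"{{{MARC_NS}}}record"


def count_records(source):
    """Count MARCXML records in `source` (a path or binary file object)
    without holding the whole document in memory."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element (assumed to be
    # the <collection> wrapper); keep a handle on it for clean-up.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == RECORD_TAG:
            count += 1  # a real indexer would process the record here
            # Discard records that have already been handled so memory
            # use stays roughly flat however large the file is.
            root.clear()
    return count
```

With this pattern memory use depends on the size of a single record rather
than the size of the file, which would be consistent with the low memory
figure above.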
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia
Office: 02 9212 0899
Direct: 02 8005 0595
From: koha-devel-bounces at lists.koha-community.org
[mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of David Cook
Sent: Monday, 4 February 2019 5:10 PM
To: 'Koha Devel' <koha-devel at lists.koha-community.org>
Cc: tomascohen at theke.io
Subject: [Koha-devel] zebraidx with 1 large MARC file rather than many small
MARC files
Hi all,
I haven't looked into it too deeply, but I was curious if Zebra would have
better performance indexing with 1 large MARC file versus many small MARC
files.
At the moment, we generate 1 huge MARC file and then pass that to zebraidx
as an argument.
Is that something we've always done or was it done as a performance
enhancement?
I haven't looked at the Zebra internals to see whether it reads the entire
file into memory and then processes it or if it parses the XML using a
stream reader. zebraidx can also take a list of files from stdin*, but if
you had tonnes of small files, that could be troublesome.
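
As a rough illustration of the many-small-files approach, the file list for
stdin could be produced with something like the Python sketch below. The
function names and the .xml/.mrc suffixes are assumptions made for the
example, and it takes the zebraidx documentation at its word that "update"
with no directory argument reads file names from stdin.

```python
import os
import sys


def iter_marc_files(root, suffixes=(".xml", ".mrc")):
    """Yield paths of files under `root` whose names end with one of
    `suffixes` (hypothetical defaults), in a stable sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the directory traversal order predictable
        for name in sorted(filenames):
            if name.lower().endswith(suffixes):
                yield os.path.join(dirpath, name)


def write_file_list(root, out=sys.stdout):
    """Write one file path per line to `out` and return how many were
    written. The output is meant to be piped to another process."""
    written = 0
    for path in iter_marc_files(root):
        out.write(path + "\n")
        written += 1
    return written
```

A small wrapper script calling write_file_list could then have its output
piped into zebraidx. With a very large number of files, the per-file open
and close overhead is the part that seems likely to be troublesome.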
I suppose it doesn't matter too much as we march on to Elasticsearch, but I
figure lots of people are using Zebra still and probably will for a long
time, so perhaps worth thinking about.
* https://software.indexdata.com/zebra/doc/zebraidx.html