[Koha-devel] zebraidx with 1 large MARC file rather than many small MARC files

David Cook dcook at prosentient.com.au
Mon Feb 4 07:24:52 CET 2019


To answer my own question.

I have a zebraidx run going on a 461MB file (77,000 records) and it's only
using 2% of memory on a 4GB system, i.e. roughly 80MB, far less than the file
itself. So I'm thinking it uses a stream reader and updates the shadow files
on disk as it works through the massive MARC file.
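
For illustration, this is roughly what a stream reader over MARCXML looks like: a minimal Python sketch using the standard library's xml.etree.ElementTree.iterparse. It is not Zebra's code (Zebra is written in C and I haven't checked its internals). It assumes a MARCXML <collection> in the MARC21 slim namespace, and iter_records, control_numbers and control_numbers_from_bytes are hypothetical helper names.

```python
import io
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

def iter_records(source):
    """Yield each MARCXML <record> element as soon as it has been fully parsed."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <collection> root
    for event, elem in context:
        if event == "end" and elem.tag == MARC_NS + "record":
            yield elem
            root.clear()  # detach finished records so memory use stays roughly flat

def control_numbers(source):
    """Collect each record's 001 control number, reading one record at a time."""
    numbers = []
    for record in iter_records(source):
        field = record.find(MARC_NS + "controlfield[@tag='001']")
        numbers.append(field.text if field is not None else None)
    return numbers

def control_numbers_from_bytes(data):
    """Convenience wrapper for in-memory MARCXML (handy for quick tests)."""
    return control_numbers(io.BytesIO(data))
```

Because each record is detached from the root once it has been handled, memory use in this sketch depends roughly on the size of one record rather than the whole file, which would fit the 2% figure.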

In that case, while it might be slow to export records to that file,
zebraidx probably does read 1 large file much faster than many small files. 

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia

Office: 02 9212 0899
Direct: 02 8005 0595

From: koha-devel-bounces at lists.koha-community.org
[mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of David Cook
Sent: Monday, 4 February 2019 5:10 PM
To: 'Koha Devel' <koha-devel at lists.koha-community.org>
Cc: tomascohen at theke.io
Subject: [Koha-devel] zebraidx with 1 large MARC file rather than many small
MARC files

Hi all,

I haven't looked into it too deeply, but I was curious if Zebra would have
better performance indexing with 1 large MARC file versus many small MARC
files.

At the moment, we generate 1 huge MARC file and then pass that to zebraidx
as an argument. 
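
As a reference point, the single-file pattern amounts to something like the following minimal Python sketch, assuming MARCXML output. write_collection, build_collection_string and zebraidx_update_argv are hypothetical helpers (not Koha's actual export code), and the config file, group and database names used with them are placeholders. The -c, -g and -d options and the update command come from the zebraidx documentation. That documentation describes update as taking a directory, so passing a single file path here is an assumption.

```python
import io

MARC_SLIM = "http://www.loc.gov/MARC21/slim"

def write_collection(records, out):
    """Stream MARCXML <record> fragments into a single <collection> document.

    Hypothetical helper: records are written one at a time, so the exporter
    never has to hold the whole collection in memory. Returns the record count.
    """
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write(f'<collection xmlns="{MARC_SLIM}">\n')
    count = 0
    for record in records:
        out.write(record.strip() + "\n")
        count += 1
    out.write("</collection>\n")
    return count

def build_collection_string(records):
    """Render a (small) collection to a string, e.g. for quick checks."""
    buf = io.StringIO()
    write_collection(records, buf)
    return buf.getvalue()

def zebraidx_update_argv(config, group, database, path):
    """Command line for indexing the one exported file (or its directory) in a single run."""
    return ["zebraidx", "-c", config, "-g", group, "-d", database, "update", path]
```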

Is that something we've always done or was it done as a performance
enhancement? 

I haven't looked at the Zebra internals to see whether it reads the entire
file into memory and then processes it, or whether it parses the XML using a
stream reader. Zebraidx can also take a list of files from stdin*, but if
you had tonnes of small files that could be troublesome.
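
The stdin route could be driven like this: a minimal Python sketch, assuming (per the zebraidx documentation) that running update with no directory makes zebraidx read the list of files from stdin, and assuming one file name per line is acceptable. stdin_file_list and zebraidx_update_from_stdin are hypothetical helpers, and the config and database names are placeholders.

```python
import subprocess

def stdin_file_list(paths):
    """Join file names one per line (assumed format for zebraidx's stdin list)."""
    return "".join(f"{p}\n" for p in paths)

def zebraidx_update_from_stdin(config, database, paths, run=subprocess.run):
    """Run 'zebraidx update' without a directory so it reads file names from stdin.

    Config and database names are placeholders; `run` is injectable so the
    command line can be inspected without zebraidx installed.
    """
    argv = ["zebraidx", "-c", config, "-d", database, "update"]
    return run(argv, input=stdin_file_list(paths), text=True, check=True)
```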

I suppose it doesn't matter too much as we march on to Elasticsearch, but I
figure lots of people are still using Zebra and probably will for a long
time, so perhaps it's worth thinking about.

* https://software.indexdata.com/zebra/doc/zebraidx.html

David Cook
