[Koha-devel] zebraidx with 1 large MARC file rather than many small MARC files
David Cook
dcook at prosentient.com.au
Wed Feb 27 06:29:49 CET 2019
That's interesting. I do have some large Koha instances that have run into
that space problem I think.
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia
Office: 02 9212 0899
Direct: 02 8005 0595
-----Original Message-----
From: koha-devel-bounces at lists.koha-community.org
[mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of Fridolin
SOMERS
Sent: Tuesday, 26 February 2019 7:36 PM
To: koha-devel at lists.koha-community.org
Subject: Re: [Koha-devel] zebraidx with 1 large MARC file rather than many
small MARC files
Hi,
The problem with reindexing the full catalogue is the huge XML files that are
generated in /tmp/.
Some servers don't have enough space.
So at BibLibre we use a shell script to reindex step by step:
https://git.biblibre.com/biblibre/tools/src/branch/master/zebra/rebuild_full.sh
And if it crashes, you can restart from the last good step ;)
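
For readers who cannot open that script, the batching idea might look roughly
like the sketch below. This is not the BibLibre script. The checkpoint file,
the batch size and the helper names are made up for illustration, and it
assumes rebuild_zebra.pl accepts -b, --offset and --length as its help text
describes (check against your Koha version).

```python
import json
from pathlib import Path


def batches(total, size):
    """Yield (offset, length) pairs that together cover `total` records."""
    for offset in range(0, total, size):
        yield offset, min(size, total - offset)


def load_done(checkpoint):
    """Return the set of batch offsets recorded in the checkpoint file."""
    path = Path(checkpoint)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def mark_done(checkpoint, done, offset):
    """Record a finished batch so a crashed run can resume after it."""
    done.add(offset)
    Path(checkpoint).write_text(json.dumps(sorted(done)))


def plan(total, size, checkpoint):
    """Build the commands still left to run; nothing is executed here."""
    done = load_done(checkpoint)
    cmds = []
    for offset, length in batches(total, size):
        if offset in done:
            continue
        # Assumed flags: -b (biblios), --offset, --length.
        # No reset flag is passed, since resetting on every batch
        # would discard the batches indexed before it.
        cmds.append(["rebuild_zebra.pl", "-b",
                     "--offset", str(offset), "--length", str(length)])
    return cmds
```

Each batch only exports `size` records at a time, so the temporary files stay
small, and calling mark_done() after each successful batch is what makes a
restart from the last good step possible.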
Best regards,
On 04/02/2019 at 07:24, David Cook wrote:
> To answer my own question.
>
>
>
> I have a zebraidx running on 461MB (or 77000 records) and it's only
> using 2% of memory on a 4GB system, so I'm thinking it is using a
> stream reader and updating the shadow files on disk as it goes through
> the massive MARC file.
>
>
>
> In that case, while it might be slow to export records to that file,
> zebraidx probably does read 1 large file much faster than many small
> files.
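
The low memory figure is what a streaming parser would produce. As an
illustration only (this is Python, not Zebra's own code, and it says nothing
about how zebraidx is actually written), a MARCXML file can be walked one
record at a time without loading it whole. The function name is hypothetical;
the namespace is the standard MARC21 slim one.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def count_records(source):
    """Count <record> elements without loading the whole file at once."""
    count = 0
    # iterparse yields each element once its end tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == MARC_NS + "record":
            count += 1
            # Free the record's fields; the emptied element itself
            # stays attached to the root, so growth is small, not zero.
            elem.clear()
    return count
```

With this pattern memory depends mostly on the size of one record rather than
on the size of the file, which would be consistent with 2% usage on a 461MB
input.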
>
>
>
> From: koha-devel-bounces at lists.koha-community.org
> [mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of
> David Cook
> Sent: Monday, 4 February 2019 5:10 PM
> To: 'Koha Devel' <koha-devel at lists.koha-community.org>
> Cc: tomascohen at theke.io
> Subject: [Koha-devel] zebraidx with 1 large MARC file rather than many
> small MARC files
>
>
>
> Hi all,
>
>
>
> I haven't looked into it too deeply, but I was curious if Zebra would
> have better performance indexing with 1 large MARC file versus many
> small MARC files.
>
>
>
> At the moment, we generate 1 huge MARC file and then pass that to
> zebraidx as an argument.
>
>
>
> Is that something we've always done or was it done as a performance
> enhancement?
>
>
>
> I haven't looked at the Zebra internals to see whether it reads the
> entire file into memory and then processes it or if it parses the XML
> using a stream reader. Zebraidx can also take a list of files from
> stdin*, but if you had tonnes of small files that could be troublesome.
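
The stdin mode mentioned here could be driven along these lines. A rough
sketch: it assumes that `zebraidx update` with no directory argument reads one
file path per line from stdin, as the zebraidx manual page states, and that
`-c` points at the Zebra configuration file. The helper names and paths are
placeholders.

```python
import subprocess


def stdin_payload(paths):
    """Format file paths as the newline-separated list sent on stdin."""
    return "".join(f"{p}\n" for p in paths)


def update_from_list(paths, config, run=subprocess.run):
    """Feed a list of record files to a single `zebraidx update` call."""
    cmd = ["zebraidx", "-c", config, "update"]
    # check=True raises CalledProcessError if zebraidx exits non-zero.
    return run(cmd, input=stdin_payload(paths), text=True, check=True)
```

For example, update_from_list(["/tmp/batch1.xml", "/tmp/batch2.xml"],
"/path/to/zebra.cfg") would start one zebraidx process for both files instead
of one process per file.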
>
>
>
> I suppose it doesn't matter too much as we march on to Elasticsearch,
> but I figure lots of people are using Zebra still and probably will
> for a long time, so perhaps worth thinking about.
>
>
>
> https://software.indexdata.com/zebra/doc/zebraidx.html
>
>
>
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.koha-community.org
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : http://www.koha-community.org/
> git : http://git.koha-community.org/
> bugs : http://bugs.koha-community.org/
>
--
Fridolin SOMERS <fridolin.somers at biblibre.com>
BibLibre, France - software and system maintainer