[Koha-bugs] [Bug 21872] Elasticsearch indexing faster by making it multi-threaded

bugzilla-daemon at bugs.koha-community.org
Fri Jan 25 14:00:29 CET 2019


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=21872

david holoshka <david.holoshka at ub.lu.se> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.holoshka at ub.lu.se

--- Comment #35 from david holoshka <david.holoshka at ub.lu.se> ---
We were forced to rewrite rebuild_elastic_search.pl as it just died after a
couple of days without ever finishing indexing our 2.4 million bibliographic
records. Our version forks a copy of the process for each machine core, using
biblio_metadata-based limits precalculated by the parent process (this has been
upgraded since I sent you a copy of the code, David, to make sure each core
gets the same number of records to index). My old algorithm didn't distribute
the load well, as gaps in the metadata ids had been created by biblio updates
over time. With 8 cores the indexing completes in 50 minutes with Elasticsearch
running on the same virtual machine. We sped up the process a great deal by
accessing the metadata table directly instead of through the iterator. The only
drawback is memory usage, due to needing to put the 952 item data
(coincidentally also 2.4 million items) in hashes.
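
To illustrate the load-balancing idea (not the actual Perl code, which is not
shown in this comment), here is a minimal Python sketch of how a parent process
might precalculate id limits so that each worker gets the same number of
records even when the ids have gaps. The function name split_id_ranges and its
parameters are hypothetical, and the sketch assumes the existing ids have
already been read from the metadata table into a list.

```python
from typing import List, Tuple


def split_id_ranges(ids: List[int], workers: int) -> List[Tuple[int, int]]:
    """Return inclusive (first_id, last_id) limits, one per worker.

    Ranges are cut by record count, not by id value, so gaps in the id
    sequence do not unbalance the load. Range sizes differ by at most one.
    (Hypothetical helper, for illustration only.)
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    ordered = sorted(ids)
    base, extra = divmod(len(ordered), workers)
    ranges = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional record each.
        size = base + (1 if w < extra else 0)
        if size == 0:
            break  # fewer records than workers: no more ranges to hand out
        chunk = ordered[start:start + size]
        ranges.append((chunk[0], chunk[-1]))
        start += size
    return ranges


# Ids with gaps: splitting the value range 1..52 evenly would be lopsided,
# but splitting by count gives every worker two records.
limits = split_id_ranges([1, 2, 3, 10, 11, 50, 51, 52], workers=4)
```

Each (first_id, last_id) pair could then be handed to one forked worker, which
would select only the rows whose id falls within its limits.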

-- 
You are receiving this mail because:
You are watching all bug changes.
You are the assignee for the bug.

