[Koha-devel] biblioitems.marcxml & biblioitems.marc / HUGE performance issue !

Paul Poulain paul.poulain at biblibre.com
Tue Jul 12 18:43:30 CEST 2016


Hi all,

Those days, we're working on a pretty large DB ( >1M biblio), for a 
customer that want to do many statistics on some fields.
We discovered that something "simple" like:
SELECT publicationyear, count(publicationyear) FROM biblioitems GROUP BY 
publicationyear;

was giving *no result in 10mn*.
This is a test DB, not optimized, but we were surprised by the results.
After investigating we had the idea to create a biblioitems2 table with 
the same structure *EXCEPT MARCXML and MARC fields

*launch the same SQL query : *result in 3seconds* !*
*This could be reproduced on any query (on fields without index).

I think it's because the innoDB is storing each line in one "object", 
so, even if you need only one column, you have to read everything.
In our case, that was 12GB+ of data to read.
biblioitems2 is just a few dozen MB.
(all caching values are minimum and there's no index, so not involved in 
the results)

MY CONCLUSIONS:
  * the biblioitems.marc field must be removed quickly: it's useless 
since years, and is only resulting in slowing things
  * the bilbioitems.marcxml field should be moved outside from this 
table. Something like biblio_blob, with biblionumber, biblioitemnumber 
and marcxml. When we need it, just join the tables.

I'm almost sure it would have an important impact on Koha, as 
biblioitems table is called and used "everywhere".

any opinion ?
**

-- 
Paul Poulain, Associé-gérant / co-owner
BibLibre, Services en logiciels libres pour les bibliothèques
BibLibre, Open Source software and services for libraries

-------------- section suivante --------------
Une pi�ce jointe HTML a �t� nettoy�e...
URL: <http://lists.koha-community.org/pipermail/koha-devel/attachments/20160712/c45d8b59/attachment.html>


More information about the Koha-devel mailing list