[Koha-devel] biblioitems.marcxml & biblioitems.marc / HUGE performance issue !

David Cook dcook at prosentient.com.au
Wed Jul 13 03:51:45 CEST 2016


+1 from me too

 

It’ll hopefully go a long way to moving away from MARC in the long-term too…

 

David Cook

Systems Librarian

 

Prosentient Systems

72/330 Wattle St

Ultimo, NSW 2007

 

Office: 02 9212 0899

Direct: 02 8005 0595

 

From: koha-devel-bounces at lists.koha-community.org [mailto:koha-devel-bounces at lists.koha-community.org] On Behalf Of Chris Cormack
Sent: Wednesday, 13 July 2016 4:43 AM
To: koha-devel at lists.koha-community.org
Subject: Re: [Koha-devel] biblioitems.marcxml & biblioitems.marc / HUGE performance issue !

 

+1 from me

On 13 July 2016 5:11:36 AM NZST, Tomas Cohen Arazi <tomascohen at gmail.com <mailto:tomascohen at gmail.com> > wrote:

I agree 100%, but I'd go for a metadata_record table with schema, (id), biblionumber, format and metadata columns to start supporting more and more schemas. Example:

 

| id | format     | schema | metadata

| 1  | marcxml | marc21  | ... 

| 2  | usmarc   | unimarc | ...

| 3  | mij          | marc21  | ...

 

pretty much like we do with Koha::MetadataRecord actually :-D

 

Nice catch, Paul!

 

 

 

El mar., 12 jul. 2016 a las 13:43, Paul Poulain (<paul.poulain at biblibre.com <mailto:paul.poulain at biblibre.com> >) escribió:

Hi all,

Those days, we're working on a pretty large DB ( >1M biblio), for a customer that want to do many statistics on some fields.
We discovered that something "simple" like:
SELECT publicationyear, count(publicationyear) FROM biblioitems GROUP BY publicationyear;

was giving no result in 10mn.
This is a test DB, not optimized, but we were surprised by the results.
After investigating we had the idea to create a biblioitems2 table with the same structure EXCEPT MARCXML and MARC fields

launch the same SQL query : result in 3seconds !
This could be reproduced on any query (on fields without index).

I think it's because the innoDB is storing each line in one "object", so, even if you need only one column, you have to read everything.
In our case, that was 12GB+ of data to read.
biblioitems2 is just a few dozen MB.
(all caching values are minimum and there's no index, so not involved in the results)

MY CONCLUSIONS:
 * the biblioitems.marc field must be removed quickly: it's useless since years, and is only resulting in slowing things
 * the bilbioitems.marcxml field should be moved outside from this table. Something like biblio_blob, with biblionumber, biblioitemnumber and marcxml. When we need it, just join the tables.

I'm almost sure it would have an important impact on Koha, as biblioitems table is called and used "everywhere".

any opinion ?



-- 
Paul Poulain, Associé-gérant / co-owner
BibLibre, Services en logiciels libres pour les bibliothèques
BibLibre, Open Source software and services for libraries

_______________________________________________
Koha-devel mailing list
Koha-devel at lists.koha-community.org <mailto:Koha-devel at lists.koha-community.org> 
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/

-- 

Tomás Cohen Arazi

Theke Solutions (https://theke.io <http://theke.io/> )
✆ +54 9351 3513384
GPG: B2F3C15F



  _____  


Koha-devel mailing list
Koha-devel at lists.koha-community.org <mailto:Koha-devel at lists.koha-community.org> 
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.koha-community.org/pipermail/koha-devel/attachments/20160713/70d817d3/attachment-0001.html>


More information about the Koha-devel mailing list