[Koha-devel] biblioitems.marcxml & biblioitems.marc / HUGE performance issue !

Tomas Cohen Arazi tomascohen at gmail.com
Tue Jul 12 19:11:36 CEST 2016


I agree 100%, but I'd go for a metadata_record table with schema, (id),
biblionumber, format and metadata columns to start supporting more and more
schemas. Example:

| id | format     | schema | metadata
| 1  | marcxml | marc21  | ...
| 2  | usmarc   | unimarc | ...
| 3  | mij          | marc21  | ...

pretty much like we do with Koha::MetadataRecord actually :-D

Nice catch, Paul!



El mar., 12 jul. 2016 a las 13:43, Paul Poulain (<paul.poulain at biblibre.com>)
escribió:

> Hi all,
>
> Those days, we're working on a pretty large DB ( >1M biblio), for a
> customer that want to do many statistics on some fields.
> We discovered that something "simple" like:
> SELECT publicationyear, count(publicationyear) FROM biblioitems GROUP BY
> publicationyear;
>
> was giving *no result in 10mn*.
> This is a test DB, not optimized, but we were surprised by the results.
> After investigating we had the idea to create a biblioitems2 table with
> the same structure
>
> *EXCEPT MARCXML and MARC fields *launch the same SQL query : *result in
> 3seconds* !
> This could be reproduced on any query (on fields without index).
>
> I think it's because the innoDB is storing each line in one "object", so,
> even if you need only one column, you have to read everything.
> In our case, that was 12GB+ of data to read.
> biblioitems2 is just a few dozen MB.
> (all caching values are minimum and there's no index, so not involved in
> the results)
>
> MY CONCLUSIONS:
>  * the biblioitems.marc field must be removed quickly: it's useless since
> years, and is only resulting in slowing things
>  * the bilbioitems.marcxml field should be moved outside from this table.
> Something like biblio_blob, with biblionumber, biblioitemnumber and
> marcxml. When we need it, just join the tables.
>
> I'm almost sure it would have an important impact on Koha, as biblioitems
> table is called and used "everywhere".
>
> any opinion ?
>
> --
> Paul Poulain, Associé-gérant / co-owner
> BibLibre, Services en logiciels libres pour les bibliothèques
> BibLibre, Open Source software and services for libraries
>
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.koha-community.org
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : http://www.koha-community.org/
> git : http://git.koha-community.org/
> bugs : http://bugs.koha-community.org/

-- 
Tomás Cohen Arazi
Theke Solutions (https://theke.io <http://theke.io/>)
✆ +54 9351 3513384
GPG: B2F3C15F
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.koha-community.org/pipermail/koha-devel/attachments/20160712/3583988d/attachment-0001.html>


More information about the Koha-devel mailing list