[Koha-devel] biblioitems.marcxml & biblioitems.marc / HUGE performance issue !

Chris Cormack chrisc at catalyst.net.nz
Tue Jul 12 20:42:35 CEST 2016


+1 from me

On 13 July 2016 5:11:36 AM NZST, Tomas Cohen Arazi <tomascohen at gmail.com> wrote:
>I agree 100%, but I'd go for a metadata_record table with schema, (id),
>biblionumber, format and metadata columns to start supporting more and
>more
>schemas. Example:
>
>| id | format     | schema | metadata
>| 1  | marcxml | marc21  | ...
>| 2  | usmarc   | unimarc | ...
>| 3  | mij          | marc21  | ...
>
>pretty much like we do with Koha::MetadataRecord actually :-D
>
>Nice catch, Paul!
>
>
>
>El mar., 12 jul. 2016 a las 13:43, Paul Poulain
>(<paul.poulain at biblibre.com>)
>escribió:
>
>> Hi all,
>>
>> Those days, we're working on a pretty large DB ( >1M biblio), for a
>> customer that want to do many statistics on some fields.
>> We discovered that something "simple" like:
>> SELECT publicationyear, count(publicationyear) FROM biblioitems GROUP
>BY
>> publicationyear;
>>
>> was giving *no result in 10mn*.
>> This is a test DB, not optimized, but we were surprised by the
>results.
>> After investigating we had the idea to create a biblioitems2 table
>with
>> the same structure
>>
>> *EXCEPT MARCXML and MARC fields *launch the same SQL query : *result
>in
>> 3seconds* !
>> This could be reproduced on any query (on fields without index).
>>
>> I think it's because the innoDB is storing each line in one "object",
>so,
>> even if you need only one column, you have to read everything.
>> In our case, that was 12GB+ of data to read.
>> biblioitems2 is just a few dozen MB.
>> (all caching values are minimum and there's no index, so not involved
>in
>> the results)
>>
>> MY CONCLUSIONS:
>>  * the biblioitems.marc field must be removed quickly: it's useless
>since
>> years, and is only resulting in slowing things
>>  * the bilbioitems.marcxml field should be moved outside from this
>table.
>> Something like biblio_blob, with biblionumber, biblioitemnumber and
>> marcxml. When we need it, just join the tables.
>>
>> I'm almost sure it would have an important impact on Koha, as
>biblioitems
>> table is called and used "everywhere".
>>
>> any opinion ?
>>
>> --
>> Paul Poulain, Associé-gérant / co-owner
>> BibLibre, Services en logiciels libres pour les bibliothèques
>> BibLibre, Open Source software and services for libraries
>>
>> _______________________________________________
>> Koha-devel mailing list
>> Koha-devel at lists.koha-community.org
>> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>> website : http://www.koha-community.org/
>> git : http://git.koha-community.org/
>> bugs : http://bugs.koha-community.org/
>
>-- 
>Tomás Cohen Arazi
>Theke Solutions (https://theke.io <http://theke.io/>)
>✆ +54 9351 3513384
>GPG: B2F3C15F
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Koha-devel mailing list
>Koha-devel at lists.koha-community.org
>http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>website : http://www.koha-community.org/
>git : http://git.koha-community.org/
>bugs : http://bugs.koha-community.org/

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.koha-community.org/pipermail/koha-devel/attachments/20160713/ae493a64/attachment.html>


More information about the Koha-devel mailing list