<div dir="ltr">I agree 100%, but I'd go for a metadata_record table with schema, (id), biblionumber, format and metadata columns, so we can start supporting more and more schemas. Example:<div><br></div><div>| id | format     | schema | metadata</div><div>| 1  | marcxml | marc21  | ... </div><div>| 2  | usmarc   | unimarc | ...<br></div><div>| 3  | mij          | marc21  | ...<br></div><div><br></div><div>Pretty much like we do with Koha::MetadataRecord, actually :-D</div><div><br></div><div>Nice catch, Paul!</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 12, 2016 at 13:43, Paul Poulain (&lt;<a href="mailto:paul.poulain@biblibre.com">paul.poulain@biblibre.com</a>&gt;) wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
  <div bgcolor="#FFFFFF" text="#000000">

    Hi all,<br>

    <br>

    These days we're working on a pretty large DB (&gt;1M biblios) for

    a customer who wants to run many statistics on some fields.<br>

    We discovered that something "simple" like:<br>

    SELECT publicationyear, count(publicationyear) FROM biblioitems

    GROUP BY publicationyear;<br>

    <br>

    returned <b>no result after 10 minutes</b>.<br>

    This is a test DB, not optimized, but we were surprised by the

    results.<br>

    After investigating, we had the idea to create a biblioitems2 table

    with the same structure <b>EXCEPT the MARCXML and MARC fields</b>.<br>

    <br>

    We launched the same SQL query: <b>result in 3 seconds</b>!<br>

    This could be reproduced with any query (on fields without an index).<br>
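    <br>

    A sketch of one way to build such a stripped-down copy in MySQL. The exact statements used for the test above are not given, so this is an assumption, not a record of what was run. CREATE TABLE ... AS SELECT copies the data but not the indexes, which matches the "no index" condition:<br>

    <pre>
-- Assumed approach: copy everything, then drop the two MARC columns.
-- Note that the copy itself has to read the full 12GB+ table once.
CREATE TABLE biblioitems2 AS SELECT * FROM biblioitems;
ALTER TABLE biblioitems2 DROP COLUMN marc, DROP COLUMN marcxml;

-- Same statistics query, now against the narrow table.
SELECT publicationyear, COUNT(publicationyear)
FROM biblioitems2
GROUP BY publicationyear;
</pre>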

    <br>

    I think it's because InnoDB stores each row as one

    "object", so even if you need only one column, you have to read

    everything.<br>

    In our case, that was 12GB+ of data to read.<br>

    biblioitems2 is just a few dozen MB.<br>

    (all cache settings are at their minimum and there's no index, so

    neither is involved in the results)<br>
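    <br>

    The two table sizes can be compared with a query along these lines. This is only a sketch: for InnoDB, the data_length reported by information_schema is an approximation, and the query assumes both tables live in the current database:<br>

    <pre>
-- Approximate data size of both tables, in MB.
SELECT table_name,
       ROUND(data_length / 1024 / 1024) AS data_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('biblioitems', 'biblioitems2');
</pre>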

    <br>

    MY CONCLUSIONS:<br>

     * the biblioitems.marc field must be removed quickly: it has been useless

    for years, and only slows things down<br>

     * the biblioitems.marcxml field should be moved out of this

    table. Something like biblio_blob, with biblionumber,

    biblioitemnumber and marcxml. When we need it, just join the tables.<br>
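    <br>

    A minimal sketch of what the biblio_blob idea could look like. The three column names come from the proposal above; the column types, keys and the sample biblionumber are assumptions for illustration only:<br>

    <pre>
-- Hypothetical layout: types and keys are assumptions.
CREATE TABLE biblio_blob (
    biblioitemnumber INT NOT NULL,
    biblionumber     INT NOT NULL,
    marcxml          LONGTEXT NOT NULL,
    PRIMARY KEY (biblioitemnumber),
    KEY biblio_blob_biblionumber (biblionumber)
) ENGINE=InnoDB;

-- Fetch the MARCXML only when it is actually needed, via a join.
SELECT bi.biblioitemnumber, bi.publicationyear, bb.marcxml
FROM biblioitems bi
JOIN biblio_blob bb ON bb.biblioitemnumber = bi.biblioitemnumber
WHERE bi.biblionumber = 1;  -- sample value
</pre>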

    <br>

    I'm almost sure it would have a significant impact on Koha, as the

    biblioitems table is queried and used "everywhere".<br>

    <br>

    Any opinions?<br>


    <pre cols="72">-- 

Paul Poulain, Associé-gérant / co-owner

BibLibre, Services en logiciels libres pour les bibliothèques

BibLibre, Open Source software and services for libraries</pre>

  </div>


</blockquote></div><div dir="ltr">-- <br></div><div data-smartmail="gmail_signature"><div dir="ltr"><div style="color:rgb(117,117,117);font-family:'helvetica neue',helvetica,arial,sans-serif;font-size:12.8px">Tomás Cohen Arazi</div><div style="color:rgb(117,117,117);font-family:'helvetica neue',helvetica,arial,sans-serif;font-size:12.8px">Theke Solutions (<a href="http://theke.io/">https://theke.io</a>)<br>✆ +54 9351 3513384<br>GPG: B2F3C15F</div></div></div>