[Koha-devel] DB design (MARC structure)

Sat Jun 19 07:47:02 CEST 2004

Paul a écrit,

I'm still showing some problems with the searching speed with the latest
CVS ... maybe there are still some index problems.  I tried an author 
search on "o'brian, patrick" and the result took about 10-15 seconds to 
return.  Directly from mysql I get:

mysql> select distinct m1.bibid from biblio,biblioitems,marc_biblio,marc_word as m1,marc_word as m2,marc_word as m3 where biblio.biblionumber=marc_biblio.biblionumber and biblio.biblionumber=biblioitems.biblionumber and m1.bibid=marc_biblio.bibid and (m1.bibid=m2.bibid and m1.bibid=m3.bibid) and ((m1.word  like 'o%' and m1.tagsubfield in ('100a','110a', '700a', '710a'))and (m2.word like 'brian%' and m2.tagsubfield in('100a','110a', '700a', '710a'))and (m3.word like 'patrick%' and m3.tagsubfield in('100a','110a', '700a', '710a'))) order by biblio.title;

77 rows in set (5.34 sec)

here's the explain on that query:

mysql> explain select distinct m1.bibid from biblio,biblioitems,marc_biblio,marc_word as m1,marc_word as m2,marc_word as m3 where biblio.biblionumber=marc_biblio.biblionumber and biblio.biblionumber=biblioitems.biblionumber and m1.bibid=marc_biblio.bibid and (m1.bibid=m2.bibid and m1.bibid=m3.bibid) and ((m1.word  like 'o%' and m1.tagsubfield in ('100a','110a', '700a', '710a'))and (m2.word like 'brian%' and m2.tagsubfield in('100a','110a', '700a', '710a'))and (m3.word like 'patrick%' and m3.tagsubfield in('100a','110a', '700a', '710a'))) order by biblio.title;
+-------------+--------+------------------------+-------------+---------+--------------------------+------+-----------------------------------------------+
| table       | type   | possible_keys          | key         | key_len | ref                      | rows | Extra                                        |
+-------------+--------+------------------------+-------------+---------+--------------------------+------+-----------------------------------------------+
| m2          | range  | bibid,word,Search_Marc | Search_Marc |     259 | NULL                     |  366 | Using where; Using temporary; Using filesort |
| m1          | ref    | bibid,word,Search_Marc | bibid       |       8 | m2.bibid                 |   56 | Using where                                  |
| marc_biblio | eq_ref | PRIMARY,biblionumber   | PRIMARY     |       8 | m1.bibid                 |    1 | Using where; Distinct                        |
| biblio      | eq_ref | PRIMARY,blbnoidx       | PRIMARY     |       4 | marc_biblio.biblionumber |    1 | Distinct                                     |
| biblioitems | ref    | bibnoidx               | bibnoidx    |       4 | biblio.biblionumber      |   12 | Using index; Distinct                        |
| m3          | ref    | bibid,word,Search_Marc | bibid       |       8 | m1.bibid                 |   20 | Using where; Distinct                        |
+-------------+--------+------------------------+-------------+---------+--------------------------+------+-----------------------------------------------+
6 rows in set (0.04 sec)

mysql>

So it still looks like we're using temp and filesort--which I assume is
causing the hangup ... if that is the best we can do we may need to start
thinking about breaking up marc_word into sections (e.g., marc_word_title;
marc_word_author, etc.).  The search is really accurate but just too slow
for a production database as large as ours.  What do folks think, would that
speed things up?

Joshua

On Thu, Jun 17, 2004 at 05:20:45PM +0200, Paul POULAIN wrote:
> Paul POULAIN a écrit :
> 
> >Joshua Ferraro a écrit :
> >
> >>If I understand it correctly this sounds like a great solution to
> >>our speed issues with the current marc searching--and it won't 
> >>compromise the accuracy of the search either.  Let's do it
> >>
> >ok, it's commited.
> >works fine on a small DB.
> 
> works fine on your DB too :
> 
> EXPLAIN SELECT DISTINCT m1.bibid
> FROM biblio, biblioitems, marc_biblio, marc_word AS m1
> WHERE biblio.biblionumber = marc_biblio.biblionumber AND 
> biblio.biblionumber = biblioitems.biblionumber AND m1.bibid = 
> marc_biblio.bibid AND (
> m1.word
> LIKE 'wolf%' AND m1.tagsubfield
> IN (
> '245a'
> )
> )
> ORDER BY biblio.title
> 
> 
> table     type     possible_keys     key     key_len     ref     rows 
>    Extra
> m1     range     bibid,word,Marc_Search     Marc_Search     259     NULL 
>    114     Using where; Using temporary; Using filesort
> marc_biblio     eq_ref     PRIMARY,biblionumber     PRIMARY     8     
> m1.bibid     1     Using where; Distinct
> biblio     eq_ref     PRIMARY,blbnoidx     PRIMARY     4     
> marc_biblio.biblionumber     1     Distinct
> biblioitems     ref     bibnoidx     bibnoidx     4     
> biblio.biblionumber     1     Using index; Distinct
> 
> we can't do better ;-)
> 
> NB : to have something working well :
> * copy your 2.0 DB
> * run updatedatabase.
> (all indexes should be created. Be patient, 5-6 hours needed on your 
> marc_word table)
> 
> -- 
> Paul POULAIN
> Consultant indépendant en logiciels libres
> responsable francophone de koha (SIGB libre http://www.koha-fr.org)
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by The 2004 JavaOne(SM) Conference
> Learn from the experts at JavaOne(SM), Sun's Worldwide Java Developer
> Conference, June 28 - July 1 at the Moscone Center in San Francisco, CA
> REGISTER AND SAVE! http://java.sun.com/javaone/sf Priority Code NWMGYKND
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/koha-devel