[Koha-devel] DB design (MARC structure)
Joshua Ferraro
jferraro at athenscounty.lib.oh.us
Sat Jun 19 07:47:02 CEST 2004
Paul a écrit,
I'm still showing some problems with the searching speed with the latest
CVS ... maybe there are still some index problems. I tried an author
search on "o'brian, patrick" and the result took about 10-15 seconds to
return. Directly from mysql I get:
mysql> select distinct m1.bibid from biblio,biblioitems,marc_biblio,marc_word as m1,marc_word as m2,marc_word as m3 where biblio.biblionumber=marc_biblio.biblionumber and biblio.biblionumber=biblioitems.biblionumber and m1.bibid=marc_biblio.bibid and (m1.bibid=m2.bibid and m1.bibid=m3.bibid) and ((m1.word like 'o%' and m1.tagsubfield in ('100a','110a', '700a', '710a'))and (m2.word like 'brian%' and m2.tagsubfield in('100a','110a', '700a', '710a'))and (m3.word like 'patrick%' and m3.tagsubfield in('100a','110a', '700a', '710a'))) order by biblio.title;
77 rows in set (5.34 sec)
here's the explain on that query:
mysql> explain select distinct m1.bibid from biblio,biblioitems,marc_biblio,marc_word as m1,marc_word as m2,marc_word as m3 where biblio.biblionumber=marc_biblio.biblionumber and biblio.biblionumber=biblioitems.biblionumber and m1.bibid=marc_biblio.bibid and (m1.bibid=m2.bibid and m1.bibid=m3.bibid) and ((m1.word like 'o%' and m1.tagsubfield in ('100a','110a', '700a', '710a'))and (m2.word like 'brian%' and m2.tagsubfield in('100a','110a', '700a', '710a'))and (m3.word like 'patrick%' and m3.tagsubfield in('100a','110a', '700a', '710a'))) order by biblio.title;
+-------------+--------+------------------------+-------------+---------+--------------------------+------+-----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------+--------+------------------------+-------------+---------+--------------------------+------+-----------------------------------------------+
| m2 | range | bibid,word,Search_Marc | Search_Marc | 259 | NULL | 366 | Using where; Using temporary; Using filesort |
| m1 | ref | bibid,word,Search_Marc | bibid | 8 | m2.bibid | 56 | Using where |
| marc_biblio | eq_ref | PRIMARY,biblionumber | PRIMARY | 8 | m1.bibid | 1 | Using where; Distinct |
| biblio | eq_ref | PRIMARY,blbnoidx | PRIMARY | 4 | marc_biblio.biblionumber | 1 | Distinct |
| biblioitems | ref | bibnoidx | bibnoidx | 4 | biblio.biblionumber | 12 | Using index; Distinct |
| m3 | ref | bibid,word,Search_Marc | bibid | 8 | m1.bibid | 20 | Using where; Distinct |
+-------------+--------+------------------------+-------------+---------+--------------------------+------+-----------------------------------------------+
6 rows in set (0.04 sec)
mysql>
So it still looks like we're using temp and filesort--which I assume is
causing the hangup ... if that is the best we can do we may need to start
thinking about breaking up marc_word into sections (e.g., marc_word_title;
marc_word_author, etc.). The search is really accurate but just too slow
for a production database as large as ours. What do folks think, would that
speed things up?
Joshua
On Thu, Jun 17, 2004 at 05:20:45PM +0200, Paul POULAIN wrote:
> Paul POULAIN a écrit :
>
> >Joshua Ferraro a écrit :
> >
> >>If I understand it correctly this sounds like a great solution to
> >>our speed issues with the current marc searching--and it won't
> >>compromise the accuracy of the search either. Let's do it
> >>
> >ok, it's commited.
> >works fine on a small DB.
>
> works fine on your DB too :
>
> EXPLAIN SELECT DISTINCT m1.bibid
> FROM biblio, biblioitems, marc_biblio, marc_word AS m1
> WHERE biblio.biblionumber = marc_biblio.biblionumber AND
> biblio.biblionumber = biblioitems.biblionumber AND m1.bibid =
> marc_biblio.bibid AND (
> m1.word
> LIKE 'wolf%' AND m1.tagsubfield
> IN (
> '245a'
> )
> )
> ORDER BY biblio.title
>
>
> table type possible_keys key key_len ref rows
> Extra
> m1 range bibid,word,Marc_Search Marc_Search 259 NULL
> 114 Using where; Using temporary; Using filesort
> marc_biblio eq_ref PRIMARY,biblionumber PRIMARY 8
> m1.bibid 1 Using where; Distinct
> biblio eq_ref PRIMARY,blbnoidx PRIMARY 4
> marc_biblio.biblionumber 1 Distinct
> biblioitems ref bibnoidx bibnoidx 4
> biblio.biblionumber 1 Using index; Distinct
>
> we can't do better ;-)
>
> NB : to have something working well :
> * copy your 2.0 DB
> * run updatedatabase.
> (all indexes should be created. Be patient, 5-6 hours needed on your
> marc_word table)
>
> --
> Paul POULAIN
> Consultant indépendant en logiciels libres
> responsable francophone de koha (SIGB libre http://www.koha-fr.org)
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by The 2004 JavaOne(SM) Conference
> Learn from the experts at JavaOne(SM), Sun's Worldwide Java Developer
> Conference, June 28 - July 1 at the Moscone Center in San Francisco, CA
> REGISTER AND SAVE! http://java.sun.com/javaone/sf Priority Code NWMGYKND
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/koha-devel
More information about the Koha-devel
mailing list