[Koha-devel] optimising the opac a little

Gavin McCullagh gavin at celt.dias.ie
Mon May 30 09:24:19 CEST 2005


Hi,

I'm a newbie here so please be patient with me.  I'm going to open up
straight away into a proposal, I hope people don't think this rude.  If it
can't be done, chances are you know why and I don't cause I'm new :-) 

I have a proposal which I'd like to suggest to optimise the Koha web based
opac system.  I'd like your opinions as to whether I'm right on track,
doing redundant work or just plain off my rocker.

It seems from a fairly cursory look that the opac system is basically
constructing queries with LIKE '%xxx%' within them to interrogate the
biblio table.  This works fine of course but if those columns are unindexed
this LIKE query seems (in my experience) a little slow.

Suggestions to optimise:

1. Add FULLTEXT indexes to each of the searched columns (author,title,...)
2. Change the LIKE queries to use something like the following:

SELECT ......, MATCH biblio.title AGAINST ('$titleSearchString' IN BOOLEAN MODE) AS TitleRelevance
FROM biblio .....
WHERE ..... 
	AND MATCH biblio.title AGAINST ('$titleSearchString' IN BOOLEAN MODE)
ORDER BY TitleRelevance DESC

The where clause does a word search of the title field which is very quick,
as long as the indexes are in place.  The TitleRelevance field in the
select gives a match relevance so that you're best matches appear first
(ORDER BY TitleRelevance DESC).  I was mullign it over in my head and I
suspect the way to combine two different column relevances is to multiply
them.


-- Reservations --

1. FULLTEXT indexes and MATCH are in MySQL as of version 3.23.23.  Is it
likely people are using other databases which don't have FULLTEXT indexes?
Is it likely people are using older versions of MySQL?

http://dev.mysql.com/doc/mysql/en/fulltext-search.html
http://www.databasejournal.com/features/mysql/article.php/1578331
http://jeremy.zawodny.com/blog/archives/000576.html

2. BOOLEAN MODE is a very recent addition.  Indeed it's not in the MySQL
package in Debian Woody.  It is in sarge though (Mysql >v4.0.1).  This is
strictly optional for now as it just makes the searching more powerful
(allowing use of wildcards, string removal, etc.).  I'm not sure if an old
version will ignore it or complain, but this could be left out for now.

http://dev.mysql.com/doc/mysql/en/fulltext-boolean.html
http://www.databasejournal.com/features/mysql/article.php/1587371

Most people don't use the boolean mode stuff, so it might be nice to have a
random "tip" on the search page like:

	Tip: to search for partial strings use the * wildcard 
		eg "Mac*" will match "MacCarthaigh". 

and a link to some advanced searching tips.

3. The addition of indexes on the biblio table will slow insertions/updates
down.  This is a price you pay for faster searching.  In my experience it's
worth the sacrifice.

So, is this all useless nonsense?  Should I crawl back under that rock?

I'm willing to roll up my sleeves and get at it but I'd like to know this
work is useful and it would help to have someone known in the community to
keep an eye on the code.  As I see it this is a mainly sql task and there
shouldn't be too many changes to the perl code itself but I'm not so
familiar with the greater part of the project code so I'd rather have
someone to look over my shoulder now and then.

Here is an example I've just done in PHP on MySQL with a similar database
structure.  There are ~20K records.

http://simms.celt.dias.ie/~gavin/opac/

You can search the titles for 'gael*', the author for 'breatnach', keywords
for 'royal' and 'ireland'.  I have not yet implemented cross-linking.  I'll
take this down in a few days.

Gavin






More information about the Koha-devel mailing list