[Koha-devel] marc_word and searching

Joshua Ferraro jferraro at athenscounty.lib.oh.us
Mon May 24 10:44:02 CEST 2004


Paul et al,

I've been trying to figure out how best to solve our ' and , problem
with the marc searching and I've got a few comments to make about the
way that the searches are currently done (using marc_word) and the
problems with how marc_word stores data.

So here's a classic example of an author that fails currently:
o'brian, patrick

Right now the search separates this into 'o', the apostrophe, 'brian' and
'patrick', and the resulting query looks like this:

select distinct m1.bibid
from biblio, biblioitems, marc_biblio,
     marc_word as m1, marc_word as m2, marc_word as m3, marc_word as m4
where biblio.biblionumber=marc_biblio.biblionumber
  and biblio.biblionumber=biblioitems.biblionumber
  and m1.bibid=marc_biblio.bibid
  and (m1.bibid=m2.bibid and m1.bibid=m3.bibid and m1.bibid=m4.bibid)
  and ((m1.word like 'o%' and m1.tag+m1.subfieldid in ('100a','110a','700a','710a'))
   and (m2.word like '\'%' and m2.tag+m2.subfieldid in ('100a','110a','700a','710a'))
   and (m3.word like 'brian%' and m3.tag+m3.subfieldid in ('100a','110a','700a','710a'))
   and (m4.word like 'patrick%' and m4.tag+m4.subfieldid in ('100a','110a','700a','710a')))
order by biblio.title

So there is at least one major problem with this query (which does not return
any results): marc_word does not store values as short as ' or o.  So of course
there are no results ...

Even if I replace the ' and , in the search value with spaces (I add the
following after line 117 in SearchMarc.pm):

@$value[$i] =~ s/'/ /g;
@$value[$i] =~ s/,/ /g;

which turns out like:

'o brian patrick' 

it still fails ('o' is too short for marc_word); and of course 

@$value[$i] =~ s/'//g;
@$value[$i] =~ s/,//g;

resulting in:

'obrian patrick' 

fails too--the data simply isn't stored right for this kind of search.
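
For anyone who wants to try the two variants outside of SearchMarc.pm, here
is a small standalone sketch. The helper names are made up for illustration
and are not part of Koha, and the final split on whitespace is only my
assumption about how the search value is broken into words afterwards.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of SearchMarc.pm.

# Variant 1: replace ' and , with a space.
sub punct_to_space {
    my ($term) = @_;
    $term =~ s/'/ /g;
    $term =~ s/,/ /g;
    return $term;
}

# Variant 2: delete ' and , outright.
sub punct_removed {
    my ($term) = @_;
    $term =~ s/'//g;
    $term =~ s/,//g;
    return $term;
}

# Assumption: the value is later split into words on runs of whitespace.
sub words {
    my ($term) = @_;
    return grep { length } split /\s+/, $term;
}

my $author = "o'brian, patrick";
print join('|', words(punct_to_space($author))), "\n";   # o|brian|patrick
print join('|', words(punct_removed($author))), "\n";    # obrian|patrick
```

Either way, one of the resulting words ('o' or 'obrian') has to be found in
marc_word for the search to match, which is where both variants break down.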

So I see two ways to fix this problem: 1) stop using marc_word for these
kinds of searches and use marc_subfield_table (which has the whole 
'o'brian, patrick' in subfield_value) or 2) fix the way that marc_word
stores small values (it should store everything, including , and ' and single
letters like 'a', 'o', etc.).
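
To make option 1 a bit more concrete, here is a rough sketch of how such a
query could be built. Only the table name and the subfield_value column are
taken from the above; the bibid, tag and subfieldcode column names are
assumptions on my part, and author_query is a hypothetical helper, not
existing Koha code. It uses placeholders, so the apostrophe in the search
value needs no escaping.

```perl
use strict;
use warnings;

# Sketch of option 1: match the whole subfield value instead of single words.
# Assumed columns: bibid, tag, subfieldcode (only subfield_value is confirmed).
sub author_query {
    my ($term, @fields) = @_;    # @fields holds entries like '100a'

    # One "(tag = ? and subfieldcode = ?)" clause per requested field.
    my $where = join ' or ', ('(tag = ? and subfieldcode = ?)') x @fields;

    my $sql = "select distinct bibid from marc_subfield_table "
            . "where subfield_value like ? and ($where)";

    # Prefix match on the full value, then tag and subfield code per field.
    my @bind = ("$term%");
    push @bind, substr($_, 0, 3), substr($_, 3) for @fields;

    return ($sql, @bind);
}

my ($sql, @bind) = author_query("o'brian, patrick", qw(100a 110a 700a 710a));
print "$sql\n";
print join(', ', @bind), "\n";
```

The returned statement and bind values could then be handed to DBI, for
example via $dbh->selectcol_arrayref($sql, undef, @bind).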

Any comments?  Further suggestions? 

Joshua




