[Koha-devel] marc_word and searching

Wed May 26 07:40:30 CEST 2004

Stephen Hedges wrote:

>At what point does marc_word become so big and clunky that it becomes a
>liability instead of an asset?  NPL's marc_word file is full of 'junk'
>entries like "(pa." (picked up when an ISBN number has "(pa.)" after it to
>denote paperback) and other such MARC oddities.  Our stopword file should
>ideally be expanded to catch all of this junk, but I haven't done that
>yet.  Now we're talking about adding punctuation marks and single letters!
>I agree with Joshua that this is what should be done if we're going to
>depend on using marc_word and expect to get any meaningful search results.
>My question is:  maybe it would be more efficient to just use
>marc_subfield_table for these searches and forget about marc_word?
>
You're right, Stephen...
I have another idea that could be coded quickly: in the MARC
framework, we could add a checkbox called "do NOT index this subfield".
If it is checked, the subfield would not be stored in marc_word (but would
still be stored in marc_subfield_table).
(This also needs a script to clean the DB, which should be quite easy:
foreach subfield in marc_subfield_structure {
    if checkbox checked {
        delete from marc_word where subfield = this one
    }
}
...)
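
The cleanup loop above could look roughly like this. It is only a sketch in
Python, with an in-memory SQLite database standing in for the Koha DB. The
"noindex" column is hypothetical (it represents the proposed checkbox and does
not exist yet), and the column names tagfield, tagsubfield, tag and subfieldid
are assumptions about the schema that should be checked against the real
tables. The sample rows are made up for the demonstration.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in database (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE marc_subfield_structure (
            tagfield    TEXT,
            tagsubfield TEXT,
            noindex     INTEGER DEFAULT 0  -- hypothetical "do NOT index" checkbox
        );
        CREATE TABLE marc_word (
            bibid      INTEGER,
            tag        TEXT,
            subfieldid TEXT,
            word       TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO marc_subfield_structure VALUES (?, ?, ?)",
        [("020", "a", 1), ("245", "a", 0), ("100", "a", 0)],
    )
    conn.executemany(
        "INSERT INTO marc_word VALUES (?, ?, ?, ?)",
        [
            (1, "020", "a", "(pa."),
            (1, "020", "a", "0618260307"),
            (1, "245", "a", "hobbit"),
            (1, "100", "a", "tolkien"),
        ],
    )
    conn.commit()
    return conn


def purge_unindexed_words(conn):
    """Delete marc_word rows for every subfield flagged as not indexed.

    Returns the number of rows removed.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT tagfield, tagsubfield FROM marc_subfield_structure "
        "WHERE noindex = 1"
    )
    flagged = cur.fetchall()

    deleted = 0
    for tag, subfield in flagged:
        cur.execute(
            "DELETE FROM marc_word WHERE tag = ? AND subfieldid = ?",
            (tag, subfield),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


if __name__ == "__main__":
    demo = build_demo_db()
    print("rows removed:", purge_unindexed_words(demo))
```

marc_subfield_table is never touched, so the unindexed subfields stay
available for display and for direct searches.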

-- 
Paul POULAIN
Independent free software consultant
French-language lead for Koha (free ILS http://www.koha-fr.org)