[Koha-devel] marc_word and searching
paul POULAIN
paul.poulain at free.fr
Wed May 26 07:40:30 CEST 2004
Stephen Hedges wrote:
>At what point does marc_word become so big and clunky that it becomes a
>liability instead of an asset? NPL's marc-word file is full of 'junk'
>entries like "(pa." (picked up when an ISBN number has "(pa.)" after it to
>denote paperback) and other such MARC oddities. Our stopword file should
>ideally be expanded to catch all of this junk, but I haven't done that
>yet. Now we're talking about adding punctuation marks and single letters!
> I agree with Joshua that this is what should be done if we're going to
>depend on using marc_word and expect to get any meaningful search results.
> My question is: maybe it would be more efficient to just use
>marc_subfield_table for these searches and forget about marc_word?
>
You're right, Stephen...
I have another idea that could be coded quickly: in the MARC
framework, we could add a checkbox called "do NOT index this subfield".
If checked, the subfield would no longer be stored in marc_word (but would
still be stored in marc_subfield_table).
(This needs a script to clean the DB too, which should be quite easy:
foreach subfield in marc_subfield_structure {
    if checkbox checked {
        delete from marc_word where subfield = this one
    }
}
...)
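
To make the cleanup idea a bit more concrete, here is a minimal sketch in Python against an in-memory SQLite database. It is only an illustration under assumptions, not Koha code: the `noindex` column (standing in for the proposed checkbox), the simplified column names `tagfield`, `tagsubfield`, `tag`, `subfieldid` and `word`, and the helper names `purge_unindexed_words` and `build_demo_db` are all hypothetical, and the real tables may well look different.

```python
import sqlite3


def purge_unindexed_words(conn):
    """Delete marc_word rows for every subfield flagged "do NOT index".

    Assumes a hypothetical integer column `noindex` on
    marc_subfield_structure that mirrors the proposed checkbox.
    Returns the number of marc_word rows removed.
    """
    cur = conn.cursor()
    flagged = cur.execute(
        "SELECT tagfield, tagsubfield FROM marc_subfield_structure "
        "WHERE noindex = 1"
    ).fetchall()

    deleted = 0
    for tag, subfield in flagged:
        # Only marc_word is touched; marc_subfield_table keeps the data.
        cur.execute(
            "DELETE FROM marc_word WHERE tag = ? AND subfieldid = ?",
            (tag, subfield),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


def build_demo_db():
    """Create a tiny in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE marc_subfield_structure (
            tagfield TEXT, tagsubfield TEXT, noindex INTEGER DEFAULT 0
        );
        CREATE TABLE marc_word (
            bibid INTEGER, tag TEXT, subfieldid TEXT, word TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO marc_subfield_structure VALUES (?, ?, ?)",
        [("020", "a", 1), ("245", "a", 0)],  # 020$a (ISBN) is flagged
    )
    conn.executemany(
        "INSERT INTO marc_word VALUES (?, ?, ?, ?)",
        [
            (1, "020", "a", "(pa."),
            (2, "020", "a", "0140714545"),
            (1, "245", "a", "hamlet"),
        ],
    )
    conn.commit()
    return conn
```

Calling `purge_unindexed_words(build_demo_db())` removes the two words that came from the flagged 020$a subfield and leaves the title word from 245$a in place. The same loop could probably be collapsed into a single DELETE with a subquery; it is kept as a loop here only to follow the pseudocode above.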
--
Paul POULAIN
Independent free software consultant
French-language coordinator for Koha (free ILS, http://www.koha-fr.org)