[Koha-devel] marc_word and searching

Fri May 28 00:28:00 CEST 2004

Joshua Ferraro wrote:

>Paul wrote:
>
>NPL had a tech meeting today focusing on the opac searching and we have
>reached some tentative conclusions about how to proceed.  Running some
>test searches using marc_subfield_table we realized that a search model
>based on that table is inadequate for our needs.  For example, a search 
>on 'patrick o'brian' using the 'like' syntax produces no results if the
>database entry is stored as 'o'brian, patrick' (when author is stored in
>the 100a that is the format).  On the other hand, a search using the current
>marc_word model fails for reasons we have already talked about (marc_word
>does not keep track of single characters, &c.).  But if the marc_word table
>did index single characters, a search model based on marc_word would work 
>very well.  For example, a search on 'o'brian, patrick' or 'patrick o'brian'
>would both return the correct records.  So our idea is to re-create our
>marc_word table so that it indexes all characters from the tags and subfields
>that we want to use for searches (we don't need all of them as you pointed
>out; for instance, we will never use 300 for a search).  So we have three
>basic tasks:
>  
>
Another idea, which might be better:
replace ' with _ both when indexing and when searching.
A search on o'brian then looks for o_brian, which is what will be stored
in the DB.
The only limitation is that a search on brian alone won't match. Tell me
if that's a problem.
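
To make the trade-off concrete, here is a minimal sketch of the ' -> _
idea. It is written in Python for illustration only (Koha itself is Perl),
and index_words and matches are hypothetical names, not existing Koha
functions. It assumes words are split on anything that is not a letter,
digit or underscore.

```python
import re


def index_words(text):
    """Return the set of index words for a field value.

    Assumption: apostrophes are replaced by underscores before splitting,
    so "o'brian" is kept as the single word "o_brian".
    """
    normalized = text.lower().replace("'", "_")
    # \w matches letters, digits and the underscore.
    return set(re.findall(r"\w+", normalized))


def matches(query, stored):
    """True if every word of the query is among the stored value's words."""
    return index_words(query) <= index_words(stored)


if __name__ == "__main__":
    stored = "o'brian, patrick"
    for query in ("patrick o'brian", "o'brian, patrick", "brian"):
        print(query, "->", matches(query, stored))
```

With this sketch, word order no longer matters, but a search on brian
alone does not match, which is the limitation mentioned above.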

Otherwise, we could add an 'also index one-letter words' option but, IMHO,
ONLY together with the 'do not index this subfield' feature.

Everybody can give their opinion here. Both solutions are easy to code.

>1.) write a script to re-create marc_word using the parameters we choose
>for searching and including all characters.
>
>2.) fix Biblio.pm so that it will include all characters when it adds records
>to marc_word (currently we add to our holdings using a modified version of
>bulkmarcimport.pl that relies on Biblio.pm)
>
>3.) write a clean-up script to delete all the tags and subfields from marc_word
>that we will never use (like 300)
>
>Does that sound like a sound plan to you Paul?  Do you have any scripts that
>will speed up the process of re-building our marc_word table--if not we will 
>write one ourselves.  Can you make the changes to Biblio.pm that will force
>it to index single characters?
>  
>
Yep, if we decide to do it.
I have no fast script to rebuild the marc_word table :-(

>One final point about search results.  Currently the marc searching does
>not pass all the variables to the template so that we can choose what
>values to display (for example, Lord of the Rings: The Two Towers currently
>displays as 'Lord of the Rings:' without the subtitle).  I suggest that
>we setup a method of easily making marc fields available to the template
>so that each library can decide exactly what marc fields they want to 
>display for the initial search results.
>  
>
Already planned. I'll try to commit some code to CVS ASAP.
The "MARC view" is ready (in the OPAC).
We plan to add a system preference called 'ISBD' where the library can
define its own biblio presentation.
Something like:
[200a;][200b/][(100c)]

The ; means a ; is added AFTER the 200a; the ( means a ( is added BEFORE
the 100c.
Not exactly an ISBD view, but not too far from it either.
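
As a rough illustration of how such a template could be expanded, here is
a small Python sketch (illustration only, not the code to be committed).
The render function, the record layout (a dict keyed by tag and subfield
code) and the rule that a group is skipped when its subfield is empty are
all assumptions, as is the reading that punctuation after the subfield
code, such as the closing ), is appended after the value.

```python
import re

# One [...] group: optional leading punctuation, a 3-digit tag, a one-character
# subfield code, optional trailing punctuation.
GROUP = re.compile(r"\[([^\[\]\d]*)(\d{3})([a-z0-9])([^\[\]]*)\]")


def render(template, record):
    """Expand a template such as [200a;][200b/][(100c)].

    record maps (tag, subfield_code) to a string value (hypothetical layout).
    Groups whose subfield is missing or empty produce no output (assumption).
    """
    parts = []
    for prefix, tag, code, suffix in GROUP.findall(template):
        value = record.get((tag, code))
        if value:
            parts.append(prefix + value + suffix)
    return " ".join(parts)


if __name__ == "__main__":
    record = {("200", "a"): "AAA", ("200", "b"): "BBB", ("100", "c"): "CCC"}
    print(render("[200a;][200b/][(100c)]", record))
```

Skipping empty groups avoids stray punctuation when a record lacks one of
the subfields; a library might prefer a different rule.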

>Comments? Suggestions?
>  
>
>>you're right stephen...
>>I have another idea that could be coded quickly: in the MARC 
>>framework, we could add a checkbox called "do NOT index this subfield".
>>If checked, the subfield wouldn't be stored in marc_word (but would
>>still be stored in marc_subfield_table).
>>(It needs a script to clean the DB too, which should be quite easy:
>>foreach subfield in marc_subfield_structure {
>>   if checkbox checked {
>>      delete from marc_word where subfield= this one
>>   }
>>}
>>...)
>>    
>>
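
The clean-up loop quoted above could look roughly like this. It is a
sketch in Python with sqlite3 so that it runs on its own; the real script
would be Perl against the Koha database. The table names come from this
thread, but every column name (tag, subfield, noindex, word) is an
assumption and not the actual Koha schema, and demo_db is a hypothetical
helper for trying it out.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory database with assumed column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE marc_subfield_structure (tag TEXT, subfield TEXT, noindex INTEGER)"
    )
    conn.execute("CREATE TABLE marc_word (tag TEXT, subfield TEXT, word TEXT)")
    conn.executemany(
        "INSERT INTO marc_subfield_structure VALUES (?, ?, ?)",
        [("100", "a", 0), ("300", "a", 1)],
    )
    conn.executemany(
        "INSERT INTO marc_word VALUES (?, ?, ?)",
        [("100", "a", "patrick"), ("300", "a", "pages"), ("300", "a", "illustrated")],
    )
    conn.commit()
    return conn


def purge_unindexed(conn):
    """Delete marc_word rows for every subfield flagged 'do NOT index'.

    Returns the number of rows deleted.
    """
    flagged = conn.execute(
        "SELECT tag, subfield FROM marc_subfield_structure WHERE noindex = 1"
    ).fetchall()
    deleted = 0
    for tag, subfield in flagged:
        cur = conn.execute(
            "DELETE FROM marc_word WHERE tag = ? AND subfield = ?", (tag, subfield)
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


if __name__ == "__main__":
    conn = demo_db()
    print("rows deleted:", purge_unindexed(conn))
```
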
-- 
Paul POULAIN
Independent free-software consultant
French-speaking Koha lead (free ILS, http://www.koha-fr.org)