[Koha-zebra] Zebra and non-filing characters

Sebastian Hammer quinn at indexdata.com
Thu Dec 29 17:05:24 CET 2005


Paul POULAIN wrote:

> Sebastian Hammer wrote:
>
>> Joshua Ferraro wrote:
>>
>>> Hello everyone,
>>>
>>> This is just a generic question regarding Zebra's handling of
>>> MARC non-filing characters. I know there is a 'stopwords'-like
>>> function available using the 'map' directive:
>>>
>>> map (^The\s) @
>>>
>>> but I'm wondering whether Zebra is also capable of examining the
>>> non-filing character specs within each MARC field to decide
>>> whether to index or not to index ...
>>
>> You mean using an indicator in the field to determine how many 
>> characters to skip? To the best of my knowledge, this is not 
>> supported at present, sorry.
>
>
> It would really be a nice feature, at least for MARC-loving catalogers
> (they still exist!)
>
>> What I don't like about that approach anyway is that it leaves it
>> ambiguous what happens when the user puts a leading article into a
>> search term... I think you'd be better off just configuring the system
>> to ignore the most common leading articles as described above.
>
>
> Pro: it will work even if the cataloger forgets to set the indicator,
> which makes the indicators more and more useless.
> Con: MARC-loving catalogers will hate such behaviour, because there
> are a few exceptions. I think I can predict the noise French catalogers
> will make ;-)

But I think the issue with searching is pretty serious. I've been
noticing lately a few Z39.50 servers that will return zero hits for a
full-field search if the user forgets (or doesn't know) to remove any
leading article himself. Now even for a MARC-fetishist, I think that is
just plain wrong. If you are going to eliminate leading articles from
searches, the least you can do is make it optional.

One way to do that with the dumb MARC21 character-skipping scheme would
be to generate two indexing entries for phrase indexes -- with and
without the offending leading article. That would fix searching, but it
would be a problem for sorting unless we were careful.
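
The two-entry idea above can be sketched roughly as follows. This is an
illustration in Python, not Zebra code: the function names and the
lowercase/whitespace normalization are assumptions, and the skip count is
taken to be a MARC21-style non-filing indicator (for example the second
indicator of field 245).

```python
from typing import List


def phrase_index_entries(title: str, nonfiling: int) -> List[str]:
    """Return phrase-index keys for a title: the full form first, then
    the form with the non-filing characters skipped (if different).

    Hypothetical helper; `nonfiling` is the number of leading characters
    the record says should be ignored for filing.
    """
    # Assumed normalization: lowercase and collapse runs of whitespace.
    full = " ".join(title.lower().split())
    # Clamp the count so a bad indicator cannot run past the title.
    skip = max(0, min(nonfiling, len(title)))
    stripped = " ".join(title[skip:].lower().split())
    entries = [full]
    if stripped and stripped != full:
        entries.append(stripped)
    return entries


def sort_key(title: str, nonfiling: int) -> str:
    """Pick exactly one key for sorting: the article-free form."""
    return phrase_index_entries(title, nonfiling)[-1]


if __name__ == "__main__":
    print(phrase_index_entries("The Hobbit", 4))
    print(sort_key("The Hobbit", 4))
```

Searching can then match either entry, while sorting uses only the single
key from `sort_key`, which is one way of being "careful" about the
duplicate entries.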

Browsing can also be a challenge.

My vote would be to start with the prefix-ignoring list, which in my
experience is enough to satisfy 99.9% of librarians, most of whom have
no clue about that feature of MARC21 anyway. Leave the other stuff as a
nice-to-have to be addressed at leisure at some point when we're
re-examining that part of the indexing logic anyway.
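
For illustration, here is a minimal sketch of what a prefix-ignoring list
does, written in Python rather than as Zebra configuration. The article
list and function names are assumptions (an English-only list is shown; a
real setup would need one list per language). The point is that the same
stripping is applied to both the indexed value and the search term, so a
user who types the article still gets a hit.

```python
import re

# Assumed English list; other languages would need their own.
LEADING_ARTICLES = ("the", "a", "an")

# Match one leading article only when it is followed by whitespace,
# so that "Theatre" or "Anatomy" are left alone.
_ARTICLE_RE = re.compile(
    r"^(?:%s)\s+" % "|".join(LEADING_ARTICLES), re.IGNORECASE
)


def strip_leading_article(text: str) -> str:
    """Drop a single leading article from a phrase, if present."""
    cleaned = text.strip()
    stripped = _ARTICLE_RE.sub("", cleaned, count=1)
    # Never reduce a value to the empty string.
    return stripped or cleaned


def phrase_key(text: str) -> str:
    """Key used for both indexing and searching (hypothetical)."""
    return strip_leading_article(text).lower()


if __name__ == "__main__":
    indexed = phrase_key("The Hobbit")
    for query in ("the hobbit", "Hobbit"):
        print(query, "->", phrase_key(query) == indexed)
```

Because the rule does not depend on any per-record indicator, it also
works when the cataloger forgot to set one, at the cost of the occasional
exception where a leading "A" or "The" is really part of the title.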

--Sebastian

>> It is true that this would require separate configuration for
>> different languages, but you probably wouldn't get around that
>> anyway, since many non-English-speaking countries use other record
>> formats than MARC21, and the use of indicators to control indexing is
>> not universal. The Danish MARC (cleverly named DANMARC) format, for
>> instance, uses a special character inside of the subfields to mark the
>> part which should not be indexed.
>
> In what is already developed in Koha 3.0, we will clearly have
> UNIMARC-french, MARC21-english, and probably other MARC-language
> flavours. So I agree with you.
>
> Happy new year to everyone, with lots of free software & happiness!
>

-- 
Sebastian Hammer, Index Data
quinn at indexdata.com   www.indexdata.com
Ph: (603) 209-6853










