[Koha-bugs] [Bug 11203] Datatables in acquisitions do not ignore "stopwords" in titles

bugzilla-daemon at bugs.koha-community.org
Tue Jan 7 23:58:55 CET 2014


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=11203

--- Comment #8 from David Cook <dcook at prosentient.com.au> ---
(In reply to Katrin Fischer from comment #7)
> Hm, not sure about "key-value-relationships" - wouldn't it be just a word
> list? You don't want to sort differently when switching templates, so I
> think the actual language of a word is not needed.

Mmm, good point. I'm trying to think of examples where an article in one
language might be a non-article in another language...

'The' is an English article, but 'Thé' is a French noun that we wouldn't want
to ignore. 

I suppose the regex could distinguish letters with diacritics...

'A' is also the French preposition 'à' (when capitalized, it is often written
without the diacritic).

Tricky.
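A minimal sketch of the regex idea, in Python purely for illustration (the actual DataTables sorting in Koha would be JavaScript). The function name and the three-word English list are assumptions, not Koha code. It shows that an exact, case-insensitive match leaves 'Thé' alone, and that an unaccented capital 'A' cannot be told apart from the English article:

```python
import re
import unicodedata

# Example article list (an assumption); in practice it would come from configuration.
ENGLISH_ARTICLES = ["a", "an", "the"]


def strip_leading_article(title: str, articles: list[str]) -> str:
    """Return the title without one leading article, for use as a sort key."""
    # Normalize to NFC so a decomposed "é" (e + combining accent) becomes a
    # single precomposed character before matching.
    title = unicodedata.normalize("NFC", title)
    alternation = "|".join(re.escape(a) for a in articles)
    # Require whitespace after the article instead of a word boundary: in
    # Python's re a combining accent is not a word character, so r"^the\b"
    # could match the start of a decomposed "Thé".
    pattern = re.compile(rf"^(?:{alternation})\s+", re.IGNORECASE)
    return pattern.sub("", title, count=1)


for t in ["The Hobbit", "Thé au jardin", "A la recherche du temps perdu",
          "À la recherche du temps perdu"]:
    print(repr(strip_leading_article(t, ENGLISH_ARTICLES)))
```

The third title shows the problem described above: once the accent is dropped on a capital 'À', the French preposition looks exactly like the English article and gets stripped, while the accented form is kept.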

Here is a more comprehensive list of articles in multiple languages...

http://library.princeton.edu/departments/tsd/katmandu/catcopy/article.html

--

That said, an English install might have records in multiple languages and you
would probably want to sort all of them without articles.

I don't know if there is a way of offering a 100% consistent sort across
languages though. Of course, using the articles of the selected language isn't
very consistent either, so I'm tempted to say that the system preference is the
best bet.

I suppose the system preference could make problems easier to deal with. If
you only have English records, you could use only English articles. If you only
have French records, only French articles, etc. You might even be able to
combine articles from a few languages.
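One way the system-preference idea could look, sketched under assumptions: the preference value is taken to be a space-separated word list (a hypothetical format, not an existing Koha setting), compiled once into an anchored, case-insensitive regex and used as a sort key. Changing the value changes the order without touching the templates:

```python
import re
from typing import Optional

# Hypothetical preference values: space-separated articles to ignore when sorting.
ENGLISH_ONLY = "a an the"
ENGLISH_AND_FRENCH = "a an the le la les"


def build_article_pattern(pref_value: str) -> Optional[re.Pattern]:
    """Compile the preference into one anchored, case-insensitive regex."""
    words = pref_value.split()
    if not words:
        return None  # nothing configured: sort on the full title
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(rf"^(?:{alternation})\s+", re.IGNORECASE)


def sort_titles(titles: list[str], pref_value: str) -> list[str]:
    """Sort titles case-insensitively, ignoring one leading article."""
    pattern = build_article_pattern(pref_value)

    def key(title: str) -> str:
        stripped = pattern.sub("", title, count=1) if pattern else title
        return stripped.casefold()

    return sorted(titles, key=key)


titles = ["The Hobbit", "Le Petit Prince", "A Wrinkle in Time", "Dune", "Moby-Dick"]
print(sort_titles(titles, ENGLISH_ONLY))
print(sort_titles(titles, ENGLISH_AND_FRENCH))
```

With the English-only value, "Le Petit Prince" files under L; adding the French articles moves it to P, after "Moby-Dick".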

I think French + English would have problems though (because of 'A' and maybe
'The').

German + English also looks like it would have problems. 'Die' is a fairly
common English word. 'Den' perhaps less so, but still.

Of course, this is all just from a list. I'd be interested to hear from more
native speakers.
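Before enabling a combined list, a dry run that prints every title whose sort key would change makes these false positives easy to spot. The function name, the (deliberately incomplete) German articles and the sample titles are illustrative assumptions:

```python
import re

# Illustrative English + German list; the German articles are incomplete on purpose.
ARTICLES = ["a", "an", "the", "der", "die", "das", "den"]
TITLES = ["Die Blechtrommel", "Die Another Day", "Den of Thieves",
          "The Hobbit", "Dune"]


def preview_stripping(titles: list[str], articles: list[str]) -> list[tuple[str, str]]:
    """Return (title, sort_key) for every title whose sort key would change."""
    alternation = "|".join(re.escape(a) for a in articles)
    pattern = re.compile(rf"^(?:{alternation})\s+", re.IGNORECASE)
    changed = []
    for title in titles:
        key = pattern.sub("", title, count=1)
        if key != title:
            changed.append((title, key))
    return changed


for title, key in preview_stripping(TITLES, ARTICLES):
    print(f"{title!r} sorts as {key!r}")
```

'Die Blechtrommel' and 'The Hobbit' are stripped as intended; 'Die Another Day' and 'Den of Thieves' are English titles that the German words would misfile.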

--

Other problematic words I see:

'as' => Portuguese/Galician (Gallegan)
'bat' => Basque
'am' => Gaelic
'den' => Danish/German/Norwegian/Swedish
'die' => Afrikaans/German/Yiddish
'et' => Danish/Norwegian (maybe...)
'he' => Hawaiian
'hen' => Greek
'hi' => Icelandic
'i' => Italian
'in' => Friesian
'it' => Friesian
'nina' => Tagalog
'os' => Portuguese (clashes with both the English 'OS' and the French noun 'os')
'to' => Greek (Need a native speaker for this one. I thought 'to' was the Greek
pronoun for the English 'it')
'ton' => Greek
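The list above could also drive a simple check of a configured article list. This sketch assumes the same hypothetical space-separated preference format; the function name is made up, and the words and languages are copied from the list, which is not exhaustive:

```python
# Words from the list above: articles in the given language(s) that may also
# be ordinary words at the start of a title in another language.
PROBLEM_WORDS = {
    "as": "Portuguese/Galician",
    "bat": "Basque",
    "am": "Gaelic",
    "den": "Danish/German/Norwegian/Swedish",
    "die": "Afrikaans/German/Yiddish",
    "et": "Danish/Norwegian",
    "he": "Hawaiian",
    "hen": "Greek",
    "hi": "Icelandic",
    "i": "Italian",
    "in": "Friesian",
    "it": "Friesian",
    "nina": "Tagalog",
    "os": "Portuguese",
    "to": "Greek",
    "ton": "Greek",
}


def lint_article_pref(pref_value: str) -> list[tuple[str, str]]:
    """Flag configured articles that double as common words elsewhere.

    pref_value is assumed to be a space-separated word list, as a
    hypothetical system preference might store it.
    """
    flagged = []
    for word in pref_value.lower().split():
        if word in PROBLEM_WORDS:
            flagged.append((word, PROBLEM_WORDS[word]))
    return flagged


print(lint_article_pref("a an the der die das den"))
```

For an English + German value this flags 'die' and 'den', the same two words discussed above, while a plain English list passes cleanly.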
