[Koha-devel] inverted list Proof of Concept [quest for search]

Paul POULAIN paul.poulain at free.fr
Mon May 30 01:19:09 CEST 2005


Tümer Garip wrote:
> Dear Paul,

(cc'ing koha-devel on this mail, which was sent personally to me)
> 
> Excellent idea. I have wanted to write to you for a long time, and now
> I am taking this opportunity.
> I have already tried it, and it looks like it is definitely the way to go.
> 1- I had a few problems which I thought I should share with you. The
> problem with foreign characters lies in the fact that the Perl code
> changes everything to uppercase, but of course not the accented
> characters. So, say, the Turkish word 'çim' becomes 'çIM'. We also have,
> say, 'ÇIM'. These two words produce identical index keys for MySQL, as
> it is case insensitive.

I know this, but it's just a proof of concept.

> That is why you are getting SQL errors: MySQL is finding duplicate
> keys and dropping some of your data. I have changed the PRIMARY index
> on your table to INDEX, and this solves the problem.

Nope. The correct way to go is to strip the accents from all characters.
It seems we could do this with:

use utf8;
use Unicode::Normalize;

$w = NFKD($w);      # decompose: 'ç' becomes 'c' + combining cedilla
$w =~ s/\pM//g;     # remove every combining mark (the accents)
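
To make the idea concrete, here is a minimal sketch of how the two lines above could be wrapped into a key-building routine. The helper name index_key is hypothetical (it is not in the proof of concept), and uppercasing after the accents are stripped is an assumption about how the key would be built:

```perl
use strict;
use warnings;
use utf8;                            # the literals below are UTF-8
use Unicode::Normalize qw(NFKD);

# Hypothetical helper (not from Koha): build a case- and
# accent-insensitive index key from one word.
sub index_key {
    my ($word) = @_;
    my $key = NFKD($word);           # split letters from their accents
    $key =~ s/\pM//g;                # drop the combining marks
    return uc $key;                  # then uppercase what is left
}

binmode STDOUT, ':encoding(UTF-8)';
for my $word ('çim', 'ÇIM', 'élève') {
    printf "%s -> %s\n", $word, index_key($word);
}
```

With this, 'çim' and 'ÇIM' both give the key 'CIM', so the indexer sees a single word instead of two keys that only collide once they reach MySQL.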


> 2- Is there a reason we are using 'title' as one of the keys of the
> inverted file? Are we going to use this 'title' anywhere else later on?
> If not, it is taking up unnecessary space. I used a bibid-biblionumber
> combination and your code still works.

Yes, but then you can't order the results by title without another SQL
query. That's why I think we need one table for each ordering we want,
of the form biblionumber-ORDERING_FIELD,
like: biblionumber-title, biblionumber-author, biblionumber-dewey...
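
As a rough illustration only, the in-memory sketch below mimics a "biblionumber-title" inverted list with plain Perl hashes instead of MySQL tables. The records, the %by_title structure and the helper search_sorted_by_title are all made up for the example:

```perl
use strict;
use warnings;

# Toy catalogue; biblionumbers and titles are made up.
my %title_of = (
    1 => 'Perl Cookbook',
    2 => 'Learning Perl',
    3 => 'Mastering Regular Expressions',
);

# Inverted list keyed by word. Each posting keeps the title next
# to the biblionumber, like a "biblionumber-title" table would.
my %by_title;
while (my ($biblionumber, $title) = each %title_of) {
    for my $word (split ' ', uc $title) {
        $by_title{$word}{$biblionumber} = $title;
    }
}

# Hypothetical helper: biblionumbers matching one word, ordered by
# title, using only what is stored in the inverted list itself.
sub search_sorted_by_title {
    my ($word) = @_;
    my $postings = $by_title{ uc $word } || {};
    return sort { $postings->{$a} cmp $postings->{$b} } keys %$postings;
}

print join(',', search_sorted_by_title('perl')), "\n";   # prints 2,1
```

Because the ordering field travels with each posting, the sort needs no second lookup. The price is one such structure per ordering (title, author, dewey...).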

> If there is nothing that I am
> missing I suggest we use bibid-biblionumber. This way we can reach the
> MARC files or old Koha files directly if we need to at a later stage. 
> 
> 3- Another subject:
>    I have translated the NLP OPAC pages for 2.2.2b to Turkish. I do not
> intend to translate the intranet pages. I could not use the .po files, as
> Turkish sentence structure is almost upside down. If there is any need
> for these files, I can e-mail them to you.

That could be interesting, but the problem is that the templates keep
changing, so you must use .po files. Software like KDE or GnuCash uses
.po files and is translated to Turkish, isn't it?

> 
> Good work,
> 
> Tumer Garip
> 
> NEU Library
> Cyprus
-- 
Paul POULAIN
Independent free-software consultant
French-language coordinator for Koha (free ILS, http://www.koha-fr.org)



