[Koha-devel] [Koha] need help in zebra indexing for Arabic words

Karam Qubsi karamqubsi at gmail.com
Fri Oct 26 01:20:13 CEST 2012


Anyone who uses the attached file for an Arabic Koha installation
should no longer run into the prefix problem we talked about :) I will
extend it at some point to handle suffixes (in Arabic) as well.

Copy and paste the file into this directory:
/etc/koha/zebradb/etc/
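
For anyone who wants to see the intended effect before touching the Zebra config, here is a rough Python sketch. It is not what Zebra runs; it only imitates what the prefix rules described in my earlier message (quoted below) are meant to do. The names strip_prefix and make_rule are made up for illustration, and the sketch assumes the prefix is dropped only at the start of a word and only when at least one letter follows it.

```python
# Illustration only: imitates the intended effect of the transliterate rules.
# strip_prefix and make_rule are hypothetical helpers, not part of Zebra or Koha.

def strip_prefix(word: str, prefix: str) -> str:
    """Drop a leading prefix if at least one character follows it."""
    if word.startswith(prefix) and len(word) > len(prefix):
        return word[len(prefix):]
    return word


def make_rule(prefix: str, letter: str) -> str:
    """Build one rule line in the same shape as the examples in the thread."""
    return f'<transliterate rule="{{ {prefix}{letter} > {letter} "/>'


if __name__ == "__main__":
    # With the prefix removed, both spellings reduce to the same term.
    print(strip_prefix("theword", "the") == strip_prefix("word", "the"))
    # One rule line per letter that can follow the prefix.
    for letter in "abc":
        print(make_rule("the", letter))
```

The same two helpers work for the Arabic case by passing the article as the prefix, since Python strings are Unicode.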

Best regards
karam

On Thu, Oct 25, 2012 at 6:58 PM, Karam Qubsi <karamqubsi at gmail.com> wrote:

> Hi all,
> I solved this in Zebra by customizing the transliterate rules in the
> words-icu.xml file.
>
> I will share a complete file that solves this for Arabic soon!
>
> Here is the idea, with an example. I will not use Arabic characters
> here, to keep it simple.
>
> Suppose we have a language X that is written in connected letters, and
> some of those letters are not important for searching. Take the word
> "*the*word": the searcher is not interested in finding *the*, but is
> definitely searching for "word".
>
> I solved this by following this guide:
> http://userguide.icu-project.org/transforms/general/rules#TOC-Context
>
> and made Zebra convert thew to w.
> We have to do this for every letter: thea to a, theb to b, and so on
> through thez to z,
>
> as in the following:
>   <transliterate rule="{ thea > a "/>
>   <transliterate rule="{ thew > w "/>
> ...
> ...
> ..
>   <transliterate rule="{ thez > z "/>
> So if someone searches for theword, Zebra converts thew to w, and
> searching for word gives the same results as searching for theword :D
>
> and for Arabic :
>   <transliterate rule="{ الا > ا "/>
>   <transliterate rule="{ الب > ب "/>
> .....
> ...
> ...
> ..
>
>   <transliterate rule="{ الي > ي "/>
> so searching for "بحث"
> will find "البحث"
>
> and this will solve the whole problem :)
> I hope this helps you, Mohamed.
>
> Thank you Frédéric , Paul
>
> Karam
>
>
> On Thu, Oct 25, 2012 at 9:23 AM, Karam Qubsi <karamqubsi at gmail.com> wrote:
>
>> Yes, it's not a Koha problem,
>> but I think some people have fixed this in Zebra (or maybe it's
>> just a matter of adding a few more options to the Zebra files).
>>
>> Massoud Alshareef from KnowledgeWare Technologies mentions that they
>> have done that and solved the problem,
>> in: http://koha-community.org/category/koha-news/support-company-press/
>>
>> I hope he can help us with this (cc'ing him).
>>
>> I have heard that Solr is very good, but I haven't checked whether its
>> Arabic support is better than Zebra's. I did just find this:
>> http://wiki.apache.org/solr/LanguageAnalysis#Arabic
>>
>> Anyway, thanks a lot. I will look into this further, and if I find a
>> solution I will share it with you.
>>
>>
>> best regards
>> Karam .
>>
>>
>>
>> On Thu, Oct 25, 2012 at 8:02 AM, Paul Poulain <paul.poulain at biblibre.com> wrote:
>>
>>> On 25/10/2012 13:53, Frédéric Demians wrote:
>>> > No, you don't need help, you need to contract a developer to do the job.
>>> What Frederic is explaining here is that you can't achieve this with the
>>> current Koha. And I suspect it's not a koha problem, but a zebra/icu one.
>>>
>>> Side comment: we're working on integrating a new search engine layer
>>> (Solr). Maybe Solr will fix this problem?
>>>
>>> Anyway, we're looking for some funding for continuing the work on search
>>> layer (see:
>>>
>>> http://wiki.koha-community.org/w/index.php?title=C_%26_P_Search_Rewrite_RFC
>>> )
>>>
>>>
>>> --
>>> Paul POULAIN
>>> http://www.biblibre.com
>>> Free software experts for information and documentation
>>> Tel : (33) 4 91 81 35 08
>>>
>>
>>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: words-icu.xml
Type: text/xml
Size: 1655 bytes
Desc: not available
URL: </pipermail/koha-devel/attachments/20121025/0ff47b8a/attachment-0001.bin>