[Koha-devel] Default Elasticsearch behaviour wrong with Chinese: can't find complete title

Nick Clemens nick at bywatersolutions.com
Wed Apr 4 13:10:25 CEST 2018


Interesting. Yes, the star was added to support auto_truncation and is enabled
by default. For languages using Latin scripts we need the star, otherwise a
search for "cat" will not return results containing "cats".

I am not sure what the path to correcting this is - I think you should file
a bug report with this info and we can take a deeper look into how we are
building our searches and what we can do.
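
As a starting point for that discussion, here is a minimal sketch of one possible direction: keep the trailing star for Latin-script words, but fall back to a quoted phrase when the search contains CJK characters, since quoting is what returned the right record in the tests below. This is not Koha's query builder. The function names, the character ranges and the choice of a phrase fallback are all assumptions made for illustration.

```python
import json
import re

# Hypothetical helper, not Koha's actual code.
# Assumed "CJK" ranges: Hiragana/Katakana, CJK Extension A,
# CJK Unified Ideographs, Hangul syllables.
CJK_PATTERN = re.compile(
    "[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]"
)


def build_query_string(search: str, auto_truncate: bool = True) -> str:
    """Return the text to put in the query_string "query" field."""
    # Leave searches the user has already quoted untouched.
    if len(search) >= 2 and search.startswith('"') and search.endswith('"'):
        return search
    # Assumption: for CJK input, a phrase search replaces the wildcard.
    if CJK_PATTERN.search(search):
        return '"' + search + '"'
    # Latin scripts: append the star to each word so "cat" matches "cats".
    if auto_truncate:
        return " ".join(word + "*" for word in search.split())
    return search


def build_search_body(search: str, auto_truncate: bool = True) -> dict:
    """Build a request body shaped like the ones in the curl examples."""
    return {
        "from": 0,
        "size": 0,
        "query": {
            "query_string": {"query": build_query_string(search, auto_truncate)}
        },
    }


if __name__ == "__main__":
    for search in ("cat", "中國翻譯"):
        print(json.dumps(build_search_body(search), ensure_ascii=False))
```

Whether a phrase search is the right default for CJK text, or whether the analyzer settings should change instead, is exactly what the bug report would need to settle.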

On Tue, Apr 3, 2018 at 10:22 AM Nicolas Legrand <nicolas.legrand at bulac.fr>
wrote:

> Good day devs,
>
> Nick spotted this one during the last Marseille Hackfest. We ran some tests
> with our catalogue on master and found out how to reproduce it, how to break
> it and how to fix it, though the inner mechanics remain a mystery and we
> are not quite sure what the default behaviour should be.
>
> We did our tests with 中國翻譯 (Chinese Translators Journal), a title made of two
> words that are very common in our catalogue: China and translation.
>
> First, the default Koha behaviour is to add a "*" at the end of the
> searched word, which leads to 0 results. It produces a query looking like
> this one:
>
> $ curl "http://localhost:9200/koha_robin_biblios/_search?pretty" \
>   -d '{"from": 0, "size": 0,"query":{"query_string":{"query": "中國翻譯*"}}}'
> {
>   "took" : 1,
>   "timed_out" : false,
>   "_shards" : {
>     "total" : 5,
>     "successful" : 5,
>     "skipped" : 0,
>     "failed" : 0
>   },
>   "hits" : {
>     "total" : 0,
>     "max_score" : 0.0,
>     "hits" : [ ]
>   }
> }
>
> If we quote 中國翻譯 in Koha, it yields one answer, the right one. It produces
> a query looking like this one:
>
> $ curl "http://bouse02.prive.bulac.fr:9200/koha_robin_biblios/_search?pretty" \
>   -d '{"from": 0, "size": 0,"query":{"query_string":{"query": "\"中國翻譯\""}}}'
> {
>   "took" : 5,
>   "timed_out" : false,
>   "_shards" : {
>     "total" : 5,
>     "successful" : 5,
>     "skipped" : 0,
>     "failed" : 0
>   },
>   "hits" : {
>     "total" : 1,
>     "max_score" : 0.0,
>     "hits" : [ ]
>   }
> }
>
> Note that if I write an Elasticsearch query without quotes or star, it
> yields too many results (9626), and the “right” result isn't among the
> first ten:
>
> $ curl "http://bouse02.prive.bulac.fr:9200/koha_robin_biblios/_search?pretty" \
>   -d '{"from": 0, "size": 0,"query":{"query_string":{"query": "中國翻譯"}}}'
> {
>   "took" : 16,
>   "timed_out" : false,
>   "_shards" : {
>     "total" : 5,
>     "successful" : 5,
>     "skipped" : 0,
>     "failed" : 0
>   },
>   "hits" : {
>     "total" : 9626,
>     "max_score" : 0.0,
>     "hits" : [ ]
>   }
> }
>
>
> I'm not sure what the right behaviour should be. We felt that adding quotes
> made our results much more relevant, whatever the language.
> What is certain is that adding a star to the search by default doesn't help
> us. We didn't have this problem with Elasticsearch while playing with it in
> 17.05. For us, it is a regression. I attach the MARC of our test record.
>
> What do you think about it?
>
> Best regards,
>
> --
>
> *Nicolas Legrand*
> Administration technique et développements du système de gestion de la
> bibliothèque
>
> Bibliothèque universitaire
> des langues et civilisations
>
> 65 rue des Grands Moulins
> F-75013 PARIS
> T +33 1 81 69 18 22
> www.bulac.fr

-- 
Nick Clemens
Sonic Screwdriver (Development Support)
ByWater Solutions
IRC: kidclamp