[Koha-bugs] [Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon at bugs.koha-community.org
Thu Apr 12 22:10:12 CEST 2018


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #14 from David Gustafsson <glasklas at gmail.com> ---
Ok! I actually think that would be a bad idea, mainly for the following
reasons:

1) Elasticsearch uses a ranking function called Okapi BM25 (it used to be Term
Frequency/Inverse Document Frequency (TF/IDF), which is similar but simpler to
understand). Two of the statistics Okapi BM25 uses to calculate the relevancy
score (per field) are average field length and inverse document frequency
(IDF). If you put all values in one field, average field length and inverse
document frequency will be averaged out across all fields, effectively
crippling the algorithm and rendering it unable to calculate relevancy
properly.

2) You will also not be able to use per-field boosting unless you add the
boosted fields to "fields" as well, but then you might as well skip the
"_all_*" fields and pass along the full list of fields instead.

3) The index will be about 3x as big, increasing memory usage. This might not
be a huge issue in general, but it could be for us, since we have several
million biblios and quite a large index already.

4) To utilize the full power of Elasticsearch, one would want to be able to use
different analyzers/normalizers and other useful mapping settings on a
per-field basis, as well as handy query_string query options like
"quote_field_suffix". With everything in one field, all data will be indexed
using the same mapping settings, and features like quote_field_suffix will not
work.
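To make point 1 concrete, here is a minimal sketch using the classic Okapi BM25
formula with Elasticsearch's default parameters (k1 = 1.2, b = 0.75). The
records, field names and tokens are hypothetical, and Lucene's real
implementation differs in details such as lossy length encoding, so the numbers
are only illustrative. The word "history" appears in the title of just one
record, so its IDF in the title field is relatively high. Once titles and notes
are copied into one catch-all field, every record contains the word, the IDF
collapses and the average length changes.

```python
import math
from collections import Counter

# Classic Okapi BM25 with Elasticsearch's default parameters. This is an
# illustrative sketch, not Lucene's exact scoring code.
K1 = 1.2
B = 0.75


def bm25_scores(field_values, term):
    """Score each document's value of ONE field for a single query term."""
    n_docs = len(field_values)
    # Average field length, computed over this field only.
    avgdl = sum(len(tokens) for tokens in field_values) / n_docs
    # Document frequency and IDF, also computed over this field only.
    df = sum(1 for tokens in field_values if term in tokens)
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    scores = []
    for tokens in field_values:
        tf = Counter(tokens)[term]
        length_norm = K1 * (1 - B + B * len(tokens) / avgdl)
        scores.append(idf * tf * (K1 + 1) / (tf + length_norm))
    return scores


# Hypothetical, pre-tokenized records: a short title and a longer note.
records = [
    {"title": ["history", "of", "sweden"], "note": ["maps", "included"]},
    {"title": ["swedish", "cooking"],
     "note": ["a", "history", "of", "regional", "recipes"]},
    {"title": ["modern", "poetry"], "note": ["history", "and", "criticism"]},
]

# Per-field scoring: only the title field is considered.
title_scores = bm25_scores([r["title"] for r in records], "history")

# Everything copied into one catch-all field (what an "_all"-style field does).
all_scores = bm25_scores([r["title"] + r["note"] for r in records], "history")

if __name__ == "__main__":
    for i, (t, a) in enumerate(zip(title_scores, all_scores)):
        print(f"record {i}: title field {t:.3f}, combined field {a:.3f}")
```

In the combined field, record 0 (where "history" is in the title) and record 2
(where it only appears in a note) end up with exactly the same score, while
per-field scoring keeps the title match clearly ahead.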

I can see no benefit to using "all_*" fields, and no real downside to instead
generating a proper "fields" list containing all searchable fields. I began
working on a patch today (one of the reasons being that we need per-field
boosting), and it's actually not a very complicated change. It might not be
ready tomorrow, but it should be at some point early next week.
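A rough sketch of that alternative: build the query_string "fields" list from
the searchable fields, attach per-field boosts using Elasticsearch's
"field^boost" syntax, and send quoted phrases to a dedicated subfield via
"quote_field_suffix". Everything named here is hypothetical: the
SEARCH_FIELDS structure, the field names and boost values, and the ".phrase"
suffix (which assumes the mapping defines such a subfield with its own
analyzer). A real patch would presumably derive these from Koha's own search
field configuration rather than a hard-coded dict.

```python
import json

# Hypothetical per-field search configuration (names and boosts are examples).
SEARCH_FIELDS = {
    "title": {"searchable": True, "boost": 3.0},
    "author": {"searchable": True, "boost": 2.0},
    "subject": {"searchable": True, "boost": None},
    "local-number": {"searchable": False, "boost": None},
}


def build_fields(search_fields):
    """Return query_string "fields" entries, using the "field^boost" syntax."""
    fields = []
    for name, conf in sorted(search_fields.items()):
        if not conf["searchable"]:
            continue  # non-searchable fields are left out entirely
        boost = conf["boost"]
        fields.append(f"{name}^{boost:g}" if boost else name)
    return fields


def build_query(query_text, search_fields):
    """Build a query_string query over an explicit, boosted list of fields."""
    return {
        "query": {
            "query_string": {
                "query": query_text,
                "fields": build_fields(search_fields),
                # Quoted phrases are matched against "<field>.phrase" subfields,
                # assuming the (hypothetical) mapping defines them.
                "quote_field_suffix": ".phrase",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query('"swedish history"', SEARCH_FIELDS), indent=2))
```

Because every field is listed explicitly, each one keeps its own BM25
statistics and analyzers, and boosting is just a matter of changing the number
after the caret.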
