[Koha-bugs] [Bug 18969] Elasticsearch - _all field is deprecated - should use copy_to to prepare for ES6

bugzilla-daemon at bugs.koha-community.org
Mon Apr 16 17:05:41 CEST 2018


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18969

--- Comment #15 from Nick Clemens <nick at bywatersolutions.com> ---
(In reply to David Gustafsson from comment #14)
> Ok! I actually think that would be a bad idea, mainly for the following
> reasons:
> 
> 1) Elasticsearch uses a ranking function called Okapi BM25... If you put
> all values in one field, average field length and inverse document
> frequency will be averaged out based on all fields,

Ah, okay, I see this in the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html

Whether the all field is built in or constructed, we pay a relevance price.
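
To make the relevance cost concrete, here is a minimal sketch of a BM25 score
for a single term, using the textbook formula with Lucene's idf variant and the
Elasticsearch default parameters k1=1.2 and b=0.75. The function name and all
of the statistics (field lengths, document counts) are made-up assumptions for
illustration, not figures from a real Koha index.

```python
import math

def bm25_term_score(tf, field_len, avg_field_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Score one term in one field (hypothetical helper, not a Koha or ES API)."""
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Fields longer than average are penalised through length normalisation.
    length_norm = k1 * (1 - b + b * field_len / avg_field_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Assumed statistics: the term occurs once in a short title field, where it is rare.
title_score = bm25_term_score(tf=1, field_len=3, avg_field_len=8,
                              doc_count=100_000, doc_freq=50)

# The same occurrence seen through a catch-all field: the field is much longer
# and the term is more common, because every source field contributes to it.
catch_all_score = bm25_term_score(tf=1, field_len=120, avg_field_len=150,
                                  doc_count=100_000, doc_freq=400)

print(title_score > catch_all_score)
```

With these assumed numbers the dedicated field scores the match higher than the
catch-all field does, which is the averaging effect described in point 1.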


> 2) You will also not be able to use per field boosting, unless you add
> boosted fields to "fields" as well, but then you might as well skip the
> "_all_*" fields and pass along the full list of fields instead.

Well, it does seem to work to add only the boosted fields and boost those above
the all field, but again, it is not as exact.
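
As a rough sketch of that approach, the query body below searches a catch-all
field and adds only the boosted fields next to it, using the `field^boost`
syntax of the query_string "fields" parameter. The helper name, the field names
"title" and "author", and the boost values are assumptions for illustration.

```python
def keyword_query(text, boosts, catch_all="_all"):
    """Build a query_string body (hypothetical helper, not Koha code)."""
    # The catch-all field matches everything; boosted fields are listed
    # alongside it so that matches in them rank higher.
    fields = [catch_all] + [f"{name}^{boost}" for name, boost in boosts.items()]
    return {"query": {"query_string": {"query": text, "fields": fields}}}

body = keyword_query("dune", {"title": 3, "author": 2})
print(body["query"]["query_string"]["fields"])
```

Dropping the catch-all entry and passing every searchable field instead gives
the alternative suggested in point 2, at the cost of a longer "fields" list.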

> 3) The index will be about 3x as big, increasing memory usage. This might
> not be a huge issue, but could be for us, for example, as we have several
> million biblios and already quite a large index.

Agreed, I think we would need to compare with and without the all field to see
the exact impact.


> 4) To utilize the full power of Elasticsearch one would want to be able to
> use different analyzers/normalizers and other useful mapping settings on a
> per field basis, and nice query string query options like
> "quote_field_suffix". With everything in one field, all data will be indexed
> using the same mapping settings, and features like quote_field_suffix will
> not work.

I don't think I actually follow you here. We still specify different analyzers
per field, but we also construct the _all field and use that for keyword
searching only; this is what we currently do. So we can search specific
fields or use the all field.
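
A minimal sketch of such a setup, assuming a constructed catch-all field fed by
copy_to: each source field keeps its own analyzer, while the catch-all field
receives the raw values and analyzes them with its own single analyzer. The
helper name, the field names, and the catch-all name "all_fields" are
assumptions; only the "properties" fragment of a mapping is built here.

```python
def build_properties(field_analyzers, catch_all="all_fields"):
    """Build mapping properties (hypothetical helper, not Koha code)."""
    # The catch-all field has one mapping, so everything copied into it is
    # analyzed the same way, whatever the source field uses.
    properties = {catch_all: {"type": "text"}}
    for name, analyzer in field_analyzers.items():
        properties[name] = {
            "type": "text",
            "analyzer": analyzer,   # per-field analyzer is kept
            "copy_to": catch_all,   # raw value is also copied to the catch-all
        }
    return {"properties": properties}

mapping = build_properties({"title": "english", "author": "standard"})
print(mapping["properties"]["title"])
```

Searching "title" or "author" directly uses their own analyzers; only keyword
searches against the catch-all field lose the per-field settings.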



> I can actually see no benefits with using "all_*" fields, and no real
> downside to instead generating a proper "fields" list containing all
> searchable fields.
The only downside is listing all the fields individually, so there is a small
cost in query construction and query size, but nothing terrible, I would think.

> I began working on a patch today (one of the reasons was that we
> need per field boosting), and it's actually not a very complicated change.
> Might not be ready tomorrow, but at least some time in the beginning of next
> week.

Looking forward to it! :-) Have you seen bug 18316?
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18316
