[Koha-bugs] [Bug 20589] Add field boosting and use elastic query fields parameter instead of deprecated _all

Tue Nov 20 16:54:53 CET 2018

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=20589

--- Comment #22 from David Gustafsson <glasklas at gmail.com> ---
Here is a more verbose explanation of most of the changes:

- Remove syspref QueryWeightFields as this seems not to be used anyway.

- Replace hard coded 'biblios' and 'authorities' in mappings.pl with constants.

- Add _search_fields method in QueryParser for returning weighted/unweighted
search fields for OPAC/staff client (with an optional subfield, used primarily
for the authorities search).
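
As a rough illustration of the idea (written in Python for brevity, not Koha's
actual Perl code), weighted fields can be passed to Elasticsearch through the
"fields" parameter of a query_string query using the field^boost syntax. The
function names, the weights and the field names below are hypothetical.

```python
def search_fields(weights, weighted=True, subfield=None):
    """Return field names suitable for an Elasticsearch "fields" parameter.

    weights maps a field name to a boost, or to None for no boost.
    The mapping itself is a hypothetical stand-in for stored configuration.
    """
    fields = []
    for name, boost in weights.items():
        # Optionally target a subfield such as "phrase".
        target = f"{name}.{subfield}" if subfield else name
        # Append the boost using the field^boost syntax when requested.
        if weighted and boost is not None:
            target = f"{target}^{boost}"
        fields.append(target)
    return fields


def build_query(text, fields):
    """Build a query_string query restricted to the given fields."""
    return {"query": {"query_string": {"query": text, "fields": fields}}}


# Hypothetical weights, for illustration only.
example_weights = {"title": 10, "author": 5, "subject": None}
example_query = build_query("hamlet", search_fields(example_weights))
```

Listing the fields explicitly like this is what replaces a search against the
deprecated _all field.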

Refactor authorities search. This part is a little bit iffy:

What mainly confuses me is the motivation behind the distinction between
is/= and "exact". With regard to Elastic, that operator only seems to be used
in C4/Matcher.pm. It also occurs in Koha/QueryParser/Driver/PQF.pm and
Koha/REST/Plugin/Query.pm, but I don't believe Elastic is used as the backend
for those. From what I can gather, there was actually no difference between
is/= and "exact" before my changes: is/= performed a "term" query on
"<field>.phrase" and "exact" performed a "match_phrase" on the same field,
which is pointless since the "phrase" subfield is not tokenized (it uses the
keyword analyzer). So I simplified this and now use the same query for both
is/= and "exact".
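
To make the reasoning concrete, here is a toy model in Python (not
Elasticsearch itself, and all function names are made up for the example). It
assumes the keyword analyzer emits the whole input as a single token, in which
case a phrase match degenerates into an exact comparison, just like a term
query.

```python
def keyword_analyze(text):
    """Toy keyword analyzer: the whole input becomes one token."""
    return [text]


def term_matches(indexed_value, query_value):
    """Toy "term" query: the query value is compared to indexed tokens as-is."""
    return query_value in keyword_analyze(indexed_value)


def match_phrase_matches(indexed_value, query_value):
    """Toy "match_phrase": analyzed query tokens must occur contiguously."""
    doc_tokens = keyword_analyze(indexed_value)
    query_tokens = keyword_analyze(query_value)
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


# With a single-token analyzer both query types agree on every pair.
pairs = [
    ("Shakespeare, William", "Shakespeare, William"),
    ("Shakespeare, William", "Shakespeare"),
    ("Shakespeare, William", "William"),
]
results = [(term_matches(d, q), match_phrase_matches(d, q)) for d, q in pairs]
```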

As "exact" queries seem to be used for matching, I also thought it would be
more fitting to perform a case insensitive match instead of also removing
punctuation (which could result in unexpected matches).
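
A small Python sketch of why stripping punctuation can produce unexpected
matches. NFKC normalization followed by case folding is used here only as an
approximation of a case-insensitive normalizer; the function names and sample
values are hypothetical.

```python
import string
import unicodedata


def ci_key(value):
    """Case-insensitive key: Unicode NFKC normalization plus case folding."""
    return unicodedata.normalize("NFKC", value).casefold()


def stripped_key(value):
    """Case-insensitive key that additionally drops ASCII punctuation."""
    return "".join(ch for ch in ci_key(value) if ch not in string.punctuation)


# Case differences are ignored by both keys.
same_case_insensitive = ci_key("Hamlet") == ci_key("HAMLET")

# Dropping punctuation makes two distinct values collide.
collides_when_stripped = stripped_key("C++") == stripped_key("C")
distinct_when_kept = ci_key("C++") != ci_key("C")
```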

I also noticed that the authority type drop-down in the authorities search
was not respected, so this is now included in the query if selected.

I also noted that there is an $and_or option to build_authorities_query_compat
that is not used. This should probably be addressed, but it is not fixed in
this patch.

Tidy up the index/fields configuration a little bit:

- Rename all occurrences of misspelled "analyser" to "analyzer".
- Rename "my_normalizer" to more descriptive "nfkc_cf_normalizer".
- Rename "normalizer_keyword" to more descriptive "icu_folding_normalizer".
- Rename subfield "lc_raw" to "ci_raw" (case insensitive). I don't think this
subfield actually was case insensitive before, since "my_normalizer" was used,
which only performs Unicode normalization, not case folding.
- Don't use the "phrase" subfield for sorting. This is incorrect and will
produce strange results, since the byte order of characters is used. There is
also no need for a subfield for the sort field. I removed the subfield and
changed the type to "icu_collation_keyword", which will attempt to sort in the
order with the least conflicts between languages. Ideally there should be a
syspref for setting the collation language to sort by. This is a pretty
trivial fix, so I might open a new issue for that.
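
The sorting problem can be seen with a minimal Python sketch. Python sorts
strings by code point, which is the same order UTF-8 bytes would give, so it
stands in for sorting on a raw keyword field. The mapping shown is only an
assumption about what a sort field of type "icu_collation_keyword" (provided
by the analysis-icu plugin) could look like, not the exact mapping in the
patch.

```python
# Hypothetical sort-field mapping. "index": False is an assumption here,
# meaning the field is only used for sorting and is not searchable.
sort_field_mapping = {
    "type": "icu_collation_keyword",
    "index": False,
}

titles = ["Zebra", "apple", "Äpple"]

# Code point order: uppercase ASCII first, then lowercase ASCII, then
# non-ASCII letters, which is rarely what a reader expects.
codepoint_order = sorted(titles)
```

A collation-aware field instead sorts on locale-sensitive keys, so "apple" and
"Äpple" would no longer end up on opposite sides of "Zebra" merely because of
their byte values.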

As a result of this, "phrase" is hardly used at all and could probably be
removed at a later stage, with phrase queries performed on the search field
instead (which would decrease index size considerably).
