[Koha-bugs] [Bug 28316] Fix ES crashes related to various punctuation characters

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Sat Aug 21 23:08:41 CEST 2021


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=28316

--- Comment #74 from Andrew Nugged <nugged at gmail.com> ---
QueryAutoTruncate case:
=========================

This feature BLINDLY appends an asterisk to every word that ends in an
alphanumeric character before passing the phrase to ES as a search query.

So, if QueryAutoTruncate is enabled (default), the Perl part just appends an
asterisk to every alphanumeric-ended word and disregards ES syntax (ranges
too). On the current master branch the Perl part sends the examples below to ES
as the query string (here 'IN:' is what we entered in the search field and
'OUT:' is what was sent to ES):


(before the patch, phrase 1)

    IN:    The [Book]
    OUT:   The* [Book]

(will make ES fail with a yellow error text box)
in fact, ES doesn't understand your "special language usage":
[Book] confuses ES because it is not in [... TO ...] form.


and with ranges:

(before the patch, phrase 2)

    IN:    The "Book" {2000 TO 2002} [1900 TO 1990]
    OUT:   The* "Book" "2000 TO 2002" [1900* TO* 1990]

(will make ES fail with a yellow error text box)
[1900* TO* 1990] confuses ES because it is no longer in [... TO ...] form.

Here we see that it converted '{' and '}' to '"' and, because
QueryAutoTruncate is enabled, added '*' to the alphanumeric-ended words
outside the quotes.
A third, even weirder example:


(before the patch, phrase 3)

    IN:    The "Book" [2000 TO 2002}
    OUT:   The* Book* [2000* TO* 2002*

(will make ES fail with a yellow error text box)
[2000* confuses ES because '[' has no closing pair and there is no range there.
Here the quotes get lost: the single '}' brace was converted to a double quote,
which left the quotes unbalanced, and the logic of the current code is to
remove all quotes when they are unbalanced (I agree with that, but it
shouldn't convert braces to quotes).
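
To make the three "before the patch" transformations easier to follow, here is
a rough Python sketch of the behaviour described above. It is NOT the actual
Koha Perl code: the function name, the regular expression and the order of the
steps are assumptions, chosen only so that the sketch reproduces the IN/OUT
pairs listed in this comment.

```python
import re


def pre_patch_rewrite(query: str) -> str:
    """Hypothetical model of the pre-patch query rewriting (not Koha code)."""
    # Step 1 (as described above): braces are converted to double quotes.
    query = query.replace("{", '"').replace("}", '"')

    # Step 2 (as described above): if the quotes are unbalanced,
    # all of them are removed.
    if query.count('"') % 2 == 1:
        query = query.replace('"', "")

    # Step 3 (assumption): split into quoted phrases and bare words, then
    # append '*' to every token whose last character is alphanumeric.
    # Quoted phrases end with '"', so they are left untouched.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    return " ".join(t + "*" if t[-1].isalnum() else t for t in tokens)


for phrase in (
    'The [Book]',
    'The "Book" {2000 TO 2002} [1900 TO 1990]',
    'The "Book" [2000 TO 2002}',
):
    print("IN: ", phrase)
    print("OUT:", pre_patch_rewrite(phrase))
```

For the three phrases above this prints the same OUT strings as shown in the
examples, i.e. the strings that make ES fail to parse the query.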

All the requests above lead to a "yellow" error text box, which means that ES
threw an exception because of a syntax error in the query (and the Koha
server error log will contain the ES error message 'Failed to parse query').


Anyway, as I see it, this "QueryAutoTruncate" mode is designed for very simple
usage: the user is expected to search without using any special query
language, and the mode gives the user as many results as possible by assuming
that "just the beginnings of the words" were requested. If one enters:
    so bi
this will be translated by the Perl code to
    so* bi*
and passed to ES, so ES will match, for example, books named "Something Big"
and "Solved Binary Book".

QueryAutoTruncate also ruins the syntax of ES range requests
('[1900 TO 1990]' becomes '[1900* TO* 1990]' in the ES query),
so there is no reason to keep the ranges, and the patch accounts for that:




Now with the patch:

This patch is not intended to be the "full fix" (that is not so easy),
but at least it makes brackets and braces in QueryAutoTruncate mode just
regular symbols by prefixing them with a backslash. The same phrases as above,
but WITH this patch, go through these internal transformations:

(after patch, phrase 1)

    IN:    The [Book]
    OUT:   The* \[Book\]

ES understands that here we search for 'the* book' and will actually even find
6 results in the current dev-test database, instead of failing with the error
as in the example above without the patch.


(after patch, phrase 2)

    IN:    The "Book" {2000 TO 2002} [1900 TO 1990]
    OUT:   The* "Book" \{2000* TO* 2002\} \[1900* TO* 1990\]

ES understands that here we search for 'The* "Book" 2000* TO* 2002 1900* TO* 1990',
but such a phrase is unlikely to match anything, so: zero results,
but no "ES exception".

(after patch, phrase 3)

    IN:    The "Book" [2000 TO 2002}
    OUT:   The* "Book" \[2000* TO* 2002\}

Same here as above, zero results but no "exceptions".
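
The "after patch" behaviour can be sketched the same way. Again, this is only
an illustrative Python model and not the Perl code from the patch: the function
name and regular expressions are assumptions, and only the two steps discussed
here (backslash-escaping the brackets and braces, then auto-truncation) are
modelled.

```python
import re


def post_patch_rewrite(query: str) -> str:
    """Hypothetical model of the patched query rewriting (not Koha code)."""
    # Step 1: prefix every '[', ']', '{' and '}' with a backslash so that
    # ES treats them as ordinary characters instead of range syntax.
    query = re.sub(r'([\[\]{}])', r'\\\1', query)

    # Step 2 (assumption): split into quoted phrases and bare words, then
    # append '*' to every token whose last character is alphanumeric.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    return " ".join(t + "*" if t[-1].isalnum() else t for t in tokens)


for phrase in (
    'The [Book]',
    'The "Book" {2000 TO 2002} [1900 TO 1990]',
    'The "Book" [2000 TO 2002}',
):
    print("IN: ", phrase)
    print("OUT:", post_patch_rewrite(phrase))
```

For the three phrases this prints the OUT strings from the "after patch"
examples above. Whether ES then finds anything depends on the data, but the
query string itself no longer contains unescaped range syntax.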


I.e. those brackets and braces are no longer special-language characters and
are just passed on to ES to decide. In real life, ES ignores them as
non-alphanumeric, so with QueryAutoTruncate enabled, after the patch, querying:
    some [word]
will be the same as searching for:
    some* word

whereas BEFORE the patch the result was just the "yellow" "error happened"
text box.
