[Koha-bugs] [Bug 26472] New: Elasticsearch - ES - Authority record results not ordered correctly due to punctuation marks

bugzilla-daemon at bugs.koha-community.org
Wed Sep 16 17:07:45 CEST 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=26472

            Bug ID: 26472
           Summary: Elasticsearch - ES - Authority record results not
                    ordered correctly due to punctuation marks
 Change sponsored?: ---
           Product: Koha
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: minor
          Priority: P5 - low
         Component: Searching - Elasticsearch
          Assignee: koha-bugs at lists.koha-community.org
          Reporter: heather_hernandez at nps.gov

When searching authorities, the results are grouped by the punctuation that
follows the first word instead of being filed alphabetically, e.g.:

Word (Word)
Word Word
Word-Word
Word, Word

For example:

Saint (Fictitious...
Saint Charles School
Saint Hilaire Institute
Saint Patrick Club
Saint-Nazaire (France)
Saint-Pierre (Martinique)
Saint, Eva Marie
Saint, Joseph

But the results should sort like this:
Saint Charles School
Saint, Eva Marie
Saint (Fictitious...
Saint Hilaire Institute
Saint Patrick Club
Saint-Nazaire (France)
Saint-Pierre (Martinique)

This filing order is documented in _Library of Congress Filing Rules_, which
can be freely accessed here:
https://babel.hathitrust.org/cgi/pt?id=mdp.39015022080140&view=1up&seq=3

See especially section 1, General Filing Order:
---Begin quote:
Fields in a filing entry are arranged word by word, and words are arranged
character by character. This procedure is continued until one of the following
occurs:
a. A prescribed filing position is reached.
b. The field comes to an end (in which case placement is determined by another
field of the entry or by applying one of the rules given hereafter).
c. A mark of punctuation indicating a subarrangement is encountered.
1.1. Order of Letters
Letters are arranged according to the order of the English alphabet (A-Z).
Upper and lower case letters have equal filing value.
1.1.1. Modified Letters
Modified letters are treated like their plain equivalents in the English
alphabet. Thus all diacritical marks and modifications of recognizable English
letters are treated as if they did not exist; e.g., ä, á, å, ł, ñ, ø are filed
as a, l, n, o. The treatment of special letters that cannot be readily equated
with English letters is described in Rule 17.
Example
Hand blows
Hand book for Prospect Park
Hand in glove
Håndbok for sangere
Handbook for adventure
Hände am Pflug
Hands on the past
Haṇḍu   [Indic surname]
1.2. Placement of Numerals
Numbers expressed in digits or other notation (e.g., roman numerals) precede
letters and, with few exceptions, they are arranged according to their
numerical value. According to this rule, all filing entries beginning with
numerals appear before entries beginning with the letter A. Numbers expressed
as words are filed alphabetically. Detailed instructions for filing numerals
are given in Rule 16.
Example
1, 2, 3, and more
1, 2, buckle my shoe
3 died variously
10 ways to become rich
13 jolly saints
112 Elm Street
838 ways to amuse a child
1000 spare time money making ideas
1984
10,000 trade names
1,000,000 delinquents
A is for anatomy
A4D desert speed run
Aa, Abraham
Henry II
Henry 3
Henry VIII and his six wives
Henry Street Settlement, New York
Henry the Fourth, part two
Longitude 30 west
Longitude and time
Nineteen eighty-four
Oberlin College
one, two, three for fun
Rubinstein, Moshe F.
Ten thousand miles on a bicycle
Three 14th century English mystics
Three by Tey
Thucydides
1.3. Signs and Symbols
Nonalphabetic signs and symbols within a field are generally ignored in filing
and the remaining letters or numerals are used as the basis of arrangement (see
also Rule 18).
1.3.1. Punctuation
Punctuation as such has no place in the collating sequence of characters
considered in filing arrangement. A mark of punctuation is taken into account,
however, in two situations: 1) when it signals the end of an element or field
and indicates the need for subarrangement as described in the following rules;
and 2) when it serves as the sole separator between two discrete words (e.g.,
Mott-Smith; 1951/1952; 1:3) and so must be treated as equivalent to a space.
The second situation dictates that a hyphen will always be treated as a space
(see also Rules 12 and 16).
---End quote
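
To make the quoted rules concrete, here is a minimal Python sketch of a sort
key that approximates rules 1.1, 1.1.1, 1.2 and 1.3.1. It is only an
illustration, not how Koha or Elasticsearch builds its sort fields. The name
filing_key and the small table of special letters are made up for this
example, roman numerals are not handled, and the subarrangement signalled by
punctuation (case 1 of rule 1.3.1) is not implemented.

```python
import re
import unicodedata

# Hypothetical helper table: letters that Unicode decomposition does not
# split into a base letter plus a combining mark. Illustrative, not exhaustive.
_SPECIAL_LETTERS = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O"})


def filing_key(heading):
    """Return a simplified word-by-word sort key for a heading (a sketch)."""
    text = heading.translate(_SPECIAL_LETTERS)
    # Rule 1.1.1: treat modified letters like their plain equivalents by
    # decomposing them and dropping the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Rule 1.1: upper and lower case letters have equal filing value.
    text = text.lower()
    # Rule 1.2 (partial): join thousands groups such as "10,000" so the
    # number can be compared by value.
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)
    # Rule 1.3.1, case 2: a mark that is the sole separator between two
    # words (Mott-Smith; 1951/1952; 1:3) is treated as a space.
    text = re.sub(r"(?<=\w)[-/:](?=\w)", " ", text)
    # Rule 1.3.1: all other punctuation has no place in the collating sequence.
    text = re.sub(r"[^\w\s]", "", text)
    # Word by word: numbers file before letters and compare by value.
    return [
        (0, int(word), "") if word.isdecimal() else (1, 0, word)
        for word in text.split()
    ]


headings = [
    "Saint (Fictitious...",
    "Saint Charles School",
    "Saint-Nazaire (France)",
    "Saint, Eva Marie",
]
for heading in sorted(headings, key=filing_key):
    print(heading)
```

With this key the comma and the parenthesis no longer create separate groups:
the four headings above come out as Saint Charles School, Saint, Eva Marie,
Saint (Fictitious..., Saint-Nazaire (France). Because the hyphen is treated as
a space, a hyphenated heading such as Saint-Nazaire (France) is filed among
the "Saint N..." entries, i.e. before Saint Patrick Club.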


