[Koha-bugs] [Bug 15555] New: Index 024$a into Identifier-other:u url register when source $2 is uri

bugzilla-daemon at bugs.koha-community.org
Tue Jan 12 05:17:21 CET 2016


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15555

            Bug ID: 15555
           Summary: Index 024$a into Identifier-other:u url register when
                    source $2 is uri
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: Z39.50 / SRU / OpenSearch Servers
          Assignee: gmcharlt at gmail.com
          Reporter: dcook at prosentient.com.au
        QA Contact: testopia at bugs.koha-community.org
                CC: m.de.rooy at rijksmuseum.nl

Currently, 024$a is indexed into Identifier-other:w, even when it is a URI
(e.g. http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14217)

This causes problems because the "w" index type replaces punctuation with
spaces and then tokenizes on spaces, so the URI is decomposed into a series of
values which are indexed separately. This is definitely not what you want when
the 024$a holds a URI.

For example:

The URL "http://libris.kb.se/resource/bib/219553" is indexed as the following:

<index name="Identifier-other" type="w" seq="28">@^</index>
<index name="Identifier-other" type="w" seq="1"></index>
<index name="Identifier-other" type="w" seq="29">http</index>
<index name="Identifier-other" type="w" seq="30">libris</index>
<index name="Identifier-other" type="w" seq="31">kb</index>
<index name="Identifier-other" type="w" seq="32">se</index>
<index name="Identifier-other" type="w" seq="33">resource</index>
<index name="Identifier-other" type="w" seq="34">bib</index>
<index name="Identifier-other" type="w" seq="35">219553</index>
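
To make the effect easier to reproduce outside of Zebra, here is a rough
Python sketch of the behaviour described above (punctuation replaced with
spaces, then a split on whitespace). The function name word_tokens is made up
for illustration, and the regular expression is only an approximation of the
"w" rules, not Zebra's actual character map.

```python
import re


def word_tokens(value):
    """Approximate the "w" index type: punctuation -> spaces, then split.

    Assumption: any character that is not a word character or whitespace
    counts as punctuation. The real character map may differ.
    """
    spaced = re.sub(r"[^\w\s]", " ", value)
    return spaced.lower().split()


tokens = word_tokens("http://libris.kb.se/resource/bib/219553")
print(tokens)
# ['http', 'libris', 'kb', 'se', 'resource', 'bib', '219553']
```

The resulting tokens line up with the non-empty word entries in the index dump
above.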

Fortunately, the 024$2 subfield value tells us the source of the identifier,
and "uri" is one of the valid options. So, when we have a 024$2=uri, we can
index the 024$a using the "url" index type. 

(I'm also planning to index all 024$a values into the "phrase" index type, as
it performs the normalization but doesn't tokenize on spaces, so this
normalized form may still be of use for URLs and other identifiers that rely
on punctuation for meaning.)
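
The selection logic proposed above could be sketched roughly as follows. This
is only an illustration in Python, not the actual Koha index definition: the
function name index_entries_for_024 and the dict representation of the
subfields are hypothetical, "p" is assumed to be the code for the phrase index
type, and keeping the existing "w" entry alongside the new ones is also an
assumption.

```python
def index_entries_for_024(subfields):
    """Return (index name, index type, value) tuples for one 024 field.

    `subfields` is a hypothetical dict such as {"a": "...", "2": "uri"}.
    """
    value = subfields.get("a")
    if not value:
        return []

    entries = [
        ("Identifier-other", "w", value),  # current behaviour (assumed kept)
        ("Identifier-other", "p", value),  # planned phrase entry for all 024$a
    ]

    # Only add the url entry when the source subfield says the value is a URI.
    if subfields.get("2", "").strip().lower() == "uri":
        entries.append(("Identifier-other", "u", value))

    return entries


field = {"a": "http://libris.kb.se/resource/bib/219553", "2": "uri"}
for entry in index_entries_for_024(field):
    print(entry)
```

A 024 without $2=uri would get only the "w" and "p" entries in this sketch.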
