[Koha-bugs] [Bug 17721] New: Do we need utf8_bin collation on tagsubfield?

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Mon Dec 5 13:55:33 CET 2016


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=17721

            Bug ID: 17721
           Summary: Do we need utf8_bin collation on tagsubfield?
 Change sponsored?: ---
           Product: Koha
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: Architecture, internals, and plumbing
          Assignee: gmcharlt at gmail.com
          Reporter: m.de.rooy at rijksmuseum.nl
        QA Contact: testopia at bugs.koha-community.org

Comes from report 17676. This discussion should be on a new report.
===
Comment 11
> Indeed we want to keep tagsubfield a utf8_bin (to allow lowercase and uppercase of the same letter for subfields).
Could you provide an example where we want to do that? Does that conform to
MARC? And if so(!), why would you need utf8_bin for it? You can still insert
them; only finding the right a or A would be harder.
It seems to me that we should remove this strange exception, and make sure that
all tagsubfields are saved in lowercase. That should not be too hard.
Can MARC::Record handle subfields a and A btw?
But this discussion should be on a new report.
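
To make the "finding the right a or A would be harder" point concrete, here is
a minimal sketch. It uses an in-memory SQLite table as a stand-in, with
SQLite's BINARY and NOCASE collations playing the roles of utf8_bin and
utf8_unicode_ci. The table name demo_subfields and the helper
matching_subfields are made up for illustration; this is not Koha's schema and
not MySQL, so treat it as an analogy only.

```python
import sqlite3

ALLOWED_COLLATIONS = {"BINARY", "NOCASE"}  # SQLite built-ins used as stand-ins


def matching_subfields(collation, wanted):
    """Return the stored subfield codes that an equality lookup finds.

    Hypothetical helper: builds a throwaway table whose tagsubfield column
    uses the given collation, stores both 'a' and 'A' for tag 952, and
    looks up `wanted`.
    """
    if collation not in ALLOWED_COLLATIONS:
        raise ValueError("unsupported collation: %s" % collation)

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE demo_subfields ("
            " tagfield TEXT,"
            " tagsubfield TEXT COLLATE %s)" % collation
        )
        conn.executemany(
            "INSERT INTO demo_subfields VALUES (?, ?)",
            [("952", "a"), ("952", "A")],
        )
        rows = conn.execute(
            "SELECT tagsubfield FROM demo_subfields"
            " WHERE tagfield = ? AND tagsubfield = ?"
            " ORDER BY rowid",
            ("952", wanted),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(matching_subfields("BINARY", "a"))  # case-sensitive comparison
    print(matching_subfields("NOCASE", "a"))  # case-insensitive comparison
```

With the binary collation the lookup for a returns only a; with the
case-insensitive one it returns both rows, so the caller has to tell them
apart afterwards.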

> if ( $table[1] !~ /COLLATE=utf8_unicode_ci/ and $table[1] !~ /COLLATE=utf8mb4_unicode_ci/ ) { #catches utf8mb4 collated tables
This is only 99.9% safe (as you are probably aware).
If you find one column in unicode_ci, the table might still be something else
(theoretically). Perhaps someone added a custom column with its own collation?
Since you only change the default here, why not always do it? Replacing X with
X will not be a problem.
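
As a rough illustration of the table-versus-column distinction raised here,
the sketch below reads the table default and the per-column collations
separately from a DDL string. SAMPLE_DDL is hand-written for this example (it
is not output from a Koha database) and assumes the layout that SHOW CREATE
TABLE typically prints; the function names are hypothetical too.

```python
import re

# Hand-written sample, NOT taken from a real Koha table.
SAMPLE_DDL = """CREATE TABLE `demo_table` (
  `id` int(11) NOT NULL,
  `label` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `code` varchar(1) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci"""


def table_default_collation(ddl):
    """Return the collation named in the table options line, or None."""
    last_line = ddl.strip().splitlines()[-1]
    match = re.search(r"COLLATE=(\w+)", last_line)
    return match.group(1) if match else None


def column_collations(ddl):
    """Map column name -> explicitly declared collation (if any)."""
    found = {}
    for line in ddl.splitlines():
        # Column definitions start with a backquoted name.
        match = re.match(r"\s*`(\w+)`.*?\bCOLLATE\s+(\w+)", line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    print(table_default_collation(SAMPLE_DDL))
    print(column_collations(SAMPLE_DDL))
```

In this sample the table default and one column agree, while the code column
keeps its own utf8_bin collation, so a single match somewhere in the DDL does
not describe every column.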

Comments 12 & 13 (Katrin)
It does not strictly conform to MARC, but we use this a lot and I have talked
to others using it as well. Take the 952 field as an example: all subfield
codes are taken. Using upper case letters works fantastically now and allows
you to store and index data that we have no other sensible spot for. We have it
all working perfectly, so why break this feature without need?
Ah, and I think I have encountered upper case in German MARC. I just can't find
the documentation right now.
===
Even if we do not change it at all, it would be worth documenting why not,
somewhere we can still find it later.
