[Koha-bugs] [Bug 18336] New: Use utf8mb4 instead of utf8 for MySQL tables, columns, and connections
bugzilla-daemon at bugs.koha-community.org
Mon Mar 27 01:46:41 CEST 2017
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18336
Bug ID: 18336
Summary: Use utf8mb4 instead of utf8 for MySQL tables, columns,
and connections
Change sponsored?: ---
Product: Koha
Version: master
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5 - low
Component: Architecture, internals, and plumbing
Assignee: gmcharlt at gmail.com
Reporter: dcook at prosentient.com.au
QA Contact: testopia at bugs.koha-community.org
As I noted on the koha-devel listserv, as Martin noted on
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=11944#c247, and as Mark
Tompsett noted on https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15794#c2,
we might want to use utf8mb4 instead of utf8 for MySQL tables, columns, and
connections.
MySQL's utf8 character set stores at most 3 bytes per character:
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html.
While most "normal" characters in most languages fit in 1-3 bytes, UTF-8
itself allows up to 4 bytes per character, which means we have a problem with
less commonly used characters in Chinese, Japanese, and Korean, among other
languages. It also means we can't store emoji.
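
To make the byte counts concrete, here is a minimal Python sketch (not Koha code; the function names are made up for illustration) that reports how many bytes a character needs in UTF-8 and finds the characters that would not fit in a 3-byte utf8 column. It relies only on the fact that code points above U+FFFF encode to 4 bytes.

```python
from typing import List, Tuple


def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def find_four_byte_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for every character above U+FFFF.

    These are exactly the characters that need 4 bytes in UTF-8, so they
    cannot be stored in a character set limited to 3 bytes per character.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    # U+4E2D is a common CJK character (3 bytes); U+2000B is a rarer one (4 bytes).
    for ch in ["a", "\u00e9", "\u4e2d", "\U0002000B", "\U0001F600"]:
        print("U+%04X needs %d byte(s)" % (ord(ch), utf8_byte_length(ch)))
    print(find_four_byte_chars("ab\U0001F600c"))
```

A check like this could be run over incoming data to see whether any records would be affected before deciding how urgent the change is.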
When MySQL encounters a 4-byte UTF-8 character, it truncates the string from
that character onward. Unfortunately, it doesn't raise an error. It only
raises a warning, which isn't that easy to detect.
In my case, I'm trying to store a MARCXML record containing a 4-byte character,
and while C4::Biblio::AddBiblio returns true, MySQL corrupts the XML record when
using utf8 encoding rather than utf8mb4 (both in terms of the MySQL column and
the MySQL connection set by Koha::Database).
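
The effect on an XML record can be illustrated without a database. The Python sketch below only simulates the truncation described above (cutting the string at the first 4-byte character); it does not talk to MySQL, and the function names and the sample record are hypothetical. It assumes the server is in the non-strict mode where the value is silently truncated with a warning.

```python
import xml.etree.ElementTree as ET


def simulate_3byte_truncation(text: str) -> str:
    """Cut the string at the first character above U+FFFF.

    This mimics the truncation described in this report; it is a
    simulation for illustration, not MySQL's actual implementation.
    """
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return text[:i]
    return text


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical, heavily simplified MARCXML-like record with one emoji in it.
record = (
    '<record><datafield tag="245"><subfield code="a">'
    "Title \U0001F600 continues"
    "</subfield></datafield></record>"
)

stored = simulate_3byte_truncation(record)

if __name__ == "__main__":
    print("original well-formed:", is_well_formed(record))
    print("stored well-formed:  ", is_well_formed(stored))
    print("stored value:", stored)
```

Because everything after the 4-byte character is dropped, the closing tags are lost as well, so the stored value is no longer parseable XML even though the insert itself appeared to succeed.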