[Koha-patches] [PATCH] Bug 4828: Clean diacritics from SIP-written messages
Ian Walls
ian.walls at bywatersolutions.com
Thu Feb 17 18:43:16 CET 2011
Non-ASCII characters tend to break SIP machines. This patch scrubs
diacritics from any message written out to the SIP client. It won't help
with non-Roman scripts, but accent marks are removed by decomposing the
text with Unicode normalization (NFD) and stripping the combining marks.
Based on work described by Dan Scott in his post to open-ils-dev at list.georgialibraries.org
on Jan 04, 2010. http://www.mail-archive.com/open-ils-dev@list.georgialibraries.org/msg04127.html
Tested with the 3M SIP emulator, as well as live on two different Koha installs.
---
C4/SIP/Sip.pm | 36 +++++++++++++++++++++++++++++++++++-
1 files changed, 35 insertions(+), 1 deletions(-)
diff --git a/C4/SIP/Sip.pm b/C4/SIP/Sip.pm
index 8a0f067..7841d6a 100644
--- a/C4/SIP/Sip.pm
+++ b/C4/SIP/Sip.pm
@@ -9,6 +9,9 @@ use warnings;
use English;
use Exporter;
+use Encode;
+use Unicode::Normalize;
+
use Sys::Syslog qw(syslog);
use POSIX qw(strftime);
use Socket qw(:crlf);
@@ -142,6 +145,37 @@ sub boolspace {
return $bool ? 'Y' : ' ';
}
+sub clean_text {
+ my $text = shift || '';
+
+ # hardcoded to ASCII since Koha configs don't take encoding as institution params
+ my $target_encoding = 'ascii';
+
+ # Convert our incoming UTF8 data into Perl's internal string format
+
+ # Also convert to Normalization Form D, as the ASCII, iso-8859-1,
+ # and latin-1 encodings (at least) require this to substitute
+ # characters rather than simply returning a string truncated
+ # after the first non-ASCII character
+ $text = NFD(decode_utf8($text));
+
+ if ($target_encoding eq 'ascii') {
+
+ # Try to maintain a reasonable version of the content by
+ # stripping diacritics from the text, given that the SIP client
+ # wants just plain ASCII. This is the base requirement according
+ # to the SIP2 specification.
+
+ $text =~ s/\pM+//og;
+ }
+
+ # Characters that cannot be represented in the target encoding will
+ # generally be replaced with a question mark (?) character.
+ $text = encode($target_encoding, $text);
+
+ return $text;
+}
+
# read_SIP_packet($file)
#
@@ -218,7 +252,7 @@ sub write_msg {
my ($self, $msg, $file) = @_;
my $cksum;
- # $msg = encode_utf8($msg);
+ $msg = clean_text($msg);
if ($error_detection) {
if (defined($self->{seqno})) {
$msg .= 'AY' . $self->{seqno};
--
1.5.6.5
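
For anyone who wants to try the approach outside of Koha, here is a minimal standalone sketch of the same decode / NFD / strip-marks / encode sequence. The sub name clean_text_demo and the sample strings are made up for illustration and are not part of the patch; the input is assumed to be UTF-8 encoded bytes, as in the patch.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode);
use Unicode::Normalize qw(NFD);

# Hypothetical standalone copy of the clean_text logic, for illustration only.
sub clean_text_demo {
    my $text = shift;
    $text = '' unless defined $text;

    # Decode UTF-8 bytes to characters, then decompose each accented
    # letter into base letter + combining mark (Normalization Form D).
    $text = NFD(decode_utf8($text));

    # Drop the combining marks, leaving only the base letters.
    $text =~ s/\pM+//g;

    # Anything still outside ASCII becomes Encode's substitution
    # character, a question mark.
    return encode('ascii', $text);
}

# "Dvořák, Antonín" written as UTF-8 bytes
print clean_text_demo("Dvo\xC5\x99\xC3\xA1k, Anton\xC3\xADn"), "\n";

# "Мир" (Cyrillic) as UTF-8 bytes: no diacritics to strip, so each
# character falls through to the substitution character
print clean_text_demo("\xD0\x9C\xD0\xB8\xD1\x80"), "\n";
```

The first call should print "Dvorak, Antonin" and the second "???", which matches the caveat in the commit message about non-Roman scripts.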