[Koha-devel] Strange encoding problems that sometimes happens

Colin Campbell colin.campbell at ptfs-europe.com
Fri Sep 2 13:20:03 CEST 2011


On 02/09/11 10:04, Paul Poulain wrote:
> Hello,
> 
> Native English speakers will probably be less concerned by this problem, but
> there are/were some places where strange encoding problems occur. We have a
> fix for them (use the utf8 decode function), but I was wondering what the
> origin of this problem was.
> 
> I think Frédérick Capovilla has a good explanation that I share with you
> (it's a copy/paste from bug 6479)
>> It's been a while since I created that patch but here is what I understand:
>>
>> I remember that in the NewIssue subroutine of Serials.pm, the content of
>> $serialseq is concatenated with a variable fetched from a SQL query and that's
>> where the problem happens.
>>
>> I did some tests to check the utf-8 flag on those two variables
>> $serialseq = variable from the form (is_utf8 = false)
>> $recievedlist = variable from SQL (is_utf8 = true)
>>
>> The encoding of the data fetched from SQL differs from the encoding of the data
>> received from the form. If two strings with different encodings get
>> concatenated together, the encoding of one of the strings is automatically
>> changed, and we get an encoding problem on one half of the string.
>>
>> Using decode on $serialseq adds the utf-8 flag, so we don't get an encoding
>> problem when we concatenate.
>>
>>
>> We don't get this encoding problem when the first item is added in
>> $recievedlist because $recievedlist is empty and doesn't automatically get the
>> utf-8 flag. I'm guessing Perl DBI automatically adds the utf-8 flag if it
>> finds utf-8 characters in a string returned from a SELECT.
> If there are remaining places where this problem occurs, we now know why it
> happens & how to fix it!
> 
> PS: This explanation sounds highly logical to me. But if someone
> disagrees or has another explanation, feel free to drop a comment on the
> bug or continue this thread!
> 
The perldoc for CGI.pm recommends using decode on input you need to be
in utf-8.
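
A rough sketch of the effect, in case it helps. The variable names and sample
data are made up for illustration and are not taken from Serials.pm, and the
"SQL" value is simply assumed to arrive as an already-decoded character string:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up sample data, not taken from Serials.pm.
# Form input as it arrives by default: the raw UTF-8 bytes for "é".
my $from_form = "\xc3\xa9";
# Value as the database layer is assumed to return it: decoded characters.
my $from_sql = decode('UTF-8', "\xc3\xa9");

# Mixing the two: Perl reads each form byte as a separate Latin-1 character.
my $broken = $from_sql . ", " . $from_form;

# Decoding the form input first keeps both halves as character strings.
my $fixed = $from_sql . ", " . decode('UTF-8', $from_form);

printf "broken: %d characters, fixed: %d characters\n",
    length($broken), length($fixed);

# On output the broken string ends up with a double-encoded second half.
my $broken_out = encode('UTF-8', $broken);
my $fixed_out  = encode('UTF-8', $fixed);
```

The broken string comes out one character longer than the fixed one, because
the two bytes from the form are counted as two characters instead of one.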
C.

-- 
Colin Campbell
Chief Software Engineer,
PTFS Europe Limited
Content Management and Library Solutions
+44 (0) 845 557 5634 (phone)
+44 (0) 7759 633626  (mobile)
colin.campbell at ptfs-europe.com
skype: colin_campbell2

http://www.ptfs-europe.com

