[PATCH] Bug 2246 - (Partial) Map multibyte UTF8 to single byte for ISOLatin1 fonts (fixes diacritics <ASCII 256 decimal)

wajasu matted-34813 at mypacks.net
Wed Oct 5 08:50:34 CEST 2011


This is a partial fix as well. It attempts to convert the internal representation
of multibyte UTF8 characters to their single-byte form in the native encoding (Latin-1)
so that they pass through to the PDF stream unchanged.

This ONLY fixes diacritics that exist in ISOLatin1 and probably won't cover a full
foreign-language need.  It probably solves the printing case for many historical records,
but it won't meet the needs of the international community.  I believe the final solution
needs a fully embedded Unicode font.

Refer to utf8::downgrade($string,FAIL_OK); (see core perl  /usr/share/perl5/core_perl/utf8.pm)

Test:
a) I selected a biblio whose author has a u-diaeresis (u with two dots above, e.g. Jurgen Habermas)
and created a batch with that one record.  I used the standard Helvetica font (no TrueType).
I exported it as a PDF and saw that my label had "J (capital A with tilde, then 1/4) rgen" for
the author.
b) Applied the patch.
c) Exported the PDF again and saw "J (u with two dots above) rgen".

To see what changed in the PDF that was generated:
a) Edit label-create-pdf.pl and temporarily comment out the $pdf->Compress(1) line so that
you can read the PDF text instructions in the generated file. (Export it.)
b) Use a hex viewer (I use hexedit) and search for Habermas.  You will see
the corresponding Jurgen with the two bytes C3BC before the patch, and the single byte FC after it.
(If you don't have hexedit, od -t x1 on unix also works.  Plain od -x groups bytes into
16-bit words, so on little-endian machines each pair appears swapped.)
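
The same check can be scripted instead of done by eye.  The following is only a
sketch: the helper name umlaut_encoding and the idea of passing the exported PDF
path on the command line are assumptions for illustration, not part of Koha.  It
reads the uncompressed PDF as raw bytes and reports which form of "Jurgen" it finds.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Koha): classify how the u-diaeresis in
# "Jurgen" is stored in a chunk of raw PDF bytes.
sub umlaut_encoding {
    my ($bytes) = @_;
    return 'utf8'   if $bytes =~ /J\xC3\xBCrgen/;   # two bytes, pre-patch
    return 'latin1' if $bytes =~ /J\xFCrgen/;       # one byte, post-patch
    return 'none';
}

# Optional: pass the path of an uncompressed exported PDF as the first argument.
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $content = do { local $/; <$fh> };   # slurp the whole file as bytes
    close($fh);
    print umlaut_encoding($content), "\n";
}
```

Reading with the :raw layer matters here, since any decoding layer would change
the bytes being inspected.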

Some explanation:
The utf8 flag is turned off, and the FC is passed through.  I tried the Encode::decode routines,
but I think they keep the Perl internal utf8 flag on, and the bytes stream out as C3BC.
I've read that when strings are concatenated the flag can switch on or off, so I hoped nothing
in the PDF::Reuse module would turn it back on (if the flag being off is what is helping).
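
A minimal standalone sketch of the effect described above, outside Koha.  The
assumption is that the label text reaches _print_text as a character string with
the utf8 flag on; utf8::upgrade simulates that here.  It compares the bytes Perl
would emit for the flagged string (C3BC for the u-diaeresis) with the bytes after
utf8::downgrade (FC).

```perl
use strict;
use warnings;

# Assumption: the text arrives flagged as UTF-8; utf8::upgrade simulates that.
my $line = "J\x{fc}rgen";
utf8::upgrade($line);
my $flag_before = utf8::is_utf8($line) ? 'on' : 'off';

# Hex of the UTF-8 encoded form, i.e. what ends up in the PDF before the patch.
my $utf8_copy = $line;
utf8::encode($utf8_copy);
my $before = unpack('H*', $utf8_copy);

# The call the patch adds.  1 = FAIL_OK: return false instead of dying if the
# string contains a character that does not fit in a single byte.
my $ok = utf8::downgrade($line, 1);
my $flag_after = utf8::is_utf8($line) ? 'on' : 'off';
my $after = unpack('H*', $line);

print "flag before: $flag_before, bytes: $before\n";
print "flag after:  $flag_after, bytes: $after\n";
```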

Observations:
I tried this on my production 3.2-ish Koha and it didn't work, so this patch depends
on other fixes such as http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=4293
I had hoped folks on older versions could make the change in production without an upgrade;
one could still try and see, since it's a staff client tool.

It worked for me on 3.4.4 and on 3.5.x Koha git master as of October 5th 2011.

Things that might be needed:
Pertinent modules?
Module                  MyTestEnv version   HEAD requires   Required
PDF::API2               2.019               2               Yes
PDF::API2::Page         2.019               2               Yes
PDF::API2::Simple       1.1.4               1               Yes
PDF::API2::Util         2.019               2               Yes
PDF::Reuse              0.35                0.33            Yes
PDF::Reuse::Barcode     0.05                0.05            Yes
PDF::Table              0.9.3               0.9.3           Yes
Unicode::Normalize      1.03                0.32            Yes

perl -v
This is perl 5, version 12, subversion 1 (v5.12.1) built for x86_64-linux

If you test, be sure to test with a diacritic that has a corresponding ISOLatin1
mapping.
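
To decide whether a test string has an ISOLatin1 mapping, a rough check is
whether every character has a code point below 256.  The sketch below uses a
hypothetical helper, fits_latin1, and two sample strings chosen for illustration.
It also shows that with FAIL_OK set, utf8::downgrade leaves an unmappable string
untouched and returns false, so such labels behave as they did before the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character fits in one Latin-1 byte.
sub fits_latin1 {
    my ($text) = @_;
    return $text !~ /[^\x{00}-\x{FF}]/;
}

my $umlaut = "J\x{fc}rgen";              # u-diaeresis, U+00FC: in Latin-1
my $polish = "\x{141}\x{f3}d\x{17a}";    # L-stroke and z-acute: not in Latin-1

# With FAIL_OK (1), a string that cannot be downgraded is left unchanged.
my $copy = $polish;
my $downgraded = utf8::downgrade($copy, 1);

print "umlaut fits: ", (fits_latin1($umlaut) ? "yes" : "no"), "\n";
print "polish fits: ", (fits_latin1($polish) ? "yes" : "no"), "\n";
print "polish downgraded: ", ($downgraded ? "yes" : "no"), "\n";
```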
---
 labels/label-create-pdf.pl |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/labels/label-create-pdf.pl b/labels/label-create-pdf.pl
index d45b561..c6daa74 100755
--- a/labels/label-create-pdf.pl
+++ b/labels/label-create-pdf.pl
@@ -88,6 +88,9 @@ sub _print_text {
     foreach my $text_line (@$label_text) {
         my $pdf_font = $pdf->Font($text_line->{'font'});
         my $line = "BT /$pdf_font $text_line->{'font_size'} Tf $text_line->{'text_llx'} $text_line->{'text_lly'} Td ($text_line->{'line'}) Tj ET";
+        # Try to convert multibyte UTF8 to single byte to help std ISOLatin1 font mappings. (turning utf8 flag off)
+        utf8::downgrade($line,1);  # 1 =  FAIL_OK
+
         $pdf->Add($line);
     }
 }
-- 
1.7.7

