[Koha-patches] [PATCH] Bug 2246 - (Partial) Map multibyte UTF-8 to single byte for ISOLatin1 fonts (fixes diacritics below code point 256)

matted-34813 at mypacks.net
Wed Oct 5 10:21:32 CEST 2011


    Bug 2246 - (Partial) Map multibyte UTF-8 to single byte for ISOLatin1 fonts (fixes diacritics below code point 256)
    
    This is also a partial fix. It attempts to convert the internal representation of
    multibyte UTF-8 characters to their single-byte equivalents in the native encoding
    (Latin-1), so that they pass through to the PDF stream.
    
    This ONLY fixes the ISOLatin1 diacritics and probably won't meet a full foreign-language
    need. It probably solves the printing case for many historical records, but it won't
    meet the needs of the international community. I believe the final solution will need
    a fully embedded Unicode font.
    
    See utf8::downgrade($string, FAIL_OK) in core Perl (/usr/share/perl5/core_perl/utf8.pm).
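    Per the utf8 pragma's documentation, utf8::downgrade converts the string in place to
    the native 8-bit encoding, returns true on success, and on failure dies (or, when
    FAIL_OK is true, returns false and leaves the string alone). Python has no equivalent
    of Perl's internal UTF8 flag, so the hypothetical downgrade() helper below is only a
    rough sketch of the observable effect on the bytes, not Perl's actual mechanism.

```python
def downgrade(text: str, fail_ok: bool = False):
    """Rough Python analogue of Perl's utf8::downgrade($string, FAIL_OK).

    Hypothetical helper for illustration. Returns (success, data): on success
    data is the Latin-1 byte string; on failure with fail_ok=True data is the
    unchanged text. Without fail_ok, the UnicodeEncodeError propagates,
    loosely mirroring Perl dying.
    """
    try:
        # Latin-1 maps code points U+0000..U+00FF one-to-one onto single bytes.
        return True, text.encode("latin-1")
    except UnicodeEncodeError:
        if not fail_ok:
            raise
        # Like FAIL_OK: report failure and leave the string as it was.
        return False, text


ok, data = downgrade("J\u00fcrgen")      # u with diaeresis is U+00FC
print(ok, data.hex())                    # True 4afc7267656e

ok, data = downgrade("Dvo\u0159\u00e1k", fail_ok=True)  # r with caron is U+0159
print(ok)                                # False (data is the unchanged text)
```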
    
    Test:
    a) I selected a biblio with a u-umlaut (u with two dots above, e.g. Jürgen Habermas)
    and created a batch with that one record, using the standard Helvetica font (not TrueType).
    I exported it as a PDF and saw "J (capital A tilde, then 1/4) rgen", i.e. "JÃ¼rgen", for
    the author.
    b) Applied the patch.
    c) Exported the PDF again and saw "J (u with two dots above) rgen", i.e. "Jürgen".
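    The garbled author in step a) is what happens when the two UTF-8 bytes for the
    u-umlaut (C3 BC) are each drawn as a separate single-byte character. The short Python
    sketch below reproduces it, assuming the standard Helvetica font reads each byte as
    Latin-1 (for these two bytes, Windows-1252 yields the same glyphs).

```python
name = "J\u00fcrgen"  # "Jürgen"; u with diaeresis is U+00FC

before = name.encode("utf-8")    # bytes that reached the PDF before the patch
after = name.encode("latin-1")   # bytes after downgrading to Latin-1

print(before.hex(" "))  # 4a c3 bc 72 67 65 6e
print(after.hex(" "))   # 4a fc 72 67 65 6e

# A single-byte font draws each byte as its own glyph, so the two UTF-8
# bytes become two characters: capital A tilde (0xC3) and 1/4 (0xBC).
print(before.decode("latin-1"))  # JÃ¼rgen
print(after.decode("latin-1"))   # Jürgen
```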
    
    To see what changed in the generated PDF:
    a) Edit label-create-pdf.pl and temporarily comment out the $pdf->Compress(1) line so that
    the PDF text instructions are readable when it is generated. (Export it.)
    b) Open the file in a hex viewer (I use hexedit) and search for Habermas: the corresponding
    Jurgen contains the two bytes C3BC before the patch, and the single byte FC after it.
    (If you don't have hexedit, od -x on Unix also lets you view the PDF.)
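    If neither hexedit nor od is handy, a small Python script can do the same check. This
    is a sketch only: it assumes the PDF was exported with compression turned off (step a
    above) and that the name appears literally in the content stream; the
    encoded_u_umlaut() helper and the script's command-line usage are hypothetical.

```python
import re
import sys

# 'J', then one or two non-ASCII bytes, then 'rgen' (works whether the label
# prints "Habermas, Jurgen" or "Jurgen Habermas").
UMLAUT_IN_JURGEN = re.compile(rb"J([\x80-\xff]{1,2})rgen")


def encoded_u_umlaut(pdf_bytes: bytes):
    """Return the hex of the byte(s) encoding the u-umlaut in 'Jurgen', or None.

    Expect 'c3bc' (UTF-8) before the patch and 'fc' (Latin-1) after it.
    """
    match = UMLAUT_IN_JURGEN.search(pdf_bytes)
    return match.group(1).hex() if match else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_pdf.py exported-labels.pdf
    with open(sys.argv[1], "rb") as fh:
        print(encoded_u_umlaut(fh.read()) or "name not found")
```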
    
    Some explanation:
    The utf8 flag is turned off, and the FC byte is passed through. I tried the Encode::decode
    routines, but I think they leave Perl's internal utf8 flag on, so the bytes stream out as C3BC.
    I've read that the flag can switch on or off when strings are concatenated, so I hoped
    nothing in the PDF::Reuse module would turn it back on (if that's what is helping).
    
    Observations:
    I tried this on my production Koha (roughly 3.2) and it didn't work, so this patch depends
    on other fixes such as http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=4293
    I had hoped people on older versions could make the change in production without upgrading;
    since this is a staff client tool, it may still be worth trying to see.
    
    It worked for me on 3.4.4 and on 3.5.x (Koha git master as of October 5th, 2011).
    
    Things that might be needed (pertinent Perl modules):

    Module                  My test env   HEAD    Required
    PDF::API2               2.019         2       Yes
    PDF::API2::Page         2.019         2       Yes
    PDF::API2::Simple       1.1.4         1       Yes
    PDF::API2::Util         2.019         2       Yes
    PDF::Reuse              0.35          0.33    Yes
    PDF::Reuse::Barcode     0.05          0.05    Yes
    PDF::Table              0.9.3         0.9.3   Yes
    Unicode::Normalize      1.03          0.32    Yes

    perl -v
    This is perl 5, version 12, subversion 1 (v5.12.1) built for x86_64-linux
    
    If you test, be sure to test with a diacritic that has a corresponding ISOLatin1
    mapping.
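    When picking test data, it can help to list which characters in a candidate author or
    title fall outside Latin-1 (code point 256 and above), since the patch cannot help
    with those. A minimal sketch with a hypothetical helper: the NFC step is an
    assumption, included because a decomposed "u" plus U+0308 COMBINING DIAERESIS only
    fits in Latin-1 once composed to U+00FC; the patch does not say which form the
    records are in at that point.

```python
import unicodedata


def non_latin1_chars(text: str, compose: bool = True) -> list:
    """List the characters in text with no single-byte Latin-1 form."""
    if compose:
        # Assumption: compose u + U+0308 into U+00FC before checking.
        text = unicodedata.normalize("NFC", text)
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}"
        for ch in text
        if ord(ch) > 0xFF  # Latin-1 covers U+0000 through U+00FF only
    ]


print(non_latin1_chars("Habermas, J\u00fcrgen"))  # []
print(non_latin1_chars("Dvo\u0159\u00e1k"))
# ['U+0159 LATIN SMALL LETTER R WITH CARON']
```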