[Koha-devel] Diacriticals, Unicode, and PDF's

Tue Sep 29 03:44:20 CEST 2009

On Mon, Sep 28, 2009 at 09:21:39PM -0400, Chris Nighswonger wrote:
> The UTF to PDF conversion issue appears to be primarily caused by the
> fact that the PDF stream uses glyphIDs rather than Unicode to display
> strings. Thus there is not a direct, one-to-one Unicode-to-glyphID
> relationship. The reason that *some* Unicode chars come across ok is
> more ascribable to chance than to design. This happens when the
> Unicode code point *happens* to match the font glyphID. What really
> should be happening is that there should be a "ToUnicode" table built
> and embedded in the PDF file so that the relationship from Unicode to
> glyphID may be properly defined.

[snip]

> Any thoughts, information, suggestions, etc. are most gratefully appreciated.

The cairographics project has done a lot of work on PDFs and
text-to-glyph translation, if I remember correctly.

  http://cairographics.org

A Google search with these terms is a good start:

  cairo graphics pdf text to glyph

It looks like they rely on pango libraries (something called
pangocairo in particular).
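
On the "ToUnicode" table itself: as far as I understand the PDF format,
it is a small CMap stream that lists, for each glyph code used in the
text, the Unicode text it stands for (written as UTF-16BE hex). Below is
a rough Python sketch of what generating one might look like. It is only
an illustration, not what Koha or cairo actually do. The function names
and the glyph IDs in the example are made up; real glyph IDs would have
to come from the embedded font, and the sketch assumes 2-byte glyph
codes.

```python
from typing import Dict


def utf16be_hex(text: str) -> str:
    """Return text as upper-case UTF-16BE hex, as used inside a ToUnicode CMap."""
    return text.encode("utf-16-be").hex().upper()


def build_tounicode_cmap(glyph_to_text: Dict[int, str]) -> str:
    """Build the body of a ToUnicode CMap from a glyphID -> Unicode mapping.

    Assumes 2-byte glyph codes (0x0000-0xFFFF). The mapping itself is
    hypothetical input; a real one must be derived from the font in use.
    """
    entries = sorted(glyph_to_text.items())
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # A single bfchar section may hold at most 100 entries, so emit in chunks.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for glyph_id, text in chunk:
            lines.append(f"<{glyph_id:04X}> <{utf16be_hex(text)}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up glyph IDs for "A", "é" and "ő", purely for illustration.
    print(build_tounicode_cmap({36: "A", 201: "\u00e9", 310: "\u0151"}))
```

The resulting text would be stored as a stream object and referenced
from the font dictionary's /ToUnicode entry, so that viewers can map
glyphs back to characters for search and copy/paste.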

-kolibrie
