I18n (was Re: [Koha-devel] From the Kaitiaki)

Mon Sep 16 10:14:14 CEST 2002

On Mon, 16 Sep 2002, Pat Eyler wrote:
> On Sat, 14 Sep 2002, Ambrose Li wrote:
> > The library I work with is also in such a situation; we have
> > books in Chinese and English, plus a few French books. And a lot
> > of books are in both Chinese and English (Chinese and English
> > titles and/or content).
> >
> > I thought New Zealand also had multilingual collections; how is
> > Koha handling such collections in New Zealand?
>
> I think that this is one of the areas where Koha can begin to shine.  It
> will take some work to get there though.  Are there any m17n/i18n/l10n
> gurus out there who want to start thinking about what we need to do to get
> there?

	I'm not an i18n guru by any means, but I do have books in English,
French, German, and Russian, and I can think of a number of issues to
consider:

	- If I look up a French book, it'd be nice to have the author and
title appear in proper French, with all of the accents in the right
places.
	- A multilingual book might have titles in several languages,
e.g., "English-Russian Dictionary" also has the title "Anglo-russkiy
slovar'" (except that the latter would be rendered in Cyrillic, which I
won't do in email). It should be possible to search for either one.
	- For that matter, a book might have several variants of the title,
even within a single language. Terry Pratchett's "The Colour of Magic" is
published in the US as "The Color of Magic."
	- A book might have only one title (say, in French), but the
librarian who's looking it up doesn't know how to enter accents and
whatnot. (Plus, the accents aren't always present: in "A la recherche du
temps perdu", there would normally be an accent over the A, but since it's
capitalized, the accent is often omitted. German allows you to write "ue"
instead of u-with-an-umlaut.) Librarians shouldn't need to know these
subtleties.
	- Even if a book has only one title, the librarian might want to
search for it in the Latin alphabet, i.e., search for the literal string
"slovar" instead of the Cyrillic string that's rendered as "slovar'" in
ASCII.

	Here's one possible solution: come up with a notation to specify
the language and encoding of a (sub)string, e.g.:
	=|<language>|<character-set>|<string>|=
Thus, one could have
	A critical analysis of =|es|iso8859-1|Don Quixote|=
and some similar notation for indicating variant (but equivalent) forms of
a title:
	=|en_GB|iso8859-1|The colour of magic|=
	=|en_US|iso8859-1|The color of magic|=
(Obviously, this can be used for authors, notes, etc., and not just
titles.)
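To make the notation concrete, here is a minimal Python sketch of a parser for it. It assumes the text part never contains a literal "|" (the notation as proposed has no escape mechanism), and the name parse_tagged is made up for illustration.

```python
import re
from typing import List, Tuple, Union

# One tagged substring: =|<language>|<character-set>|<string>|=
# Assumption: none of the three parts contains a literal "|".
TAG = re.compile(r"=\|([^|]*)\|([^|]*)\|([^|]*)\|=")

# A field is a sequence of plain-text pieces and (language, charset, text) tuples.
Segment = Union[str, Tuple[str, str, str]]


def parse_tagged(field: str) -> List[Segment]:
    """Split a catalog field into untagged text and tagged (lang, charset, text) parts."""
    segments: List[Segment] = []
    pos = 0
    for m in TAG.finditer(field):
        if m.start() > pos:
            segments.append(field[pos:m.start()])  # untagged text before the tag
        segments.append((m.group(1), m.group(2), m.group(3)))
        pos = m.end()
    if pos < len(field):
        segments.append(field[pos:])  # untagged text after the last tag
    return segments
```

For example, parse_tagged("A critical analysis of =|es|iso8859-1|Don Quixote|=") yields the plain prefix followed by ("es", "iso8859-1", "Don Quixote"), so a display routine can render the Spanish part in its own language and character set.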

	This should take care of the problem of representing non-ASCII
strings for the benefit of patrons who can read them. The next problem
is: what if a patron asks a librarian for "Anglo-russkiy slovar'"? The
librarian should be able to search for the literal string "slovar".
	This is a thornier problem. The best solution I've been able to
come up with is to add a field to the database that gives the title in
plain ASCII (or whatever the library considers to be native). Thus, when a
new book is entered, remove any accents and transliterate any non-ASCII
characters to ASCII. Store this as the "ASCII title." (Obviously, in many
cases, this will be the same as the title; in that case, just leave the
"ASCII title" field NULL.)
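The "ASCII title" step could look something like the following Python sketch. The Cyrillic table is hypothetical and covers only the letters in the dictionary example; a real catalog would adopt a complete, published transliteration scheme. Accents are removed by Unicode decomposition (NFKD) followed by dropping combining marks, which is one common technique, not the only one.

```python
import unicodedata
from typing import Optional

# Hypothetical, partial Cyrillic-to-Latin table: only the letters needed for
# "Англо-русский словарь". A real catalog would use a complete published scheme.
CYRILLIC_TO_LATIN = {
    "а": "a", "в": "v", "г": "g", "и": "i", "й": "y", "к": "k", "л": "l",
    "н": "n", "о": "o", "р": "r", "с": "s", "у": "u", "ь": "'",
}


def transliterate(text: str) -> str:
    out = []
    # NFC first so that "й" is a single code point (NFKD would split it into "и" + breve).
    for ch in unicodedata.normalize("NFC", text):
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # unmapped characters pass through unchanged
        else:
            out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def strip_accents(text: str) -> str:
    # NFKD splits "À" into "A" + U+0300; dropping combining marks leaves "A".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def ascii_title(title: str) -> Optional[str]:
    """Folded title for the "ASCII title" field, or None (store NULL) if unchanged."""
    folded = strip_accents(transliterate(title))
    return None if folded == title else folded
```

Note that this folds u-with-an-umlaut to plain "u", not "ue"; treating "ue" as equivalent would need an extra, language-specific rule applied to both the stored field and the search string.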

	The final i18n issue is perhaps the simplest: that of setting up
the web interface to use the user's preferred language. The browser sends
an Accept-Language header listing the languages in which it will accept
results, so the preference can be extracted from the request headers. In
addition, Apache can automagically pick a document to return (content
negotiation, enabled with the MultiViews option). That is, if the browser
asks for "/index.html" and specifies that it'll accept languages "fr" and
"en", then Apache will see if there's a file "/index.html.fr" (provided
there is no plain "/index.html").
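For pages generated by scripts rather than served as static files, the script has to read the header itself. Here is a simplified Python sketch, assuming the available translations are given as lowercase tags. It honors q-values and falls back from a regional tag such as "fr-ca" to "fr", but it ignores the "*" wildcard and other details of the HTTP specification. The function names are hypothetical.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Language tags from an Accept-Language header, most preferred first."""
    prefs = []
    for index, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # default quality when no q= parameter is given
        for param in parts[1:]:
            name, sep, value = param.strip().partition("=")
            if name.strip() == "q" and sep:
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            # Sort by descending q; ties keep the order the browser sent.
            prefs.append((-q, index, tag))
    return [tag for _q, _i, tag in sorted(prefs)]


def pick_language(header: str, available: List[str], default: str = "en") -> str:
    """First acceptable language we have a translation for, else the default."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "fr-ca" falls back to "fr"
        if primary in available:
            return primary
    return default
```

The chosen tag could then select a template such as "index.html.fr", mirroring what Apache does for static files.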
	For strings in scripts and such, the main thing is to identify and
mark them. GNU's gettext package allows you to mark translatable strings
as _(<string>) or L_(<string>) (though I'm not sure this would work
terribly well in Perl). These strings can then be extracted by a script
(gettext's xgettext) into a message catalog that translators maintain.
See, for instance,
http://cvs.coldsync.org/cgi-bin/viewcvs.cgi/coldsync/i18n/Makefile?rev=1.18&content-type=text/vnd.viewcvs-markup

	Okay, I'll shut up now.

-- 
Andrew Arensburger                      Actually, these _do_ represent the
arensb at ooblick.com                      opinions of ooblick.com!
                        Generic Tagline V 6.01