[Koha-bugs] [Bug 2559] Language limit on Spanish returns Russian records or is it English ...

bugzilla-daemon at kohaorg.ec2.liblime.com
Wed Aug 26 01:05:08 CEST 2009


http://bugs.koha.org/cgi-bin/bugzilla3/show_bug.cgi?id=2559


Nicole C. Engard <nengard at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chris at bigballofwax.co.nz
         AssignedTo|nengard at gmail.com           |gmcharlt at gmail.com




--- Comment #4 from Nicole C. Engard <nengard at gmail.com>  2009-08-25 23:05:07 ---
Okay, I poked at this for a bit and found a few big issues.  You should read
the chat below for full details - but here's the short version.  The Language
pull-down pulls in the ISO standard codes for languages - but those do not
always match the MARC or UNIMARC standards - hence the problem.  My suggestion
is a new column (or two, if MARC21 and UNIMARC differ) added to the table that
holds the standards: language_rfc4646_to_iso639

There are two other tables with similar codes in them:
language_subtag_registry and language_descriptions  -- I'm not sure of all the
ways these tables are used, but they might be the home for these new fields.
In short - the problem with Russian and Spanish was a typo - but the other ones
listed are problems of comparing standards.
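
A rough sketch of what the suggested extra column might look like, using
Python's sqlite3 module purely for illustration (Koha itself uses MySQL). The
table name and the rfc4646_subtag / iso639_2_code columns come from this bug;
the id key, the marc21_code column, and the search_limit helper are
hypothetical names, and the sample iso639_2_code values are illustrative, not
a copy of what Koha actually stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing table name and first two data columns are from the bug report.
# "id" and "marc21_code" are assumptions made for this sketch only.
conn.execute("""
    CREATE TABLE language_rfc4646_to_iso639 (
        id             INTEGER PRIMARY KEY AUTOINCREMENT,
        rfc4646_subtag TEXT NOT NULL,   -- used to pick templates (e.g. hy)
        iso639_2_code  TEXT,            -- ISO 639-2 code
        marc21_code    TEXT             -- hypothetical: code found in MARC21 records
    )
""")

# Illustrative rows; the MARC21 codes are arm, eng and fre.
conn.executemany(
    "INSERT INTO language_rfc4646_to_iso639 "
    "(rfc4646_subtag, iso639_2_code, marc21_code) VALUES (?, ?, ?)",
    [("hy", "hye", "arm"), ("en", "eng", "eng"), ("fr", "fra", "fre")],
)


def search_limit(subtag):
    """Build the ln: search limit from the MARC code, not the template subtag."""
    row = conn.execute(
        "SELECT marc21_code FROM language_rfc4646_to_iso639 "
        "WHERE rfc4646_subtag = ?",
        (subtag,),
    ).fetchone()
    return f"ln:{row[0]}" if row and row[0] else None
```

With this layout the template code keeps using rfc4646_subtag untouched, while
the search pull-down would read the separate MARC column, so
search_limit("hy") gives "ln:arm".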

Standard Links:
* http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
* http://www.loc.gov/marc/languages/language_code.html
* http://www.unimarc.info/bibliographic/2.3/en/appendixA

CHAT TRANSCRIPT:

Nicole Engard: I am working on this:
http://bugs.koha.org/cgi-bin/bugzilla3/show_bug.cgi?id=2559 and I found the
subtag_registry.sql files and edited them - but I have no idea how to write the
updates cause there doesn't seem to be any type of key
Chris Cormack: hmm
Chris Cormack: will just have to do
Chris Cormack: update blah set language='arm' where language='hy' and something
else 
Chris Cormack: a better fix would be to add a key to that table ;)
Nicole Engard: want me to do that?
Nicole Engard: it's multiple tables
Nicole Engard: and I don't mind adding a key
Nicole Engard: just need to know the rules
Chris Cormack: plain old autoincrement
Nicole Engard: k
Nicole Engard: it's multiple tables - like I said
Nicole Engard: I'll do them all
Nicole Engard: big patch headed Koha's way :)
Nicole Engard: In this table do you know the difference between the two
columns?
Nicole Engard:  select * from language_rfc4646_to_iso639;
+----------------+---------------+----+
| rfc4646_subtag | iso639_2_code 
Nicole Engard: the MARC for armenian is 'arm' not 'hy'
Chris Cormack: well those are 2 standards
Chris Cormack: rfc4646 and iso639_2
Chris Cormack: you shouldnt go changing them .. unless its to match the
standard
Nicole Engard: k
Nicole Engard: those i'll leave
Nicole Engard: but i need to change them in the descriptions table
Nicole Engard: for the advanced search page
Nicole Engard: right?
Nicole Engard: and the subtag registry
Chris Cormack: those are standards you will need to look them up
Chris Cormack: and dont change them if they are right
Chris Cormack: we cant rewrite international standards to suit libraries :)
Chris Cormack: so if libraries have made up their own standard (hardly surprises
me)
Nicole Engard: can I search for a patch from galen by date?
Nicole Engard: I want to see how he handled the original fix
Chris Cormack: search on the bugnumber
Nicole Engard: I know he pushed it on 4/27/09
Nicole Engard: ah
Nicole Engard: okay
Chris Cormack:
http://git.koha.org/cgi-bin/gitweb.cgi?p=Koha;a=blobdiff;f=installer/data/mysql/updatedatabase.pl;h=c0af961d36f9f642ee0fe54fbc897aa9396a153d;hp=ae9af90c7dcef7ab37f4e38a495da3ac9cf603ad;hb=073ebc0001f2ea145a2034af2ecf080410cc3ed3;hpb=8a5ac1ee97eece41469bb15d18261373b1d40be2
Chris Cormack: naughty
Chris Cormack: ah well, its broken from the standard now anyway, so you may as
well break it more
Nicole Engard: hehe
Nicole Engard: no
Nicole Engard: wait
Nicole Engard: this looks like it was a typo
Nicole Engard: why would spanish be rus originally?
Chris Cormack: ahh true, good point
Chris Cormack: yep
Nicole Engard: and why would he only change one table?
Nicole Engard: hmmmm
Nicole Engard: i need galen
Chris Cormack: maybe it was right in the other one
Nicole Engard: yeah
Nicole Engard: but this is the one that the search box checks
Nicole Engard: and so if you search for english language only you get nothing
Nicole Engard: the same for armenian and french
Nicole Engard: that's a problem
Nicole Engard: isn't it?
Nicole Engard: even if it is a standard?
Chris Cormack: probably, but those columns are used for more than search
Chris Cormack: they are used for the templates
Chris Cormack: if you go round changing hy to arm, you might bust the ability
of ppl to change to armenian templates
Nicole Engard: hmmm
Chris Cormack: hy-Armn-i-staff-prog-v-3000000.po
Nicole Engard: yeah
Chris Cormack: if you change hy to arm
Chris Cormack: that will bust that
Nicole Engard: I won't change it - I'll just submit a patch with primary keys
for the tables
Chris Cormack: if marc has its own language coes
Chris Cormack: codes
Chris Cormack: we should have a column for those
Chris Cormack: and use them for the search
Nicole Engard: now we're getting out of my reach
Nicole Engard: I think
Chris Cormack: yep, its not as easy as it looks
Nicole Engard: okay found it:
Nicole Engard: <p><label for="language-limit">Language: </label>
                <select name="limit">
                <option value="">No Limit</option>
                <!-- TMPL_LOOP NAME="search_languages_loop" -->
                <!-- TMPL_IF NAME="selected" -->
                <option value="ln:<!-- TMPL_VAR NAME="iso639_2_code" -->" selected="selected"><!-- TMPL_VAR NAME="language_description" --></option>
                <!-- TMPL_ELSE -->
                <option value="ln:<!-- TMPL_VAR NAME="iso639_2_code" -->"><!-- TMPL_VAR NAME="language_description" --></option>
                <!-- /TMPL_IF -->
                <!-- /TMPL_LOOP -->
                </select></p><!-- <a href="">Show all languages</a>-->
Nicole Engard: so what we need is a new code - a MARC code?
Nicole Engard: instead of the iso tag
Nicole Engard: I can do something about that - I think
Nicole Engard: maybe just add a column to a table with the MARC language code?
Chris Cormack: or check the standard first
Chris Cormack: but i think its currently right
Nicole Engard: i'll check the standard
Nicole Engard: hmmm
Nicole Engard: it doesn't look like it is
Nicole Engard: http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
Nicole Engard: armenian is arm
Nicole Engard: hye 
Nicole Engard: or hy
Nicole Engard: so it's a matter of multiple choices
Nicole Engard: arm|hye|hy|Armenian|arménien
Nicole Engard: eng||en|English|anglais
Nicole Engard: same deal
Nicole Engard: and lastly
Nicole Engard: fre|fra|fr|French|français
Chris Cormack: yeah everything has a 2 letter and a 3 letter code
Nicole Engard: no
Nicole Engard: not all
Nicole Engard: fil|||Filipino; Pilipino|filipino; pilipino
Nicole Engard: ira|||Iranian languages|iraniennes, langues
Nicole Engard: for example
Nicole Engard: and Armenian has 2 3 letter codes and 1 2 letter code
Nicole Engard: so
Nicole Engard: got any suggestions?
Nicole Engard: add a marc language code column?
Chris Cormack: id comment on it and say this table is used for 2 things
Chris Cormack: templates
Chris Cormack: and searching
Nicole Engard: http://www.loc.gov/marc/languages/language_code.html
Chris Cormack: we should not be trying to reuse the codes we use for templates,
for searching 
Chris Cormack: (thats what id say in the bug)
Nicole Engard: k
Chris Cormack: now you realise when LOC says marc
Chris Cormack: they mean MARC21
Nicole Engard: yes
Nicole Engard: but we can create a col for marc21 and unimarc
Nicole Engard: and then pull from the right place to fill that pull down
Chris Cormack: id just check what unimarc is doing
Nicole Engard: hang on
Chris Cormack: http://www.unimarc.info/bibliographic/2.3/en/appendixA
Chris Cormack: it might be one column is all we need
Nicole Engard: well eng and arm are the same
Nicole Engard: but we'd have to look through all of the codes and compare them
back and forth
Chris Cormack: yep
Chris Cormack: it was more reading what they were doing
Chris Cormack: A few of the codes in the ISO standard differ from the USMARC
code list for languages and therefore from the previous version of this list.
For the Library of Congress announcement see
http://lcweb.loc.gov/marc/isochange_ann.html. It is intended that these
differences be adopted for UNIMARC in the near future; this will make it
ISO-compliant. There will be an announcement in ICBC (International
Cataloguing and Bibliographic Control) and on the IFLA UBCIM web page
(http://ifla.org/VI/3/ubcim.htm).
Nicole Engard: languages are tough
Nicole Engard: glad it's kind of your area and not mine :)
Chris Cormack: if ppl picked a standard
Chris Cormack: and stuck with it
Chris Cormack: it would be a crapload easier
Chris Cormack: everyone use an iso standard .. done
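
The ISO 639-2 lines quoted in the chat are pipe-delimited: bibliographic code,
terminologic code, two-letter code, English name, French name. A minimal
Python sketch of reading them, using only the five lines pasted above as
sample data (the Iso639Entry and parse_line names are made up for this
example):

```python
from typing import NamedTuple, Optional


class Iso639Entry(NamedTuple):
    bibliographic: str            # e.g. arm (the form MARC uses for Armenian)
    terminologic: Optional[str]   # e.g. hye, missing for most languages
    alpha2: Optional[str]         # e.g. hy, missing for fil and ira
    english_name: str
    french_name: str


def parse_line(line: str) -> Iso639Entry:
    """Split one pipe-delimited line; empty fields become None."""
    bib, term, alpha2, english, french = line.rstrip("\n").split("|")
    return Iso639Entry(bib, term or None, alpha2 or None, english, french)


# Sample lines copied from the chat transcript, not the full LOC file.
SAMPLE = [
    "arm|hye|hy|Armenian|arménien",
    "eng||en|English|anglais",
    "fre|fra|fr|French|français",
    "fil|||Filipino; Pilipino|filipino; pilipino",
    "ira|||Iranian languages|iraniennes, langues",
]

entries = [parse_line(line) for line in SAMPLE]

# Languages with no two-letter code at all.
no_alpha2 = [e.bibliographic for e in entries if e.alpha2 is None]

# Languages that carry two different three-letter codes.
two_three_letter = [e.bibliographic for e in entries if e.terminologic]
```

On these five lines, fil and ira have no two-letter code, and arm and fre each
have a second three-letter code, which is why a single code column cannot
serve both the templates and the search limit.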

