[Koha-bugs] [Bug 34312] New: Advanced Editor - Rancor - Restore UNIMARC encoding support

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Wed Jul 19 18:58:01 CEST 2023


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=34312

            Bug ID: 34312
           Summary: Advanced Editor - Rancor - Restore UNIMARC encoding
                    support
 Change sponsored?: ---
           Product: Koha
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: critical
          Priority: P5 - low
         Component: Cataloging
          Assignee: koha-bugs at lists.koha-community.org
          Reporter: synapse.ova at gmail.com
        QA Contact: testopia at bugs.koha-community.org
                CC: m.de.rooy at rijksmuseum.nl

Since at least version 22.11, opening a UNIMARC bibliographic record in the
advanced cataloging editor keeps only Latin characters and removes all others
(Cyrillic, for example). The same record opens correctly in the basic editor.

Example errors from intranet-error.log:

[Sun Jul 16 18:45:03.590680 2023] [cgi:error] [pid 11663] [client
172.93.185.137:42482] AH01215: Wide character (U+43E) in substitution (s///) at
/usr/lib/x86_64-linux-gnu/perl5/5.36/Template/Filters.pm line 62.:
/usr/share/koha/intranet/cgi-bin/cataloguing/addbiblio.pl, referer:
https://koha.tci.org.ua/cgi-bin/koha/cataloguing/editor.pl
[Sun Jul 16 18:45:24.883138 2023] [cgi:error] [pid 11738] [client
172.93.185.137:42592] AH01215: Wide character in warn at
/usr/share/perl5/MARC/Charset.pm line 308.:
/usr/share/koha/intranet/cgi-bin/svc/bib, referer:
https://koha.tci.org.ua/cgi-bin/koha/cataloguing/editor.pl
[Sun Jul 16 18:45:24.883965 2023] [cgi:error] [pid 11738] [client
172.93.185.137:42592] AH01215: no mapping found for [0x425] at position 0 in
\xd0\xa5\xd0\xb5\xd1\x80\xd1\x81\xd0\xbe\xd0\xbd g0=ASCII_DEFAULT
g1=EXTENDED_LATIN at /usr/share/perl5/MARC/Charset.pm line 308.:
/usr/share/koha/intranet/cgi-bin/svc/bib, referer:
https://koha.tci.org.ua/cgi-bin/koha/cataloguing/editor.pl
[Sun Jul 16 18:45:24.883999 2023] [cgi:error] [pid 11738] [client
172.93.185.137:42592] AH01215: Wide character in warn at
/usr/share/perl5/MARC/Charset.pm line 308.:
/usr/share/koha/intranet/cgi-bin/svc/bib, referer:
https://koha.tci.org.ua/cgi-bin/koha/cataloguing/editor.pl
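
The byte string quoted in the "no mapping found" entry is ordinary UTF-8. A
quick Python check (for illustration only, not Koha code) decodes it to the
Cyrillic word "Херсон". Its first character is U+0425, the code point reported
as [0x425], and its fifth is U+043E, which also appears in the first log entry.

```python
# Bytes quoted in the "no mapping found" log entry.
raw = b"\xd0\xa5\xd0\xb5\xd1\x80\xd1\x81\xd0\xbe\xd0\xbd"

# Decode them as UTF-8 and list each character's Unicode code point.
text = raw.decode("utf-8")
codepoints = [f"U+{ord(ch):04X}" for ch in text]

print(text)        # Херсон
print(codepoints)  # ['U+0425', 'U+0435', 'U+0440', 'U+0441', 'U+043E', 'U+043D']
```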

  Even though the record is in Unicode, the advanced editor clearly uses the
wrong encoding. Code analysis shows that the function marc8_to_utf8 is called,
which treats all UTF-8 characters as MARC-8 and therefore removes non-ASCII
characters.
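
To make the failure mode concrete, the Python sketch below is a deliberately
simplified, hypothetical model: it treats only ASCII as mappable and drops
everything else with a warning. The real MARC::Charset conversion uses full
MARC-8 character tables (the log shows g1=EXTENDED_LATIN, for example), so its
exact output for a given string may differ. The model only reproduces the
symptom that non-Latin text disappears.

```python
import sys


def toy_marc8_to_utf8(text: str) -> str:
    """Hypothetical toy model, NOT MARC::Charset's algorithm.

    Characters the model can map (plain ASCII) pass through unchanged;
    every other character is reported and dropped, which reproduces the
    symptom seen in the advanced editor.
    """
    kept = []
    for pos, ch in enumerate(text):
        if ord(ch) < 0x80:
            kept.append(ch)
        else:
            # Loosely modelled on the log message; not its exact wording.
            print(f"no mapping found for [{ord(ch):#x}] at position {pos}", file=sys.stderr)
    return "".join(kept)


print(repr(toy_marc8_to_utf8("Kherson / Херсон")))  # 'Kherson / '
```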

  It looks like the choice of encoding is hardcoded for MARC21 (it is read from
the record leader) and is therefore not compatible with UNIMARC, where the
encoding is stored in field 100.
  In MARC21, the encoding is stored in leader character position 09: " "
(blank) means MARC-8 and "a" means UCS/Unicode
(https://www.loc.gov/marc/bibliographic/bdleader.html)
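
The MARC21 rule amounts to a small lookup on leader position 09. This Python
sketch is illustrative only: the function and the sample leader are
hypothetical, and any value other than blank or "a" is simply reported as
invalid.

```python
def marc21_coding_scheme(leader: str) -> str:
    """Interpret MARC21 leader/09, the character coding scheme."""
    if len(leader) != 24:
        raise ValueError("a MARC leader must be 24 characters long")
    return {" ": "MARC-8", "a": "UCS/Unicode"}.get(leader[9], "invalid")


def make_leader(pos09: str) -> str:
    """Build an illustrative MARC21-style leader with the given position 09."""
    # Positions 0-8, then position 09, then positions 10-23.
    return "00000nam " + pos09 + "2200000   4500"


print(marc21_coding_scheme(make_leader(" ")))  # MARC-8
print(marc21_coding_scheme(make_leader("a")))  # UCS/Unicode
```
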
  In early editions of UNIMARC, the same leader position 09 was required to be
blank (as it is in most of our bibliographic records). Since 2012
(https://cdn.ifla.org/wp-content/uploads/files/assets/uca/unimarc_updates/BIBLIOGRAPHIC/u-b_reclabl_update.pdf)
this position holds the Type of control: " " (blank) means that no specific
type of control applies to the item being described, and "a" means archival
control. Since 2016
(https://cdn.ifla.org/wp-content/uploads/files/assets/uca/unimarc_updates/BIBLIOGRAPHIC/b_reclabel_update2016.pdf)
"m" has also been defined for museum control.
  Therefore it would be better for the advanced editor to detect the encoding
with the same code as the basic editor.
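
The following Python sketch contrasts the leader-only check described above
with a flavour-aware one. It rests on an assumption not stated in this report:
that, as we understand the UNIMARC manual, the character set is recorded in
100$a positions 26-27 and the code "50" denotes Unicode (ISO 10646). The
function names and sample data are hypothetical, and the sketch does not
reproduce Koha's basic-editor code.

```python
from typing import Optional


def leader_only_guess(leader: str) -> str:
    """MARC21-style check applied to every record, as the report describes."""
    return "UTF-8" if leader[9] == "a" else "MARC-8"


def flavour_aware_guess(flavour: str, leader: str, field_100a: Optional[str]) -> str:
    """Hypothetical flavour-aware check (a sketch, not Koha's code)."""
    if flavour == "MARC21":
        # MARC21: leader/09 is the character coding scheme.
        return "UTF-8" if leader[9] == "a" else "MARC-8"
    if flavour == "UNIMARC":
        # UNIMARC: leader/09 is the type of control, so it says nothing about encoding.
        # ASSUMPTION: 100$a positions 26-27 hold the character set, "50" = Unicode.
        if field_100a is not None and field_100a[26:28] == "50":
            return "UTF-8"
        return "unknown"
    raise ValueError(f"unsupported MARC flavour: {flavour!r}")


# UNIMARC leader with a blank position 09, and the same leader marked "archival".
unimarc_leader = "     nam0" + " " * 11 + "4500"
archival_leader = unimarc_leader[:9] + "a" + unimarc_leader[10:]
# Illustrative 36-character 100$a; only positions 26-27 matter here.
field_100a = "20230716d2023" + " " * 4 + "u" + " " * 2 + "y0ukry" + "50" + " " * 8

print(leader_only_guess(unimarc_leader))                           # MARC-8 (wrong)
print(leader_only_guess(archival_leader))                          # UTF-8 (misreads "archival")
print(flavour_aware_guess("UNIMARC", unimarc_leader, field_100a))  # UTF-8
```

Keying the decision on the MARC flavour first keeps the MARC21 behaviour
unchanged while stopping UNIMARC's type-of-control values from being read as
an encoding flag.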

== Steps to reproduce the problem ==
The following steps are for a UNIMARC Koha installation, though the result
would most likely be the same on MARC21.
    1) Preliminary step: check that the advanced cataloging editor is enabled
       (Administration → System preferences → EnableAdvancedCatalogingEditor).
    2) Preliminary step: check that the bibliographic framework shows the leader
       (000) and allows editing it.
    3) Open or prepare a bibliographic record containing non-ASCII characters.
       For example, add
       αβγδεζηθικλμνξοπρστυφχψωæäåąßćęłńóśøöüźżабвгдежзийклмнопрстуфхцчшщьыъэюяёєїґўі’
       to field 200$a or another subfield.
    4) Edit the leader so that character position 09 is blank. For example,
       paste "     nam0           4500" (without quotation marks).
    5) Switch to the advanced editor (Settings → Switch to advanced editor).
    6) Observe that the non-ASCII characters are gone.
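
Before switching editors, the test data can be sanity-checked. This Python
sketch is illustrative only: it confirms that a sample of the step 3
characters is entirely non-ASCII and that the leader has a blank position 09.
The leader is assumed here to be 24 characters: five blanks, "nam0", eleven
blanks and "4500".

```python
# Sample of the non-ASCII test characters from step 3.
sample = "αβγæäåąßабвгєїґўі’"

# Leader from step 4, ASSUMED to be 5 blanks + "nam0" + 11 blanks + "4500".
leader = "     nam0" + " " * 11 + "4500"

checks = {
    "sample is entirely non-ASCII": all(ord(ch) > 0x7F for ch in sample),
    "leader is 24 characters": len(leader) == 24,
    "leader/09 is blank": leader[9] == " ",
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```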

As a temporary workaround, the user can change character position 09 of the
leader (field 000) to "a".
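
The workaround is a single-character substitution in the leader, sketched
below in Python with a hypothetical helper. Note that for UNIMARC records
this is only a stopgap: since 2012, "a" in position 09 means archival
control, so the patched record claims a type of control it may not have.

```python
def set_leader_pos09(leader: str, value: str = "a") -> str:
    """Hypothetical helper: return a copy of a leader with position 09 replaced."""
    if len(leader) != 24 or len(value) != 1:
        raise ValueError("expected a 24-character leader and a single character")
    return leader[:9] + value + leader[10:]


original = "     nam0" + " " * 11 + "4500"
patched = set_leader_pos09(original)
print(repr(patched[9]))  # 'a'
```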
