[Koha-bugs] [Bug 29333] New: Importing UNIMARC authorities in MARCXML UTF-8 breaks the encoding

bugzilla-daemon at bugs.koha-community.org
Wed Oct 27 13:48:08 CEST 2021


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=29333

            Bug ID: 29333
           Summary: Importing UNIMARC authorities in MARCXML UTF-8 breaks
                    the encoding
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5 - low
         Component: MARC Authority data support
          Assignee: koha-bugs at lists.koha-community.org
          Reporter: julian.maurice at biblibre.com
        QA Contact: testopia at bugs.koha-community.org

Related to bug 17754

MARC::Record and the MARC::File::* modules sometimes use position 09 of the
leader to detect the encoding. A blank character means 'MARC-8', while an 'a'
means 'UTF-8'.

In a UNIMARC authority this position is used to store the authority type (see
https://www.transition-bibliographique.fr/wp-content/uploads/2021/02/AIntroLabel-2004.pdf
[FR]).
In this case, 'a' means 'Personal Name'.

As a result, the import works for a Personal Name authority, but the encoding
is broken for all other authority types.
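
The clash described above can be sketched in a few lines of Python. This is
only an illustration of the heuristic, not the actual MARC::Record code: the
helper names, the sample leaders, and the assumption that every value other
than 'a' is treated as MARC-8 are all hypothetical.

```python
# Illustrative sketch only; this is not how MARC::Record is implemented.

# A couple of UNIMARC authority "type of entity" codes stored in leader
# position 09. Only 'a' is stated in this report; 'c' is the value seen
# on the geographic name record used in the steps to reproduce.
UNIMARC_ENTITY_TYPE = {
    "a": "Personal Name",
    "c": "Territorial or Geographical Name",
}


def make_leader(entity_type: str) -> str:
    """Build a made-up 24-character leader with entity_type at position 09."""
    return "00000cx  " + entity_type + "2200000" + "3  45  "


def guess_encoding_from_leader(leader: str) -> str:
    """Guess the encoding from leader position 09 alone.

    Assumption: anything other than 'a' falls back to MARC-8.
    """
    return "UTF-8" if leader[9] == "a" else "MARC-8"


for code, label in UNIMARC_ENTITY_TYPE.items():
    leader = make_leader(code)
    print(f"{label}: leader/09={code!r} -> {guess_encoding_from_leader(leader)}")
```

Under these assumptions a Personal Name record is guessed as UTF-8 only
because its entity type happens to be 'a', while a UTF-8 record of any other
type is guessed as MARC-8.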

Steps to reproduce:
0. Be sure to have a Koha UNIMARC instance. 
1. Download the MARCXML for "Honoré de Balzac"
   curl -o balzac.marcxml https://www.idref.fr/02670305X.xml
2. Verify that it's encoded in UTF-8
   file balzac.marcxml
   (should output "balzac.marcxml: XML 1.0 document, UTF-8 Unicode text")
3. Go to Tools » Stage MARC for import and import balzac.marcxml with the
following settings:
   Record type: Authority
   Character encoding: UTF-8
   Format: MARCXML
   Do not touch the other settings
4. Once imported, go to the staged MARC management tool and find your batch.
Click on the authority title "Balzac Honoré de 1799-1850" to show the MARC
inside a modal window. There should be no encoding issue.
5. Write down the imported record id (the number in column '#') and go to the
MARC authority editor. Replace all URL parameters with
'breedingid=THE_ID_YOU_WROTE_DOWN'
   The URL should look like this:
/cgi-bin/koha/authorities/authorities.pl?breedingid=198
   You should see no encoding issues. Do not save the record.
6. Import the batch into the catalog. Verify that the authority record has no
encoding issue.
7. Now download the MARCXML for "Athènes (Grèce)"
   curl -o athènes.marcxml https://www.idref.fr/027290530.xml
8. Repeat steps 2 to 6 using the athènes.marcxml file. At steps 4 and 5 you
should see encoding issues, and position 09 of the leader will have been
rewritten from 'c' to 'a'. Strangely, importing this batch fixes the encoding
issue, but the information in position 09 of the leader is still lost.
