[Koha-bugs] [Bug 29333] New: Importing UNIMARC authorities in MARCXML UTF-8 breaks the encoding
bugzilla-daemon at bugs.koha-community.org
Wed Oct 27 13:48:08 CEST 2021
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=29333
Bug ID: 29333
Summary: Importing UNIMARC authorities in MARCXML UTF-8 breaks
the encoding
Change sponsored?: ---
Product: Koha
Version: master
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5 - low
Component: MARC Authority data support
Assignee: koha-bugs at lists.koha-community.org
Reporter: julian.maurice at biblibre.com
QA Contact: testopia at bugs.koha-community.org
Related to bug 17754
The MARC::Record and MARC::File::* modules sometimes use position 09 of the
leader to detect the encoding, following the MARC 21 convention: a blank
character means 'MARC-8' while an 'a' means 'UTF-8'.
In a UNIMARC authority this position is used to store the authority type (see
https://www.transition-bibliographique.fr/wp-content/uploads/2021/02/AIntroLabel-2004.pdf
[FR]).
In this case, 'a' means 'Personal Name'.
The result is that the import keeps the encoding intact for a Personal Name
authority, but breaks it for all other authority types.
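The mismatch can be sketched in a few lines of Python. This is only an illustration of the heuristic described above, not the actual code of MARC::Record or MARC::File::*. The function names and the placeholder leader are assumptions made for the example, and latin-1 stands in for the wrong decoding path only because Python has no built-in MARC-8 codec.

```python
# Illustrative sketch only: NOT the code of MARC::Record or MARC::File::*.

def make_leader(position_09: str) -> str:
    """Build a placeholder 24-character leader with the given position 09.

    Every other position is filler; only position 09 matters here.
    """
    return "00000nx  " + position_09 + "2200000   45  "


def guess_encoding(leader: str) -> str:
    """MARC 21-style check: 'a' in position 09 means UTF-8, else MARC-8."""
    return "UTF-8" if leader[9] == "a" else "MARC-8"


# 'a' is a Personal Name authority; 'c' is the value found on the
# Athènes record used in the steps to reproduce.
for code in ("a", "c"):
    print(code, "->", guess_encoding(make_leader(code)))

# Assumption: latin-1 is a stand-in for "decoded as something other than
# UTF-8". It shows how UTF-8 bytes get garbled when misidentified.
print("Athènes".encode("utf-8").decode("latin-1"))
```

With this check, a UTF-8 file is recognised as UTF-8 only when the authority type happens to be 'a', which matches the behaviour reported here.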
Steps to reproduce:
0. Be sure to have a Koha UNIMARC instance.
1. Download the MARCXML for "Honoré de Balzac"
curl -o balzac.marcxml https://www.idref.fr/02670305X.xml
2. Verify that it's encoded in UTF-8
file balzac.marcxml
(should output "balzac.marcxml: XML 1.0 document, UTF-8 Unicode text")
3. Go to Tools » Stage MARC for import and import balzac.marcxml with the
following settings:
Record type: Authority
Character encoding: UTF-8
Format: MARCXML
Do not touch the other settings
4. Once imported, go to the staged MARC management tool and find your batch.
Click on the authority title "Balzac Honoré de 1799-1850" to show the MARC
inside a modal window. There should be no encoding issue.
5. Write down the imported record id (the number in column '#') and go to the
MARC authority editor. Replace all URL parameters with
'breedingid=THE_ID_YOU_WROTE_DOWN'
The URL should look like this:
/cgi-bin/koha/authorities/authorities.pl?breedingid=198
You should see no encoding issues. Do not save the record.
6. Import the batch into the catalog. Verify that the authority record has no
encoding issue.
7. Now download the MARCXML for "Athènes (Grèce)"
curl -o athènes.marcxml https://www.idref.fr/027290530.xml
8. Repeat steps 2 to 6 using the athènes.marcxml file. At steps 4 and 5 you should
see encoding issues, and position 09 of the leader will have been rewritten from
'c' to 'a'. Strangely, importing this batch fixes the encoding issue, but we
still lose the information in position 09 of the leader.
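To check position 09 of the leader in a downloaded file before importing it, a small helper along these lines may be useful. It is a minimal sketch using only Python's standard library. The function name is hypothetical, and the code assumes the leader is stored in an element whose local name is "leader", with or without an XML namespace.

```python
import xml.etree.ElementTree as ET


def leader_position_09(marcxml: bytes) -> str:
    """Return the character at position 09 of the first leader found."""
    root = ET.fromstring(marcxml)
    for elem in root.iter():
        # Strip a '{namespace}' prefix, if any, before comparing the tag.
        if elem.tag.rsplit("}", 1)[-1] == "leader":
            text = elem.text or ""
            if len(text) < 10:
                raise ValueError("leader is shorter than 10 characters")
            return text[9]
    raise ValueError("no leader element found")


if __name__ == "__main__":
    import sys

    # Usage: python leader09.py balzac.marcxml athènes.marcxml
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, "->", repr(leader_position_09(handle.read())))
```

Running it on both downloaded files should show which one carries 'a' in that position and which one carries another authority type.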