[Koha-bugs] [Bug 35659] OAI Harvester

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Fri Mar 1 21:24:43 CET 2024


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=35659

Michal Denar <black23 at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #162528|0                           |1
        is obsolete|                            |

--- Comment #36 from Michal Denar <black23 at gmail.com> ---
Created attachment 162694
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=162694&action=edit
Bug 35659: (follow-up) Better handling of accented characters

If you try to harvest bibliographic records from a UNIMARC OAI
repository (using oai_dc data format) in a MARC21 Koha instance
and run the OAI harvester script in verbose mode, you may get
lines similar to the following in the output:

no mapping found for [0xC9] at position 0 in Économie politique g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/share/perl5/MARC/Charset.pm line 308.
no mapping found for [0xC9] at position 0 in Église et société g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/share/perl5/MARC/Charset.pm line 308.

When looking at the imported records' biblio details page in
the OPAC, most words containing accented characters will not
appear correctly.

The fix is to apply Franck Theeten's solution from Bug 16488
(https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=16488#c24)
and set the MARC leader's 10th character (position 09, the
character coding scheme) to 'a' (UCS/Unicode) in the XSLT that
transforms the UNIMARC OAI records into MARC21 XML. The accented
characters are then imported properly and the records appear
correctly in the OPAC.
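
To illustrate the leader change described above, here is a minimal
sketch in Python. It is not the patch itself (the patch makes the
change inside the XSLT); the function name and the sample leader
value are hypothetical and only show which character is replaced.

```python
def force_unicode_leader(leader: str) -> str:
    """Return a copy of a MARC leader with position 09 set to 'a'.

    Position 09 is the 10th character when counting from 1. In MARC21
    it holds the character coding scheme, and 'a' means UCS/Unicode.
    """
    if len(leader) != 24:
        # A MARC leader is always 24 characters long.
        raise ValueError("a MARC leader must be exactly 24 characters")
    return leader[:9] + "a" + leader[10:]


# Hypothetical sample leader with a blank at position 09.
sample = "00000nam  2200000   4500"
fixed = force_unicode_leader(sample)
print(repr(fixed))
```

Only the single character at position 09 changes; every other
leader position is copied through untouched.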

Test plan:

0) Without this patch, running the OAI harvesting script in
   verbose mode produces many warnings, and garbled characters
   appear in the OPAC biblio details page wherever accented
   characters are in use.

1) Apply this patch.

2) Re-run the OAI harvesting script in verbose + force mode
   (force mode is required to ignore record datestamps from
   previous runs):

   misc/cronjobs/harvest_oai.pl -v -r <OAI_REPO_ID> -f

   This time there should be no warnings printed on your
   screen, and any characters with accents in the updated
   records should look OK in the OPAC.

Thanks-to: Franck Theeten <franck.theeten at africamuseum.be>
Signed-off-by: Michal Denar <black23 at gmail.com>
