[Koha-bugs] [Bug 24123] bulkmarcimport.pl doesn't support UTF-8 encoded MARCXML records

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Thu Jan 9 20:24:25 CET 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=24123

--- Comment #5 from Tomás Cohen Arazi <tomascohen at gmail.com> ---
Created attachment 97138
  -->
https://bugs.koha-community.org/bugzilla3/attachment.cgi?id=97138&action=edit
Bug 24123: Fix import of UTF-8 encoded MARC21 MARCXML using bulkmarcimport
(elastic only)

If Elasticsearch is used as the search engine, bulkmarcimport.pl will
not handle UTF-8 encoded MARCXML correctly.

Koha::SearchEngine::Search->new uses a require statement to load the correct
Search module. It is called on line 257 of bulkmarcimport.pl:
  257 my $searcher = Koha::SearchEngine::Search->new
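For context, the constructor roughly does the following (a paraphrased
sketch of Koha::SearchEngine::Search->new, not the verbatim Koha code):

  sub new {
      my ( $class, $params ) = @_;
      # Pick the engine from the SearchEngine system preference
      my $engine = C4::Context->preference('SearchEngine') // 'Zebra';
      my $file   = "Koha/SearchEngine/${engine}/Search.pm";
      require $file;    # compiling Elasticsearch::Search runs its own `use` statements
      return "Koha::SearchEngine::${engine}::Search"->new($params);
  }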

Koha::SearchEngine::Elasticsearch::Search does a `use MARC::File::XML`, and
so resets the arguments that were set earlier, on lines 216 and 220:
  216     $MARC::File::XML::_load_args{BinaryEncoding} = 'utf-8';

  220     $MARC::File::XML::_load_args{RecordFormat} = $recordformat;
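Put together, the sequence of events looks roughly like this (illustrative
sketch; the exact surrounding code and defaults may differ slightly):

  # bulkmarcimport.pl, in the "ugly hack follows" section: override the
  # load arguments of the already-loaded MARC::File::XML directly
  $MARC::File::XML::_load_args{BinaryEncoding} = 'utf-8';
  $MARC::File::XML::_load_args{RecordFormat}   = $recordformat;

  # ... later, line 257 triggers the require of
  # Koha::SearchEngine::Elasticsearch::Search, whose
  #     use MARC::File::XML;
  # runs MARC::File::XML::import() again, rebuilding %_load_args and
  # discarding the overrides above.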

An easy (but dirty) fix is to move the declaration of my $searcher earlier
in the script.
The tricky (but correct) fix would be to get rid of the long-standing "ugly
hack follows" section altogether.

This patch implements the easy (and dirty) fix.
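Concretely, it moves the searcher instantiation above the MARC::File::XML
overrides, roughly like this (illustrative ordering, not the exact hunk):

  # Instantiate the searcher first, so that the `use MARC::File::XML`
  # inside the Elasticsearch Search module has already run ...
  my $searcher = Koha::SearchEngine::Search->new( ... );   # same arguments as on the original l.257

  # ... and only then override the MARC::File::XML load arguments
  $MARC::File::XML::_load_args{BinaryEncoding} = 'utf-8';
  $MARC::File::XML::_load_args{RecordFormat}   = $recordformat;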

Test plan:
Use the command-line tool to import MARCXML records that contain Unicode
characters into Koha.

Something like `misc/migration_tools/bulkmarcimport.pl -biblios -file
record.marcxml -m=MARCXML`
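If you need a test file, a record containing non-ASCII characters can be
generated with MARC::Record and MARC::File::XML, for example (a minimal
sketch; the title text is arbitrary):

  use utf8;
  use MARC::Record;
  use MARC::File::XML ( BinaryEncoding => 'UTF-8', RecordFormat => 'MARC21' );

  my $record = MARC::Record->new();
  $record->append_fields(
      MARC::Field->new( '245', '0', '0', a => 'Žluťoučký kůň' )
  );

  open my $fh, '>:encoding(UTF-8)', 'record.marcxml' or die $!;
  print {$fh} MARC::File::XML::header(),
              MARC::File::XML::record($record),
              MARC::File::XML::footer();
  close $fh;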

Without this patch you will notice that Unicode characters are not
displayed correctly.

Signed-off-by: Michal Denar <black23 at gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen at theke.io>

-- 
You are receiving this mail because:
You are watching all bug changes.
