[Koha-bugs] [Bug 30996] New: ModBiblio breaks MARC::File::XML

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Mon Jun 20 15:32:17 CEST 2022


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=30996

            Bug ID: 30996
           Summary: ModBiblio breaks MARC::File::XML
 Change sponsored?: ---
           Product: Koha
           Version: 20.11
          Hardware: All
                OS: All
            Status: NEW
          Severity: minor
          Priority: P5 - low
         Component: Architecture, internals, and plumbing
          Assignee: koha-bugs at lists.koha-community.org
          Reporter: magnus at libriotech.no
        QA Contact: testopia at bugs.koha-community.org

This is a weird issue. It could very well be something particular to my setup,
but I am at a loss as to what it might be, so here goes...

This is happening on a regular Koha 20.11.13.000, installed with the Debian
packages, on Ubuntu 20.04.3. 

I have a script that downloads records from the Swedish national library
Libris, does some checking and some massaging, and then imports the records
into Koha with either AddBiblio or ModBiblio. The script can be found here:
https://github.com/Libriotech/ftp2koha

About a month ago messages like these started to show up in the logs for some
of the imported records: 

Wide character in warn at /usr/share/perl5/MARC/Charset.pm line 384.
no mapping found at position 1 in ビードマルク, マッティン, at
/usr/share/perl5/MARC/Charset.pm line 384.

Records that produce this error show broken characters: letters like äöå are
displayed as a diamond with a question mark in it.

The records look OK before they are imported, and tools like yaz-marcdump do
not show any errors for them. Position 9 in the leader is "a", indicating that
the records are UTF-8 encoded. Three records in one file are attached.

I have tried to reduce the problem to a minimal case, and came up with this
script:

--------------------------------------------------------------------

#!/usr/bin/perl

use Modern::Perl;
use C4::Biblio qw( ModBiblio );
use Koha::Database;
use MARC::File::XML ( BinaryEncoding => 'utf8', RecordFormat => 'MARC21' );
my $records = MARC::File::XML->in( 'export_2022_06_16_1120_orig.marcxml' );

# Start transaction
my $schema = Koha::Database->new()->schema();
$schema->storage->txn_begin();

my %map = (
    'p4rdqd4nm9f4dhv3' => 349078,
    'jzzcjbt4gt0f0z8w' => 348789,
    '1fmgws1lzrk71k2t' => 349079,
);

RECORD: while ( my $record = $records->next() ) {

    say $record->leader;
    say "=========^==============";

    $record->encoding( 'UTF-8' );

    my $id = $record->field( '001' )->data;
    my $biblionumber = $map{ $id };

    if ( $ARGV[0] > 1 ) {
        say ModBiblio( $record, $biblionumber, '' );
    }

}

# End transaction
$schema->storage->txn_rollback();

--------------------------------------------------------------------

The results of running the script differ depending on whether ModBiblio gets
called or not. With "0" as the argument it is not called and everything looks
OK. Leader position 9 is "a" for all three records:

$ sudo koha-shell -c "perl -MCarp::Always test_minimal.pl 0" norrbott
     cim a       7i 4500
=========^==============
     ngm a       7i 4500
=========^==============
     cim a       7i 4500
=========^==============

With "2" as the argument, ModBiblio does get called, and the second and third
records produce encoding errors (raised from the call to $records->next()).
Notice also that leader position 9 of records two and three is now empty! So
once ModBiblio has been triggered, the UTF-8 indicator is suddenly gone, and as
far as I can see, this is what is causing the encoding errors:

$ sudo koha-shell -c "perl -MCarp::Always test_minimal.pl 2" instancename
     cim a       7i 4500
=========^==============
1
Wide character in warn at /usr/share/perl5/Carp/Always.pm line 18.
no mapping found at position 1 in ビードマルク, マッティン, at /usr/share/perl5/MARC/Charset.pm line 384.
        MARC::Charset::utf8_to_marc8("\x{30d2}\x{3099}\x{30fc}\x{30c8}\x{3099}\x{30de}\x{30eb}\x{30af}, \x{30de}\x{30c3}\x{30c6}\x{30a3}\x{30f3},") called at /usr/share/perl5/MARC/File/XML.pm line 480
        MARC::File::XML::decode(MARC::File::XML=HASH(0x56070fb9c278), "<record><leader>     ngm a       7i 4500</leader><controlfiel"...) called at /usr/share/perl5/MARC/File.pm line 110
        MARC::File::next(MARC::File::XML=HASH(0x56070fb9c278)) called at test_minimal.pl line 19
     ngm         7i 4500
=========^==============
Use of uninitialized value in join or string at /usr/share/perl5/MARC/Field.pm line 696.
        MARC::Field::as_usmarc(MARC::Field=HASH(0x560718749560)) called at /usr/share/perl5/MARC/File/USMARC.pm line 274
        MARC::File::USMARC::_build_tag_directory(MARC::Record=HASH(0x5607186ea458)) called at /usr/share/perl5/MARC/File/USMARC.pm line 313
        MARC::File::USMARC::encode(MARC::Record=HASH(0x5607186ea458)) called at /usr/share/perl5/MARC/Record.pm line 474
        MARC::Record::as_usmarc(MARC::Record=HASH(0x5607186ea458)) called at /usr/share/koha/lib/C4/Biblio.pm line 3054
        C4::Biblio::ModBiblioMarc(MARC::Record=HASH(0x560718457c30), 348789, "") called at /usr/share/koha/lib/C4/Biblio.pm line 381
        C4::Biblio::ModBiblio(MARC::Record=HASH(0x560718457c30), 348789, "") called at test_minimal.pl line 30
1
     cim         7i 4500
=========^==============
1
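
In MARC 21, leader position 9 (counting from 0) is the character coding
scheme: "a" means UCS/Unicode (UTF-8), a blank means MARC-8. The sketch below
checks that position for the three leaders printed in the "2" run above. The
helper leader_is_utf8() is hypothetical and not part of Koha or MARC::Record;
it only reads the leader string and does not explain or fix the underlying
decoding problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Koha or MARC::Record): returns 1 when
# MARC 21 leader/09 (0-based position 9) is "a", i.e. the record claims to
# be UTF-8 encoded; a blank at that position means MARC-8.
sub leader_is_utf8 {
    my ($leader) = @_;
    return 0 unless defined $leader && length($leader) >= 10;
    return substr( $leader, 9, 1 ) eq 'a' ? 1 : 0;
}

# Leaders copied from the output of "test_minimal.pl 2" above.
my @leaders = (
    '     cim a       7i 4500',
    '     ngm         7i 4500',
    '     cim         7i 4500',
);

for my $i ( 0 .. $#leaders ) {
    printf "record %d: leader/09 = '%s' -> %s\n",
        $i + 1,
        substr( $leaders[$i], 9, 1 ),
        leader_is_utf8( $leaders[$i] ) ? 'UTF-8' : 'NOT flagged as UTF-8';
}
```

In the minimal script, the same check could be run on $record->leader
straight after $records->next() to warn about each record that comes back
without the UTF-8 flag.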

Things I have checked: 

- /usr/share/koha/lib/C4/Biblio.pm has not been updated since well before the
problems started
- I cannot see any plugins doing anything suspicious

Has anyone got a hunch about what might be causing this, or where to start
looking for a solution?
