[Koha-devel] Testing XML parsers

Clay Fouts cfouts at ptfs.com
Wed Apr 28 21:45:21 CEST 2010


[Moving to the more topical koha-devel]

For a bit of anecdotal evidence, we've been testing with the non-Parser
version for over six months with no ill effects. However, I definitely
encourage further, more rigorous compatibility tests before making a
recommendation. How do we go about testing that formally?

Also, even though $KOHA_CONF is parsed in each and every script, the time
for a single snarfing of that file is immaterial compared to that required
for parsing the 20 or so MARCXML records and corresponding XSLT templates
that pop up in single OPAC search results page.

Clay


On Tue, Apr 27, 2010 at 4:47 PM, Chris Cormack <chrisc at catalyst.net.nz>wrote:

> * Clay Fouts (cfouts at ptfs.com) wrote:
> >    I agree ParserDetails.ini is also not a good place to define it since
> it
> >    forces a system-wide preference. But wouldn't adding it to
> C4/Context.pm
> >    be sufficient since everything else uses that?
> >    Also, can anyone point to a verifiable difference in correct
> functionality
> >    between the Parser and non-Parser LibXML::SAX modules? They're built
> off
> >    of the same code, the only difference being that the Parser takes
> extra
> >    time to build a DOM tree, which Koha doesn't use. The expense of
> building
> >    that tree is considerable.
>
> I'll have to defer to the people who specified the dependency on that.
>
> Well actually I'll to talk to some others, because that person threw his
> toys
> out of the sandpit a while ago.  The dependency has been their since
> 2007, it may well be that it's time to revisit it. However this close to
> 3.2 release that probably will be needed to be backburnered until 3.4.
> My main desire was to not have the 3.2.0 release needing to change the
> parserdetails.ini file like the 3.0.x ones have.
>
> C4::Context would override it everywhere in Koha, not just where we
> throw around MARCXML. So as it stands in the files that actually use
> MARC::File::XML, it does mean that you use LibXML::SAX non parser to
> parse the config file (faster) and the one that we think is the only
> safe one, to parse the MARCXML
>
> By removing it from the parserdetails.ini and allowing C4::Context to
> parse koha-conf.xml with a faster parser we do get a performance win on
> every single page.
>
> Sorry I should have explained better.
>
> Chris
> >
> >    On Tue, Apr 27, 2010 at 4:04 PM, Chris Cormack <
> chrisc at catalyst.net.nz>
> >    wrote:
> >
> >      * Clay Fouts (cfouts at ptfs.com) wrote:
> >      >    Hi, Chris.
> >      >    Wouldn't it make more sense to define this in a single place?
> If a
> >      user
> >      >    decides at some point in the future to swap parsers, they'd
> have to
> >      change
> >      >    dozens of files to remain consistent.
> >      >    Also, not everyone uses XML::LibXML::SAX::Parser, so it seems
> >      >    inappropriate to hard code that into the source. For example,
> >      >    XML::LibXML::SAX is much faster and functionally equitable for
> >      Koha's
> >      >    purposes.
> >      >    Cheers,
> >      >    Clay
> >
> >      Hi Clay
> >
> >      The MARC editing Koha does, does depend on XML::LibXML::SAX::Parser,
> the
> >      current situation means people have to edit parserdetails.ini  to
> make
> >      that the default. This means that now everything on the system uses
> it.
> >      Bad ... In the packaging work Lars is doing, there is no way that
> doing
> >      it that way that will make it into debian.
> >
> >      No one should use Koha with other than XML::LibXML::SAX::Parser
> because
> >      although the others maybe functionally equitable, things break with
> >      them, specifically in MARC::File::XML
> >
> >      So since Koha depends on this parser, it makes more sense to me to
> >      define it in the files that depend on it. Thus allowing the user to
> use
> >      whatever else Parse they look for the rest of their system.
> >
> >      It also allows the debian packaging to continue.
> >
> >      The other option is of course to add this to MARC::File::XML
> instead.
> >
> >      Hope this helps explain
> >
> >      Chris
> >      >
> >      >    On Tue, Apr 27, 2010 at 2:34 PM, Chris Cormack
> >      <chrisc at catalyst.net.nz>
> >      >    wrote:
> >      >
> >      >      ---
> >      >       C4/Biblio.pm                                       |    2 ++
> >      >       C4/Record.pm                                       |    4
> ++++
> >      >       cataloguing/addbiblio.pl                           |    3
> +++
> >      >       cataloguing/additem.pl                             |    4
> ++++
> >      >       cataloguing/moveitem.pl                            |    2
> ++
> >      >       installer/data/mysql/update22to30.pl               |    1 +
> >      >       installer/data/mysql/updatedatabase.pl             |    2
> ++
> >      >       misc/batchImportMARCWithBiblionumbers.pl           |   17
> >      >      +++++++++++++++++
> >      >       misc/batchupdateISBNs.pl                           |    2 ++
> >      >       misc/maintenance/MARC21_utf8_flag_fix.pl           |    3
> +++
> >      >       .../migration_tools/22_to_30/export_Authorities.pl |    2 ++
> >      >       .../22_to_30/export_Authorities_xml.pl             |    2 ++
> >      >       .../22_to_30/move_marc_to_biblioitems.pl           |    2
> ++
> >      >       misc/migration_tools/bulkauthimport.pl             |    2
> ++
> >      >       misc/migration_tools/bulkmarcimport.pl             |    2
> ++
> >      >       misc/sax_parser_print.pl                           |    1 +
> >      >       misc/sax_parser_test.pl                            |    2
> ++
> >      >       17 files changed, 53 insertions(+), 0 deletions(-)
> >      >
> >      >      diff --git a/C4/Biblio.pm b/C4/Biblio.pm
> >      >      index e24bee8..24ba2f1 100644
> >      >      --- a/C4/Biblio.pm
> >      >      +++ b/C4/Biblio.pm
> >      >      @@ -17,6 +17,8 @@ package C4::Biblio;
> >      >       # with Koha; if not, write to the Free Software Foundation,
> >      Inc.,
> >      >       # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA.
> >      >
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure MARC::File::XML uses the correct parser
> >      >      +
> >      >       use strict;
> >      >       use warnings;
> >      >
> >      >      diff --git a/C4/Record.pm b/C4/Record.pm
> >      >      index 233fa6e..28903ee 100644
> >      >      --- a/C4/Record.pm
> >      >      +++ b/C4/Record.pm
> >      >      @@ -2,6 +2,7 @@ package C4::Record;
> >      >       #
> >      >       # Copyright 2006 (C) LibLime
> >      >       # Joshua Ferraro <jmf at liblime.com>
> >      >      +# Copyright 2010 Catalyst IT Limited
> >      >       #
> >      >       # This file is part of Koha.
> >      >       #
> >      >      @@ -19,6 +20,9 @@ package C4::Record;
> >      >       # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA.
> >      >       #
> >      >       #
> >      >      +
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure MARC::File::XML uses the correct parser
> >      >      +
> >      >       use strict;
> >      >       #use warnings; FIXME - Bug 2505
> >      >
> >      >      diff --git a/cataloguing/addbiblio.pl b/cataloguing/
> addbiblio.pl
> >      >      index 7a1ed65..af73822 100755
> >      >      --- a/cataloguing/addbiblio.pl
> >      >      +++ b/cataloguing/addbiblio.pl
> >      >      @@ -2,6 +2,7 @@
> >      >
> >      >       # Copyright 2000-2002 Katipo Communications
> >      >      +# Copyright 2010 Catalyst IT Limited
> >      >       #
> >      >       # This file is part of Koha.
> >      >       #
> >      >      @@ -18,6 +19,8 @@
> >      >       # with Koha; if not, write to the Free Software Foundation,
> >      Inc.,
> >      >       # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA.
> >      >
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure MARC::File::XML uses the correct parser
> >      >      +
> >      >       use strict;
> >      >       #use warnings; FIXME - Bug 2505
> >      >       use CGI;
> >      >      diff --git a/cataloguing/additem.pl b/cataloguing/additem.pl
> >      >      index bd3876e..40ccba9 100755
> >      >      --- a/cataloguing/additem.pl
> >      >      +++ b/cataloguing/additem.pl
> >      >      @@ -2,6 +2,7 @@
> >      >
> >      >       # Copyright 2000-2002 Katipo Communications
> >      >      +# Copyright 2010 Catalyst IT Limited
> >      >       #
> >      >       # This file is part of Koha.
> >      >       #
> >      >      @@ -18,6 +19,9 @@
> >      >       # with Koha; if not, write to the Free Software Foundation,
> >      Inc.,
> >      >       # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA.
> >      >
> >      >      +
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure MARC::File::XML uses the correct parser
> >      >      +
> >      >       use strict;
> >      >       #use warnings; FIXME - Bug 2505
> >      >       use CGI;
> >      >      diff --git a/cataloguing/moveitem.pl b/cataloguing/
> moveitem.pl
> >      >      index e0ab551..31b0093 100755
> >      >      --- a/cataloguing/moveitem.pl
> >      >      +++ b/cataloguing/moveitem.pl
> >      >      @@ -19,6 +19,8 @@
> >      >       # with Koha; if not, write to the Free Software Foundation,
> >      Inc.,
> >      >       # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA.
> >      >
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure MARC::File::XML uses the correct parser
> >      >      +
> >      >       use strict;
> >      >       #use warnings; FIXME - Bug 2505
> >      >       use CGI;
> >      >      diff --git a/installer/data/mysql/update22to30.pl
> >      >      b/installer/data/mysql/update22to30.pl
> >      >      index cb4379b..f314749 100755
> >      >      --- a/installer/data/mysql/update22to30.pl
> >      >      +++ b/installer/data/mysql/update22to30.pl
> >      >      @@ -11,6 +11,7 @@
> >      >       # - Would also be a good idea to offer to do a backup at
> this
> >      time...
> >      >
> >      >       # NOTE:  If you do something more than once in here, make it
> >      table
> >      >      driven.
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure MARC::File::XML uses the correct parser
> >      >       use strict;
> >      >       #use warnings; FIXME - Bug 2505
> >      >
> >      >      diff --git a/installer/data/mysql/updatedatabase.pl
> >      >      b/installer/data/mysql/updatedatabase.pl
> >      >      index 201e30c..8a2c15d 100755
> >      >      --- a/installer/data/mysql/updatedatabase.pl
> >      >      +++ b/installer/data/mysql/updatedatabase.pl
> >      >      @@ -14,6 +14,8 @@
> >      >
> >      >       # NOTE: Please keep the version in kohaversion.plup-to-date!
> >      >
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure MARC::File::XML uses the correct parser
> >      >      +
> >      >       use strict;
> >      >       use warnings;
> >      >
> >      >      diff --git a/misc/batchImportMARCWithBiblionumbers.pl
> >      >      b/misc/batchImportMARCWithBiblionumbers.pl
> >      >      index 4acc02b..8ca8a34 100755
> >      >      --- a/misc/batchImportMARCWithBiblionumbers.pl
> >      >      +++ b/misc/batchImportMARCWithBiblionumbers.pl
> >      >      @@ -1,6 +1,22 @@
> >      >       #!/usr/bin/perl
> >      >       # load records that already have biblionumber set into a
> koha
> >      system
> >      >       # Written by TG on 10/04/2006
> >      >      +
> >      >      +# This file is part of Koha.
> >      >      +#
> >      >      +# Koha is free software; you can redistribute it and/or
> modify
> >      it under
> >      >      the
> >      >      +# terms of the GNU General Public License as published by
> the
> >      Free
> >      >      Software
> >      >      +# Foundation; either version 2 of the License, or (at your
> >      option) any
> >      >      later
> >      >      +# version.
> >      >      +#
> >      >      +# Koha is distributed in the hope that it will be useful,
> but
> >      WITHOUT
> >      >      ANY
> >      >      +# WARRANTY; without even the implied warranty of
> MERCHANTABILITY
> >      or
> >      >      FITNESS FOR
> >      >      +# A PARTICULAR PURPOSE.  See the GNU General Public License
> for
> >      more
> >      >      details.
> >      >      +#
> >      >      +# You should have received a copy of the GNU General Public
> >      License
> >      >      along
> >      >      +# with Koha; if not, write to the Free Software Foundation,
> >      Inc.,
> >      >      +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA.
> >      >      +
> >      >       use strict;
> >      >       #use warnings; FIXME - Bug 2505
> >      >       BEGIN {
> >      >      @@ -8,6 +24,7 @@ BEGIN {
> >      >          # test carefully before changing this
> >      >          use FindBin;
> >      >          eval { require "$FindBin::Bin/kohalib.pl" };
> >      >      +    $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser";
> >      >       }
> >      >
> >      >       # Koha modules used
> >      >      diff --git a/misc/batchupdateISBNs.pl
> b/misc/batchupdateISBNs.pl
> >      >      index 8cc8c6a..6041b80 100755
> >      >      --- a/misc/batchupdateISBNs.pl
> >      >      +++ b/misc/batchupdateISBNs.pl
> >      >      @@ -30,6 +30,8 @@ BEGIN {
> >      >          # test carefully before changing this
> >      >          use FindBin;
> >      >          eval { require "$FindBin::Bin/kohalib.pl" };
> >      >      +    $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser";
> >      >      +
> >      >       }
> >      >       use C4::Context;
> >      >       use MARC::File::XML;
> >      >      diff --git a/misc/maintenance/MARC21_utf8_flag_fix.pl
> >      >      b/misc/maintenance/MARC21_utf8_flag_fix.pl
> >      >      index e7e9156..fbff241 100755
> >      >      --- a/misc/maintenance/MARC21_utf8_flag_fix.pl
> >      >      +++ b/misc/maintenance/MARC21_utf8_flag_fix.pl
> >      >      @@ -1,6 +1,7 @@
> >      >       #!/usr/bin/perl
> >      >       #
> >      >       # Copyright 2009 Liblime
> >      >      +# Copyright 2010 Catalyst IT
> >      >       #
> >      >       # This file is part of Koha.
> >      >       #
> >      >      @@ -17,6 +18,8 @@
> >      >       # with Koha; if not, write to the Free Software Foundation,
> >      Inc.,
> >      >       # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA.
> >      >
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      # Make
> >      >      sure correct sax parser is used
> >      >      +
> >      >       use strict;
> >      >       use warnings;
> >      >
> >      >      diff --git
> a/misc/migration_tools/22_to_30/export_Authorities.pl
> >      >      b/misc/migration_tools/22_to_30/export_Authorities.pl
> >      >      index aeed3f2..cd24a7d 100755
> >      >      --- a/misc/migration_tools/22_to_30/export_Authorities.pl
> >      >      +++ b/misc/migration_tools/22_to_30/export_Authorities.pl
> >      >      @@ -6,6 +6,8 @@ BEGIN {
> >      >          # test carefully before changing this
> >      >          use FindBin;
> >      >          eval { require "$FindBin::Bin/../../kohalib.pl" };
> >      >      +    $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser"; #
> >      Make sure
> >      >      correct sax parser is used
> >      >      +
> >      >       }
> >      >       use C4::Context;
> >      >       #use MARC::File::XML(BinaryEncoding=>"utf8");
> >      >      diff --git
> >      a/misc/migration_tools/22_to_30/export_Authorities_xml.pl
> >      >      b/misc/migration_tools/22_to_30/export_Authorities_xml.pl
> >      >      index 70647ee..bd871c4 100755
> >      >      --- a/misc/migration_tools/22_to_30/export_Authorities_xml.pl
> >      >      +++ b/misc/migration_tools/22_to_30/export_Authorities_xml.pl
> >      >      @@ -6,6 +6,8 @@ BEGIN {
> >      >          # test carefully before changing this
> >      >          use FindBin;
> >      >          eval { require "$FindBin::Bin/../../kohalib.pl" };
> >      >      +    $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser";
>  #
> >      Make sure
> >      >      correct sax parser is used
> >      >      +
> >      >       }
> >      >       use C4::Context;
> >      >       use MARC::File::XML(BinaryEncoding=>"utf8");
> >      >      diff --git
> >      a/misc/migration_tools/22_to_30/move_marc_to_biblioitems.pl
> >      >      b/misc/migration_tools/22_to_30/move_marc_to_biblioitems.pl
> >      >      index 3635dbc..051d93b 100755
> >      >      --- a/misc/migration_tools/22_to_30/
> move_marc_to_biblioitems.pl
> >      >      +++ b/misc/migration_tools/22_to_30/
> move_marc_to_biblioitems.pl
> >      >      @@ -8,6 +8,8 @@ BEGIN {
> >      >          # test carefully before changing this
> >      >          use FindBin;
> >      >          eval { require "$FindBin::Bin/../../kohalib.pl" };
> >      >      +    $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser"; #
> >      Make sure
> >      >      correct sax parser is used
> >      >      +
> >      >       }
> >      >       use C4::Context;
> >      >       use C4::Biblio;
> >      >      diff --git a/misc/migration_tools/bulkauthimport.pl
> >      >      b/misc/migration_tools/bulkauthimport.pl
> >      >      index 2709db9..5f859a1 100755
> >      >      --- a/misc/migration_tools/bulkauthimport.pl
> >      >      +++ b/misc/migration_tools/bulkauthimport.pl
> >      >      @@ -8,6 +8,8 @@ BEGIN {
> >      >          # test carefully before changing this
> >      >          use FindBin;
> >      >          eval { require "$FindBin::Bin/kohalib.pl" };
> >      >      +    $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser"; #
> >      Make sure
> >      >      correct sax parser is used
> >      >      +
> >      >       }
> >      >
> >      >       # Koha modules used
> >      >      diff --git a/misc/migration_tools/bulkmarcimport.pl
> >      >      b/misc/migration_tools/bulkmarcimport.pl
> >      >      index da0ebf7..5b1d823 100755
> >      >      --- a/misc/migration_tools/bulkmarcimport.pl
> >      >      +++ b/misc/migration_tools/bulkmarcimport.pl
> >      >      @@ -9,6 +9,8 @@ BEGIN {
> >      >          # test carefully before changing this
> >      >          use FindBin;
> >      >          eval { require "$FindBin::Bin/../kohalib.pl" };
> >      >      +    $XML::SAX::ParserPackage = "XML::LibXML::SAX::Parser"; #
> >      Make sure
> >      >      correct sax parser is used
> >      >      +
> >      >       }
> >      >
> >      >       # Koha modules used
> >      >      diff --git a/misc/sax_parser_print.pl b/misc/
> sax_parser_print.pl
> >      >      index d206a5e..ab5d1ff 100755
> >      >      --- a/misc/sax_parser_print.pl
> >      >      +++ b/misc/sax_parser_print.pl
> >      >      @@ -3,6 +3,7 @@
> >      >
> >      >       use strict;
> >      >       use warnings;
> >      >      +BEGIN { $XML::SAX::ParserPackage =
> "XML::LibXML::SAX::Parser"; }
> >      >       use XML::SAX::ParserFactory;
> >      >
> >      >       my $parser = XML::SAX::ParserFactory->parser();
> >      >      diff --git a/misc/sax_parser_test.pl b/misc/
> sax_parser_test.pl
> >      >      index b2e5974..359bb45 100755
> >      >      --- a/misc/sax_parser_test.pl
> >      >      +++ b/misc/sax_parser_test.pl
> >      >      @@ -1,5 +1,7 @@
> >      >       #!/usr/bin/perl
> >      >
> >      >      +BEGIN { $XML::SAX::ParserPackage = "XML::LibXML::SAX"; }
> >      >      +
> >      >       use strict;
> >      >       use warnings;
> >      >      --
> >      >      1.6.3.3
> >      >
> >      >      _______________________________________________
> >      >      Koha-patches mailing list
> >      >      Koha-patches at lists.koha.org
> >      >      http://lists.koha.org/mailman/listinfo/koha-patches
> >
> >      --
> >      Chris Cormack
> >      Catalyst IT Ltd.
> >      +64 4 803 2238
> >      PO Box 11-053, Manners St, Wellington 6142, New Zealand
>
> --
> Chris Cormack
> Catalyst IT Ltd.
> +64 4 803 2238
> PO Box 11-053, Manners St, Wellington 6142, New Zealand
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </pipermail/koha-devel/attachments/20100428/fc9b3cec/attachment-0002.htm>


More information about the Koha-devel mailing list