[Koha-devel] Search Engine Changes : let's get some solr

Frédéric Demians frederic at tamil.fr
Mon Nov 15 06:56:44 CET 2010


Thanks a lot for those thorough tests. Your optimization of MARCXML
record parsing looks fantastic.

 > I've measured, and your parser is, in fact, pretty fast -- *if* you
 > feed it only MARCXML that meets narrower constraints than are
 > permitted by the MARC21slim schema. However, I see no good reason to
 > limit Koha to that artificial restriction; having biblioitems.marcxml
 > contain MARCXML that validates against the MARC21slim schema is sufficient.

It's a design choice. MARCXML is Koha's internal serialization format
for MARC records, and there is no obligation for it to conform to the
MARC21slim schema. We could even choose another serialization format,
as has already been discussed. biblioitems.marcxml isn't open to the
outside world: it is written by C4::ModBiblioMarc, which uses the
MARC::Record::as_xml_record function to populate the marcxml DB field.
So we already have an internal, restricted version of the MARC21slim
schema, and we could take advantage of it if pure Perl parsing turns
out to be a real performance gain. That is the good reason.
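
For illustration only, here is a rough sketch of what parsing such a
restricted serialization could look like. It is written in Python rather
than Perl, and it assumes (not verified against the actual
as_xml_record output) that every field is written with attributes in a
fixed order (tag, ind1, ind2), double-quoted values and no namespace
prefix. The name parse_restricted_marcxml is hypothetical.

```python
import re
from xml.sax.saxutils import unescape

# ASSUMPTION: the writer always emits attributes in this exact order,
# double-quoted, with no namespace prefix. Any other valid MARC21slim
# spelling of the same field is silently skipped.
LEADER_RE = re.compile(r'<leader>(.*?)</leader>', re.S)
FIELD_RE = re.compile(
    r'<controlfield tag="(\w{3})">(.*?)</controlfield>'
    r'|<datafield tag="(\w{3})" ind1="(.)" ind2="(.)">(.*?)</datafield>',
    re.S)
SUBFIELD_RE = re.compile(r'<subfield code="(.)">(.*?)</subfield>', re.S)

# unescape() decodes &amp; &lt; &gt; by default; add the other two
# predefined XML entities. Numeric references (&#233;) are NOT decoded.
_EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def _text(raw):
    return unescape(raw, _EXTRA_ENTITIES)


def parse_restricted_marcxml(xml):
    """Hypothetical helper: return (leader, fields) in document order.

    Control fields become (tag, value); data fields become
    (tag, ind1, ind2, [(code, value), ...]).
    """
    m = LEADER_RE.search(xml)
    leader = _text(m.group(1)) if m else None
    fields = []
    for f in FIELD_RE.finditer(xml):
        if f.group(1) is not None:  # matched the controlfield branch
            fields.append((f.group(1), _text(f.group(2))))
        else:  # matched the datafield branch
            subfields = [(code, _text(value))
                         for code, value in SUBFIELD_RE.findall(f.group(6))]
            fields.append((f.group(3), f.group(4), f.group(5), subfields))
    return leader, fields


sample = """<record>
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Perl &amp; XML</subfield>
  </datafield>
</record>"""

leader, fields = parse_restricted_marcxml(sample)
```

The whole approach only holds as long as a single, known serializer
writes every record; that is exactly the trade-off being argued here.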

 > Two parsers doing similar operations is an invitation for subtle bugs.
 > The pure Perl parser you propose currently doesn't handle namespace
 > prefixes (which are allowed in MARC21slim records), wouldn't handle
 > any situation where the attributes aren't in the order you expect them
 > in (attribute order is not significant per the XML specification), and
 > will blithely accept non-well-formed XML without complaining (this is
 > *not* a good thing). It also doesn't recognize and correctly handle
 > XML entities. Obviously you could address much of this in your code,
 > but I suspect what you'll find is that you'll end up with an XML
 > parser that is slower and still has more bugs than any of the standard
 > parser modules.
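
To make the objections above concrete, a small Python sketch (helper
names are hypothetical; standard library only) contrasting a fixed-order
regular expression with xml.etree.ElementTree on two equivalent
MARC21slim snippets and one malformed one:

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical narrow pattern with the same assumptions as a hand-rolled
# parser: fixed attribute order and no namespace prefix.
NARROW_RE = re.compile(r'<datafield tag="(\d{3})" ind1="(.)" ind2="(.)">')


def tags_by_regex(xml):
    return [m.group(1) for m in NARROW_RE.finditer(xml)]


def tags_by_parser(xml):
    # ElementTree maps any prefix to "{namespace}localname", ignores
    # attribute order, and raises ParseError on non-well-formed input.
    root = ET.fromstring(xml)
    return [df.get("tag") for df in root.iter(f"{{{MARC_NS}}}datafield")]


# Two equivalent, well-formed MARC21slim snippets...
plain = (f'<record xmlns="{MARC_NS}">'
         '<datafield tag="245" ind1="1" ind2="0">'
         '<subfield code="a">Title</subfield></datafield></record>')
prefixed = (f'<marc:record xmlns:marc="{MARC_NS}">'
            '<marc:datafield ind2="0" ind1="1" tag="245">'
            '<marc:subfield code="a">Title</marc:subfield>'
            '</marc:datafield></marc:record>')
# ...and one that is not well-formed (</datafield> is missing).
broken = (f'<record xmlns="{MARC_NS}">'
          '<datafield tag="245" ind1="1" ind2="0"></record>')

for name, doc in [("plain", plain), ("prefixed", prefixed), ("broken", broken)]:
    try:
        parsed = tags_by_parser(doc)
    except ET.ParseError:
        parsed = "ParseError"
    print(name, tags_by_regex(doc), parsed)
```

The regex misses the prefixed record and silently accepts the broken
one, while the standard parser handles both equivalent forms and rejects
the malformed input. Whether that matters depends on whether anything
other than one controlled serializer ever writes biblioitems.marcxml.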

See above. I don't see any need to handle MARC21slim peculiarities
given the limited needs of Koha's internal functions.

Regards,
--
Frédéric
