[Koha-devel] MARC record size limit

Fouts, Clay cfouts at liblime.com
Wed Oct 27 00:01:12 CEST 2010


I did some (very limited) testing on storing and retrieving MARC in YAML.
The results were not encouraging. IIRC, I just did a direct conversion of
the MARC::Record object into YAML and back. Perhaps there's a way to
optimize the formatting that would improve performance, but my testing
sometimes showed even worse performance than with XML.
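
For reference, my test amounted to roughly the sketch below. It assumes
YAML::XS and the core Benchmark module, and the record is just a throwaway
bib padded with item fields rather than one of the real problem records:

use strict;
use warnings;
use Benchmark qw(cmpthese);
use MARC::Record;
use MARC::Field;
use MARC::File::XML ( BinaryEncoding => 'utf8' );
use YAML::XS qw(Dump Load);

# Build a throwaway record with many repeated item fields to stand in
# for a large bib.
my $record = MARC::Record->new();
$record->leader('00000nam a2200000 a 4500');
$record->append_fields(
    MARC::Field->new( '245', '0', '0', a => 'A test title' ),
    map { MARC::Field->new( '952', ' ', ' ', p => "BARCODE$_" ) } 1 .. 1000,
);

my $usmarc = $record->as_usmarc();
my $xml    = $record->as_xml_record();
my $yaml   = Dump($record);    # naive dump of the blessed object

# Compare how fast each serialization can be turned back into a record;
# parsing, not transfer size, is what hurts.
cmpthese( -5, {
    usmarc  => sub { MARC::Record->new_from_usmarc($usmarc) },
    marcxml => sub { MARC::Record->new_from_xml( $xml, 'UTF-8' ) },
    yaml    => sub { Load($yaml) },
} );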

MARCXML is a performance killer at this point, but there's no other apparent
way to handle large bib records. The parsing is the issue, not the data
transfer load. Perhaps cached BSON-formatted MARC::Record objects are a way
out of this.
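
To make the caching idea concrete, here is a minimal sketch. Storable stands
in for a BSON codec, and get_marcxml() and the in-memory cache are
hypothetical placeholders, not anything in current Koha:

use strict;
use warnings;
use Storable qw(nfreeze thaw);
use MARC::Record;
use MARC::File::XML ( BinaryEncoding => 'utf8' );

my %cache;    # biblionumber => frozen, pre-parsed MARC::Record

sub get_record {
    my ($biblionumber) = @_;

    # Cheap path: reconstitute the already-parsed structure.
    if ( my $frozen = $cache{$biblionumber} ) {
        return thaw($frozen);
    }

    # Expensive path: parse the stored MARCXML once, then cache the result
    # so later requests never touch the XML parser.
    my $record = MARC::Record->new_from_xml( get_marcxml($biblionumber), 'UTF-8' );
    $cache{$biblionumber} = nfreeze($record);
    return $record;
}

# Hypothetical accessor; in Koha this would read biblioitems.marcxml via DBI.
sub get_marcxml { die 'fetch biblioitems.marcxml for the given biblionumber here' }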

Clay


On Tue, Oct 12, 2010 at 11:45 AM, Thomas Dukleth <kohadevel at agogme.com> wrote:

> Reply inline:
>
>
> On Tue, October 12, 2010 16:20, LAURENT Henri-Damien wrote:
> > On 12/10/2010 14:48, Thomas Dukleth wrote:
> >> Reply inline:
> >>
> >>
> >> Original Subject:  [Koha-devel] Search Engine Changes : let's get some
> >> solr
> >>
> >> On Mon, October 4, 2010 08:10, LAURENT Henri-Damien wrote:
>
> [...]
>
> >>> I think that everyone agrees that we have to refactor C4::Search.
> >>> Indeed, the query parser is not able to manage all the configuration
> >>> options independently. And the usage of usmarc as the internal format
> >>> for biblios comes with a serious limitation of 9999 bytes, which is
> >>> not enough for big biblios with many items.
> >>
> >> How do MARC limitations on record size relate to Solr indexing, or to
> >> Zebra indexing, which lacks Solr/Lucene support in the current version?
> > Koha now uses the iso2709 records returned from Zebra in order to
> > display result lists.
>
> I recall that having Zebra return ISO2709 (MARC communications format)
> records had the supposed advantage of a faster response time from Zebra.
>
> > The problem is that if Zebra returns only part of the biblio, and/or
> > MARC::Record is not able to parse the whole record, then the biblio is
> > not displayed. We have biblio records which contain more than 1000
> > items, and MARC::Record/MARC::File::XML fails to parse them.
> >
> > So this is a real issue.
>
> Ultimately, we need a specific solution to various problems arising from
> storing holdings directly in the MARC bibliographic records.
>
> >
> >
> >>
> >> How does BibLibre intend to fix the limitation on the size of
> >> bibliographic records, either as part of its work on record indexing
> >> and retrieval in Koha or in some parallel work?
> > Solr/Lucene can return the indexes, and those could be used in order to
> > display the desired data, or we could also do the same as we do with
> > Zebra:
> >       - store the data record (the format could be iso2709, marcxml or
> >         YAML)
> >       - use that for display.
>
> If using ISO 2709 (MARC communications format), how would the problem of
> excess record size be addressed?
>
> > Or we could use GetBiblio in order to get the data from the database.
> > The problem now is that storing XML in the database is not really
> > optimal for processing.
>
> I like the idea of using YAML for some purposes.
>
> As you state, previous testing showed that returning every record in a
> large result set from the SQL database was very inefficient as compared to
> using the records as part of the response from the index server.
>
> Is there any practical way of sufficiently improving the efficiency of
> accessing a large set of records from the SQL database?  How much might
> retrieving and parsing YAML records from the database help?
>
> I can imagine using XSLT to pre-process MARCXML records into an
> appropriate format, such as YAML with embedded HTML, pure HTML, or
> whatever needs to be embedded for a particular purpose, and storing the
> pre-processed records in appropriate special-purpose columns.  Real-time
> parsing would be minimised.  The OPAC result set display might use
> biblioitems.recordOPACDisplayBrief.  The standard single record view
> might use biblioitems.recordOPACDisplayDetail.  An ISBD card view might
> use biblioitems.recordOPACDisplayISBD.
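
Something like the following is roughly what that pre-processing could look
like in practice. It assumes XML::LibXSLT, a hypothetical
marcxml-to-opac-brief.xsl stylesheet, and that a recordOPACDisplayBrief
column has been added to biblioitems; none of these exist in Koha today:

use strict;
use warnings;
use DBI;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical credentials; adjust to the local Koha database.
my $dbh = DBI->connect( 'dbi:mysql:database=koha', 'kohaadmin', 'password',
    { RaiseError => 1 } );

# Hypothetical stylesheet that renders a brief OPAC display from MARCXML.
my $xslt  = XML::LibXSLT->new();
my $style = $xslt->parse_stylesheet_file('marcxml-to-opac-brief.xsl');

my $select = $dbh->prepare('SELECT biblionumber, marcxml FROM biblioitems');
my $update = $dbh->prepare(
    'UPDATE biblioitems SET recordOPACDisplayBrief = ? WHERE biblionumber = ?');

$select->execute();
while ( my ( $biblionumber, $marcxml ) = $select->fetchrow_array() ) {
    my $doc    = XML::LibXML->load_xml( string => $marcxml );
    my $result = $style->transform($doc);

    # Store the pre-rendered form so the OPAC result list never has to
    # parse MARCXML at request time.
    $update->execute( $style->output_string($result), $biblionumber );
}

The same loop could fill recordOPACDisplayDetail and recordOPACDisplayISBD
from their own stylesheets.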
>
> [...]
>
>
> Thomas Dukleth
> Agogme
> 109 E 9th Street, 3D
> New York, NY  10003
> USA
> http://www.agogme.com
> +1 212-674-3783
>