[Koha-devel] MARC record size limit

Thomas Dukleth kohadevel at agogme.com
Tue Oct 12 20:45:09 CEST 2010


Reply inline:


On Tue, October 12, 2010 16:20, LAURENT Henri-Damien wrote:
> On 12/10/2010 14:48, Thomas Dukleth wrote:
>> Reply inline:
>>
>>
>> Original Subject:  [Koha-devel] Search Engine Changes : let's get some
>> solr
>>
>> On Mon, October 4, 2010 08:10, LAURENT Henri-Damien wrote:

[...]

>>> I think that everyone agrees that we have to refactor C4::Search.
>>> Indeed, the query parser is not able to manage all the configuration
>>> options independently. And using USMARC as the internal format for
>>> biblios comes with a serious limitation of 99999 bytes per record,
>>> which is not enough for big biblios with many items.
>>
>> How do MARC limitations on record size relate to Solr/Lucene indexing,
>> or to Zebra indexing, which lacks Solr/Lucene support in the current
>> version?
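For reference, the size limit is structural: the ISO 2709 leader
reserves five digits for the total record length (at most 99999 bytes)
and the directory reserves four digits for each field's length (at most
9999 bytes per field). Below is a minimal sketch of a biblio overrunning
the record limit once enough item fields are embedded; the 952 tag is
Koha's default MARC21 item field, and the barcode and call number values
are made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use MARC::Record;
    use MARC::Field;

    my $record = MARC::Record->new();
    $record->append_fields(
        MARC::Field->new( '245', '0', '0', a => 'A biblio with many items' )
    );

    # Embed one 952 item field per copy, as Koha does for MARC21.
    for my $i ( 1 .. 3000 ) {
        $record->append_fields(
            MARC::Field->new( '952', ' ', ' ',
                p => sprintf( 'BARCODE%06d', $i ),    # barcode
                o => "CALL NUMBER $i",                # call number
            )
        );
    }

    # as_usmarc() serializes to ISO 2709; the leader cannot state a
    # length above 99999, so a record this size will not round-trip.
    my $iso2709 = $record->as_usmarc();
    printf "serialized length: %d bytes\n", length $iso2709;

    my $reparsed = eval { MARC::Record->new_from_usmarc($iso2709) };
    if    ($@)        { print "parse failed: $@" }
    elsif ($reparsed) { print "$_\n" for $reparsed->warnings() }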
> Koha now uses ISO 2709 records returned from Zebra to display result
> lists.

I recall that having Zebra return records in ISO 2709, the MARC
communications format, had the supposed advantage of a faster response
time from Zebra.
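For what it is worth, that choice comes down to a one-line option on the
Z39.50 connection. A sketch, with the host, port, and database name as
placeholder values (Koha normally builds the connection through
C4::Context->Zconn):

    use strict;
    use warnings;
    use ZOOM;

    my $conn = ZOOM::Connection->new( 'localhost', 9998,
        databaseName => 'biblios' );

    # 'usmarc' asks Zebra for ISO 2709; 'xml' asks for MARCXML and
    # sidesteps the 99999-byte leader limit at retrieval time.
    $conn->option( preferredRecordSyntax => 'usmarc' );

    my $rs = $conn->search_pqf('@attr 1=4 "history"');
    if ( $rs->size > 0 ) {
        my $raw = $rs->record(0)->raw();
        printf "got %d bytes\n", length $raw;
    }
    $conn->destroy();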

> The problem is that if Zebra returns only part of the biblio and/or
> MARC::Record is not able to parse the whole record, then the biblio is
> not displayed. We have biblio records which contain more than 1000
> items, and MARC::Record/MARC::File::XML fails to parse them.
>
> So this is a real issue.

Ultimately, we need a specific solution to various problems arising from
storing holdings directly in the MARC bibliographic records.
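One illustrative direction, assuming items are embedded as 952 fields
(the Koha MARC21 convention) and that the stripped fields would be
persisted in a separate items table; the helper itself is hypothetical:

    use strict;
    use warnings;
    use MARC::Record;

    # Hypothetical helper: remove embedded item fields before the biblio
    # is serialized, so record size no longer grows with the number of
    # copies attached to it.
    sub strip_embedded_items {
        my ($record) = @_;
        my @item_fields = $record->field('952');
        $record->delete_field($_) for @item_fields;
        return \@item_fields;    # caller stores these rows separately
    }

The display problem then reduces to joining the stored items back in at
render time instead of parsing them out of an oversized record.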

>
>
>>
>> How does BibLibre intend to fix the limitation on the size of
>> bibliographic records as part of its work on record indexing and
>> retrieval in Koha, or in some parallel work?
> Solr/Lucene can return indexed fields, and those can be used to display
> the desired data, or we could do the same as we do with Zebra:
> 	- store the data record (the format could be ISO 2709, MARCXML, or YAML)
> 	- use that for display.

If using ISO 2709, the MARC communications format, how would the problem
of excess record size be addressed?

> Or we could use GetBiblio to get the data from the database. The
> problem then would be that storing XML in the database is not really
> optimal for processing.

I like the idea of using YAML for some purposes.

As you state, previous testing showed that returning every record in a
large result set from the SQL database was very inefficient compared
with using the records returned as part of the response from the index
server.

Is there any practical way of sufficiently improving the efficiency of
accessing a large set of records from the SQL database?  How much might
retrieving and parsing YAML records from the database help?
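As an illustration only, flattening a record to YAML takes a few lines;
the hash layout below (leader plus a list of fields) is an assumption
for this sketch, not an established serialization:

    use strict;
    use warnings;
    use MARC::Record;
    use YAML::XS qw(Dump);

    sub marc_to_yaml {
        my ($record) = @_;
        my @fields;
        for my $f ( $record->fields ) {
            if ( $f->is_control_field ) {
                push @fields, { tag => $f->tag, data => $f->data };
            }
            else {
                push @fields, {
                    tag       => $f->tag,
                    ind1      => $f->indicator(1),
                    ind2      => $f->indicator(2),
                    subfields => [ $f->subfields ],  # [code, value] pairs
                };
            }
        }
        return Dump( { leader => $record->leader, fields => \@fields } );
    }

Loading that back avoids the XML parsing overhead of MARC::File::XML,
which is presumably where any gain would come from.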

I can imagine using XSLT to pre-process MARCXML records into an
appropriate format, such as YAML with embedded HTML, pure HTML, or
whatever is needed for a particular purpose, and storing the
pre-processed records in appropriate special-purpose columns. Real-time
parsing would be minimised. The OPAC result set display might use
biblioitems.recordOPACDisplayBrief. The standard single record view
might use biblioitems.recordOPACDisplayDetail. An ISBD card view might
use biblioitems.recordOPACDisplayISBD.
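A sketch of that pipeline using XML::LibXSLT; the stylesheet file name
and the recordOPACDisplayBrief column are the hypothetical pieces
proposed above:

    use strict;
    use warnings;
    use XML::LibXML;
    use XML::LibXSLT;

    my $xslt  = XML::LibXSLT->new();
    my $style = $xslt->parse_stylesheet(
        XML::LibXML->load_xml( location => 'opac-brief.xsl' )  # hypothetical
    );

    # Pre-render a MARCXML record and store the HTML in the proposed
    # special-purpose column, so the OPAC does no MARC parsing at
    # display time.
    sub preprocess_biblio {
        my ( $dbh, $biblionumber, $marcxml ) = @_;
        my $doc  = XML::LibXML->load_xml( string => $marcxml );
        my $html = $style->output_as_bytes( $style->transform($doc) );
        $dbh->do(
            'UPDATE biblioitems SET recordOPACDisplayBrief = ?
              WHERE biblionumber = ?',
            undef, $html, $biblionumber,
        );
    }

The pre-processing would have to be re-run whenever the biblio or its
items change.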

[...]


Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
+1 212-674-3783



