[Koha-devel] What is the largest library collection you've heard of? / Also different metadata formats / RDF (aka all the things)

David Cook dcook at prosentient.com.au
Mon Nov 9 07:25:07 CET 2015


Hi all:

 

I was just wondering: what's the biggest library collection you've heard of?
What's the biggest Koha collection you've heard of?



Most recently, I recall there being a mention of 13-14 million bibliographic
records in a Turkish public library consortium:
http://koha.1045719.n5.nabble.com/KOHA-with-PostgreSQL-td5856359.html

 

I did some Googling and arrived at this list:
https://en.wikipedia.org/wiki/List_of_largest_libraries. The two biggest are
the British Library with 170+ million items and the Library of Congress with
160+ million. Library and Archives Canada comes in third with 54 million,
and the New York Public Library comes in fourth with 53 million.

 

It drops off pretty quickly after that: two in the 40-million range, four in
the 30s, six in the 20s, and three in the teens.

 

I think the largest Koha I've managed had a little over 1 million items and
1 million biblios.

 

I'm guessing that most "large" libraries must be in the 1-10 million range,
which doesn't actually seem that bad.

 

I sometimes wonder about the merit of a table that stores something like
("id","type","metadata"). The primary key gets an index by default, I believe,
and we could add an index on "type" if we find it necessary, although that
column might only ever need to be accessed after the row has already been
retrieved.
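To make that idea concrete, here's a minimal sketch of such a table. This is
purely illustrative: it uses SQLite rather than the MySQL/MariaDB that Koha
actually runs on, and the table and index names are invented, not anything in
Koha's schema.

```python
import sqlite3

# Hypothetical ("id", "type", "metadata") table, sketched in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metadata_record (
        id       INTEGER PRIMARY KEY,  -- indexed by default as the primary key
        type     TEXT NOT NULL,        -- e.g. 'marcxml', 'dcxml', 'modsxml'
        metadata TEXT NOT NULL         -- the serialized record itself
    )
""")
# The optional index on "type", if lookups by format turn out to be common:
conn.execute("CREATE INDEX idx_metadata_type ON metadata_record (type)")

conn.execute(
    "INSERT INTO metadata_record (type, metadata) VALUES (?, ?)",
    ("marcxml", "<record>...</record>"),
)
row = conn.execute(
    "SELECT type, metadata FROM metadata_record WHERE id = 1"
).fetchone()
```

Often the row would be fetched by id first and the "type" only inspected
afterward, which is why the second index might never be needed.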

 

Just thinking about adding different metadata formats to Koha. In theory,
Zebra can handle any XML metadata format we throw at it. I think you can
index different record types into Zebra. We'd need to change how the
indexing runs and add some XSLTs for indexing/retrieving those metadata
formats, but I think it's doable.
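As a toy illustration of indexing different record types: each format gets its
own extractor that pulls out the fields we want to index. In reality Zebra
would do this with per-format XSLTs and its own configuration; the element
paths below are simplified stand-ins, not real MARCXML or Dublin Core
structure.

```python
import xml.etree.ElementTree as ET

# Map record type -> function returning {index_field: value}.
# Both formats feed the same index fields ("title", "author").
EXTRACTORS = {
    "dc": lambda root: {
        "title": root.findtext("title", default=""),
        "author": root.findtext("creator", default=""),
    },
    # Invented "simple-marc" layout for illustration only.
    "simple-marc": lambda root: {
        "title": root.findtext("./field245/a", default=""),
        "author": root.findtext("./field100/a", default=""),
    },
}

def index_record(record_type, xml_text):
    """Parse one record and return its index entry."""
    root = ET.fromstring(xml_text)
    return EXTRACTORS[record_type](root)

entry = index_record(
    "dc",
    "<record><title>Hamlet</title><creator>Shakespeare</creator></record>",
)
```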

 

There are a few ways of handling the results afterward: you could add
templates to the existing XSLTs so you could feed the metadata to the same
XSLT regardless of format. Or we could adopt an intermediary data format
(when retrieving data from Zebra, we can define our own XSLTs per record
type, I believe) and do our displays based on that intermediary format.
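The intermediary-format option can be sketched like this: one transform per
record type into a common structure, and a single display routine that only
ever sees that structure. The field names ("title", "creator") and the flat
dict records are invented for illustration; a real intermediary format would
be richer, and the transforms would be XSLTs applied to Zebra results.

```python
# Stand-ins for per-record-type XSLTs, each producing the common format.
def marcxml_to_common(rec):
    return {"title": rec["245a"], "creator": rec["100a"]}

def dc_to_common(rec):
    return {"title": rec["dc:title"], "creator": rec["dc:creator"]}

TRANSFORMS = {"marcxml": marcxml_to_common, "dc": dc_to_common}

def display(record_type, rec):
    """One display path, regardless of the record's original format."""
    common = TRANSFORMS[record_type](rec)
    return f'{common["title"]} / {common["creator"]}'

line = display("dc", {"dc:title": "Hamlet", "dc:creator": "Shakespeare, William"})
```

The payoff is that adding a new metadata format only means writing one new
transform; the display side never changes.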

 

The remaining trouble would then be with other places in Koha that use the
MARCXML directly, such as cataloguing, which relies on mappings between the
relational database and MARC, and items, which are composed/decomposed to/from
MARCXML.

 

But I think that's all achievable.

 

Of course, I don't have a project at the moment that would involve adding
metadata formats. I've been thinking more about RDF, but I think that's a bit
of a can of worms. While I think RDF has merit when it comes to browsing
records, I still don't see how you could effectively retrieve an RDF record
from a local triplestore if you're relying on data stored on a remote
server. Your RDF record might have the title you want to find, but what if
you want to find a record by author? There's no local data referring to the
author; you just have a triple in the record that contains an IRI pointing
to the author record on another server.

 

I don't have a lot of experience with RDF, triplestores, or linked data in
general, but I assume that there must be some sort of local caching of data
in search indexes?
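That caching assumption might look something like the following. The local
record only holds an IRI for the author, so to make an author search work
locally, the indexer would substitute a cached label fetched (once) from the
remote authority. Everything here (the triples, the cache, the predicate
names) is a toy in-memory stand-in, not a real triplestore or SPARQL.

```python
# A bibliographic record as local triples; the creator is just a remote IRI.
LOCAL_TRIPLES = [
    ("ex:book1", "dc:title", "Hamlet"),
    ("ex:book1", "dc:creator", "http://remote.example/person/42"),
]

# Pretend we dereferenced the remote IRI once and cached the label locally.
LABEL_CACHE = {"http://remote.example/person/42": "Shakespeare, William"}

def index_entry(subject, triples, cache):
    """Build a searchable index entry, swapping IRIs for cached labels."""
    entry = {}
    for s, p, o in triples:
        if s != subject:
            continue
        # Without the cache, a search for "Shakespeare" would find nothing:
        # the only local value is the opaque IRI.
        entry[p] = cache.get(o, o)
    return entry

entry = index_entry("ex:book1", LOCAL_TRIPLES, LABEL_CACHE)
```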

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 


