[Koha-bugs] [Bug 5377] Database fields too small for multiple ISBN and ISSN

Thu Jul 4 02:25:48 CEST 2013

http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=5377

--- Comment #8 from David Cook <dcook at prosentient.com.au> ---
(In reply to Galen Charlton from comment #7)
> (In reply to David Cook from comment #3)
> > That said, a new table might be overkill, because ISBN might be the only
> > field in biblioitems that contains multiple values. I'm not 100% sure on
> > this one, as I just took a quick glance through some sample data.
> 
> Well, there are a *lot* of fields that are repeatable and where you might
> well want to display and search on all repeats for a record.  Off the top of
> my head:
> 
> - EAN/UPCs, as mentioned by Mathieu
> - arbitrary record identifiers (e.g., from the MARC21 035 field)
> - chapter and part titles
> - languages of the work

Good point, Galen. When I first wrote my comment, I wasn't aware that there
were other cases of repeatable MARC fields being put in a single database field
with a pipe separator. However, after reading the other comments and taking a
more thorough look, I see more and more such cases.
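
To illustrate the problem with pipe-separated values, here is a minimal Python
sketch of what any consumer of such a field has to do before it can search or
display individual repeats. The function name split_repeats is hypothetical,
and the exact separator format (a pipe, possibly padded with spaces) is an
assumption rather than something taken from the Koha code.

```python
def split_repeats(field_value):
    """Split a pipe-separated database field into its individual values.

    Assumes repeats are joined with '|' and may carry surrounding whitespace.
    Empty or missing fields yield an empty list.
    """
    if not field_value:
        return []
    # Strip padding around each repeat and drop empty fragments.
    return [part.strip() for part in field_value.split("|") if part.strip()]
```

Every script that reads the column needs this step, and the column width still
caps how many repeats fit, which is the limitation this bug describes.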

Perhaps a new table, following the model you suggested in 2010, would be the
best idea.

However, if we were to have a "biblio_attributes" table, might it make sense to
drop the biblioitems table and reduce the scope of the biblio table?

That is to say, all bibliographic attributes (not only 'isbn' and 'issn' but
also 'title', 'author', 'notes', and other attributes from both the biblio and
biblioitems tables) would be stored in "biblio_attributes", while the "biblio"
table would still exist with system specific attributes/data such as the
'biblionumber', 'frameworkcode', 'timestamp', and 'datecreated' fields. 
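
A rough sketch of that split, using Python's sqlite3 module with an in-memory
database purely for illustration (Koha itself runs on MySQL). The "biblio"
columns are the ones named above; the "biblio_attributes" columns ('id',
'attribute', 'value') and both helper functions are hypothetical, not an
existing Koha schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE biblio (
    biblionumber  INTEGER PRIMARY KEY,
    frameworkcode TEXT NOT NULL DEFAULT '',
    datecreated   TEXT NOT NULL,
    timestamp     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- One row per attribute value, so repeatable fields need no separator.
CREATE TABLE biblio_attributes (
    id           INTEGER PRIMARY KEY,
    biblionumber INTEGER NOT NULL REFERENCES biblio(biblionumber),
    attribute    TEXT NOT NULL,
    value        TEXT NOT NULL
);
CREATE INDEX idx_attr_value ON biblio_attributes (attribute, value);
""")


def add_biblio(conn, attributes, frameworkcode=""):
    """Insert one biblio row plus one attribute row per value.

    `attributes` maps an attribute name to a list of values.
    """
    cur = conn.execute(
        "INSERT INTO biblio (frameworkcode, datecreated) VALUES (?, date('now'))",
        (frameworkcode,),
    )
    biblionumber = cur.lastrowid
    conn.executemany(
        "INSERT INTO biblio_attributes (biblionumber, attribute, value) "
        "VALUES (?, ?, ?)",
        [
            (biblionumber, name, value)
            for name, values in attributes.items()
            for value in values
        ],
    )
    return biblionumber


def find_by_attribute(conn, attribute, value):
    """Return the biblionumbers whose given attribute has exactly this value."""
    rows = conn.execute(
        "SELECT DISTINCT biblionumber FROM biblio_attributes "
        "WHERE attribute = ? AND value = ? ORDER BY biblionumber",
        (attribute, value),
    )
    return [row[0] for row in rows]
```

With this layout a record with two ISBNs simply has two 'isbn' rows, and a
lookup on either one is an indexed equality match rather than a substring
search inside a pipe-separated string.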

Conceptually, I like that separation of data. In terms of killing MARC, it
might be an idea to make yet another table called "biblio_records" where the
record is stored in a field (i.e. blob) and a metadata descriptor is recorded
in another field.
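
Sketched the same way (Python's sqlite3, in-memory, for illustration only), a
"biblio_records" table might look like the following. The column names
'format' and 'record', the composite key, and the helper functions are all
assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE biblio_records (
    biblionumber INTEGER NOT NULL,
    format       TEXT NOT NULL,   -- metadata descriptor, e.g. 'marcxml'
    record       BLOB NOT NULL,   -- the serialized record itself
    PRIMARY KEY (biblionumber, format)
)
""")


def store_record(conn, biblionumber, record_format, raw):
    """Store (or overwrite) the serialized record for one biblio and format."""
    conn.execute(
        "INSERT OR REPLACE INTO biblio_records (biblionumber, format, record) "
        "VALUES (?, ?, ?)",
        (biblionumber, record_format, raw),
    )


def load_record(conn, biblionumber, record_format):
    """Return the stored bytes, or None if no record exists in that format."""
    row = conn.execute(
        "SELECT record FROM biblio_records WHERE biblionumber = ? AND format = ?",
        (biblionumber, record_format),
    ).fetchone()
    return row[0] if row else None
```

Because the descriptor travels with the blob, nothing in the table itself is
tied to MARC; a record in another metadata format would be one more row.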

However, it would take a massive rewrite of the Koha code to make that a
reality, no?

Also, by putting all that data into one table, might we suffer some large
performance hits for databases with huge amounts of bib records?

--

These are just musings on my part. I haven't designed enough databases to know
what the best practice is in terms of performance. 

I wonder sometimes about how much we gain from using the relational database to
store bibliographic data. Currently, are we not storing bibliographic data in
five places: the biblio table, the biblioitems table, the MARC blob in the
biblioitems table, the MARCXML blob in the biblioitems table, and finally the
Zebra database?

Both in terms of killing MARC and improving data integrity (and perhaps
performance), might it not make sense to reduce the number of places we're
storing data? 

Is the idea of storing data in the relational database to improve the speed of
retrieval? Do we actually gain speed? I have not noticed any pattern for
deciding when to use the MARC record and when to use the database as the source
of data in a given script. I know we have the "mod" scripts which ensure data
integrity, but I can't help but think that there must be a better way of
storing and retrieving bibliographic data...

Sorry for the long comment! I would almost tl;dr it, but I think they're valid
questions.

"biblio", "biblio_attributes", and "biblio_records"?
"biblio", "biblioitems", "biblio_attributes"?
"biblio", "biblio_records"?
