[Koha-devel] MARC import issues

Stephen Hedges shedges at athenscounty.lib.oh.us
Sun Aug 10 14:24:17 CEST 2003


I would like to update all the Koha developers on two issues that have come up 
as NPL has been migrating to Koha.  Both relate to the difference between the 
way Koha stores bibliographic information and the way MARC records (USMARC) 
structure the same information.

First a reminder of what those differences are.  Koha subdivides bibliographic 
information into three tables (basically):  biblio, biblioitems, and items.  
Biblio holds the basic information about the work -- title and author, 
copyright date, that sort of thing.  Biblioitems holds information about a 
particular manifestation of the work -- item type, classification, actual 
publication year (which may be different from the copyright year), etc.  And 
items stores information about individual copies of the particular 
manifestation of the work -- price, barcode number, date acquired, etc.  MARC, 
on the other hand, currently subdivides bibliographic data into two areas:  
"Bibliographic," which holds the information that Koha would put in biblio and 
(some) biblioitems, and "Holdings," which has the information that Koha would 
put in items and (some) biblioitems.  It's the process of fitting two-part 
information into a three-part database that leads to complications.
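As a rough illustration of the mismatch, the sketch below models one work the way Koha intends to hold it and the way MARC delivers it.  The hash keys, item types and barcode values are hypothetical stand-ins chosen for the example, not Koha's actual schema.

```perl
use strict;
use warnings;

# Illustrative only: keys and values are hypothetical, not Koha's schema.
# Koha's design: one work (biblio) -> manifestations (biblioitems) -> copies (items).
my %koha_work = (
    title       => 'Gone with the wind',
    biblioitems => [
        { itemtype => 'BOOK',  items => [ { barcode => '1001' }, { barcode => '1002' } ] },
        { itemtype => 'AUDIO', items => [ { barcode => '2001' } ] },
    ],
);

# MARC: one bibliographic record per manifestation, each with its own
# holdings, so the same work arrives as two separate records.
my @marc_records = (
    { title => 'Gone with the wind', medium => '',                  holdings => [ '1001', '1002' ] },
    { title => 'Gone with the wind', medium => '[audio recording]', holdings => [ '2001' ] },
);

# Importing record by record creates one biblio row per MARC record,
# each with exactly one biblioitems row.
my $biblio_rows_after_import = scalar @marc_records;
my $biblio_rows_by_design    = 1;
print "biblio rows after import: $biblio_rows_after_import ",
      "(design intends $biblio_rows_by_design)\n";
```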

1.  The process of importing MARC records results in one and only one row in 
biblioitems for each row in biblio.  That's because MARC makes an individual 
record for each manifestation of a work, so when you import a MARC record you 
are really importing each record into biblioitems, with a related row in biblio 
to hold the information that cannot be mapped to biblioitems (author, title, 
etc.).  That, of course, is exactly backwards from the way Koha is designed to 
work.  The import works OK, but when you do a search, you get lots of duplicate 
titles listed, because each printing or video or audio recording of a work has 
its own row in biblio.  That makes it very hard to decide which title listing 
you want to look at more closely.  Which "Gone with the wind" is the audio 
recording?
MARC handles this problem by providing tag 245h for the "Medium" of the work, 
surrounded by square brackets.  (The title itself is in 245a.)  Many MARC-based 
library systems display this tag when showing the results of a search, so you 
know that "Gone with the wind [audio recording]" is different from "Gone with 
the wind [videorecording(DVD)]" or "Gone with the wind."
There are two solutions I can think of for this problem, neither of them very 
satisfactory.  One is to add a column to biblio to hold "medium" for the 245h 
tag.  That, of course, violates the whole philosophy of what the biblio table 
is supposed to store.  The other is to actually modify the title that is stored 
in biblio.  That's the workaround solution we are using at NPL.  We 
periodically run a crude but efficient script that handles the job:

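# Assumes $dbh is an open DBI handle to the Koha database, and (as in the
# rest of this script) that marc_subfield_table.bibid matches
# biblio.biblionumber.
my $sth_getformat = $dbh->prepare("SELECT bibid,subfieldvalue
    FROM marc_subfield_table WHERE tag = '245' AND subfieldcode = 'h'");
my $sth_gettitle = $dbh->prepare("SELECT title FROM biblio
    WHERE biblionumber = ?");
my $sth_put = $dbh->prepare("UPDATE biblio SET title = ?
    WHERE biblionumber = ?");

$sth_getformat->execute();

while (my $row = $sth_getformat->fetchrow_arrayref) {
    my ($bibid, $subfieldvalue) = @$row;

    # Keep only the bracketed medium, e.g. "[videorecording(DVD)]";
    # skip the row if there is none, rather than reusing a stale match.
    next unless defined $subfieldvalue && $subfieldvalue =~ /(\[[^\]]+\])/;
    my $medium = $1;

    $sth_gettitle->execute($bibid);
    my $titleref = $sth_gettitle->fetchrow_arrayref;
    $sth_gettitle->finish;
    next unless $titleref && defined $titleref->[0];
    my $title = $titleref->[0];

    # The script is run periodically, so skip titles that already carry
    # the medium from an earlier run.
    next if index($title, $medium) >= 0;

    $sth_put->execute("$title $medium", $bibid);
}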

Could something similar be included as part of the MARC import process?  It's 
not elegant, but it does solve the problem.  Or better yet, can anyone think of 
a way to combine duplicate biblio rows into one biblio row?  (Seems like this 
would really screw up the relationships between tables.)
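One way this could be folded into an import step is as a small pure function applied to each record's title before the biblio row is written.  The sketch below is only an assumption about how that might look; `title_with_medium` is a hypothetical name, not an existing Koha routine.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the title with the bracketed medium from
# MARC 245h appended, or the title unchanged if there is nothing to add.
sub title_with_medium {
    my ($title, $subfield_h) = @_;
    return $title unless defined $subfield_h;

    # Keep only the bracketed part, dropping trailing punctuation such as " /"
    return $title unless $subfield_h =~ /(\[[^\]]+\])/;
    my $medium = $1;

    # Do not append a medium that is already part of the title
    return $title if index($title, $medium) >= 0;
    return "$title $medium";
}

print title_with_medium('Gone with the wind', '[videorecording(DVD)] /'), "\n";
print title_with_medium('Gone with the wind', undef), "\n";
```

Because the function leaves an already-tagged title alone, it would be safe to call both at import time and from a later clean-up run.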

2.  While Koha stores the copyright date in biblio and the publication year in 
biblioitems, MARC puts both in one tag (260c), which of course can only be 
mapped to one Koha table.column.  So currently the library importing their MARC 
records into Koha has to decide which Koha table.column to fill, and then the 
Koha Biblio.pm strips out the first date found in the 260c tag and puts it 
there.  This is not good, because:  a) either the screens which display 
biblio.copyrightdate or the screens which display biblioitems.publicationyear 
are going to have nothing to display;  b) we're losing information in the 
import which could easily be retrieved;  and c) it leads to inaccurate 
information, because (in the US) if 260c has two dates, the first is always the 
publication year and the second is the copyright date, and the current Koha 
solution could end up putting the publication year in the copyright date 
column.
Again, we periodically run a (crude) script at NPL to load both table.columns:

# Assumes $dbh is an open DBI handle to the Koha database, and (as in the
# rest of this script) that marc_subfield_table.bibid matches
# biblio.biblionumber.
my $sth_get = $dbh->prepare("SELECT bibid,subfieldvalue
    FROM marc_subfield_table WHERE tag = '260' AND subfieldcode = 'c'");
my $sth_cprdate = $dbh->prepare("UPDATE biblio SET copyrightdate = ?
    WHERE biblionumber = ?");
my $sth_pubdate = $dbh->prepare("UPDATE biblioitems SET publicationyear = ?
    WHERE biblionumber = ?");

$sth_get->execute();
while (my $row = $sth_get->fetchrow_arrayref) {
    my ($bibid, $subfieldvalue) = @$row;
    $subfieldvalue = '' unless defined $subfieldvalue;
    my ($pubdate, $cprdate) = ('', '');         # default: no dates found

    # Test the patterns directly instead of the string length, so a long
    # value without two dates cannot pick up $1/$2 from an earlier row.
    if ($subfieldvalue =~ /(\d{4}).+?(\d{4})/) {
        # two dates: publication date first, then copyright date
        ($pubdate, $cprdate) = ($1, $2);
    } elsif ($subfieldvalue =~ /(\d{4})/) {
        # only one date: use it for both columns
        ($pubdate, $cprdate) = ($1, $1);
    }
    $sth_cprdate->execute($cprdate, $bibid);
    $sth_pubdate->execute($pubdate, $bibid);
}

Again, could this somehow be worked into the MARC import process? 
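For the import itself, the date handling could be isolated in a function that takes the raw 260c value and returns both years, so the caller can fill biblio.copyrightdate and biblioitems.publicationyear in one pass.  This is a minimal sketch under that assumption; `split_260c` is a hypothetical name, not part of Koha's Biblio.pm, and the sample 260c value is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a MARC 260c value into
# (publication year, copyright year), following the US convention that the
# first of two dates is the publication year.  Returns two empty strings
# if no four-digit year is found.
sub split_260c {
    my ($value) = @_;
    return ('', '') unless defined $value;

    if ($value =~ /(\d{4}).+?(\d{4})/) {
        return ($1, $2);        # two dates: publication, then copyright
    }
    if ($value =~ /(\d{4})/) {
        return ($1, $1);        # one date: use it for both
    }
    return ('', '');            # no dates
}

my ($pub, $cpr) = split_260c('1998, c1997.');
print "publicationyear=$pub copyrightdate=$cpr\n";
```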

Stephen Hedges
Director, Nelsonville Public Library
95 W. Washington St., Nels.,OH 45764
(740) 753-2118    fax (740) 753-3543



