[Koha-devel] RE: [oss4lib-discuss] MARC -> RDBMS (fwd)

Wed Dec 11 13:16:06 CET 2002

The section on Koha is particularly interesting to us going forward, I
think.

-pate

---------- Forwarded message ----------
Date: Wed, 11 Dec 2002 16:02:26 -0400
From: Brian Cassidy <brian at nald.ca>
To: oss4lib-discuss at lists.sourceforge.net
Subject: RE: [oss4lib-discuss] MARC -> RDBMS

Hello again,

Thanks to everyone for their quick replies! Here's what I've done since
yesterday:

I looked into the 'Koha' library software. It looks interesting, but the
line "It is currently possible to import MARC records, one at a time."
put me off, because I have several repositories with thousands of
records each.

I looked into Michael Doran's concerns, but the powers that be want me
to steer away from a Z39.50 interface. So, that's as far as I got with
that. :/

After looking at the document
(http://www.openisis.org/openisis/doc/RdbConv) provided by Ferran Jorba,
I created a simple DB structure (pardon my ASCII :)

.---------.        .-----------.
| item    |        | marc_data |
-----------  ----  -------------
| item_id |        | item_id   |
-----------        | data      |
     |             -------------
     |
    /|\
.-------------.
| tag_data    |
---------------
| item_id     |
| tag_order   |
| tag         |
| indicator_1 |
| indicator_2 |
| subfield    |
| data        |
---------------

Note: marc_data.data is just a 'blob' of the MARC record.
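
For readers who want to try the structure above, here is a minimal sketch of it as DDL, run through Python's built-in sqlite3 module. The post gives only table and column names, so the column types, the keys, and the meaning of tag_order (taken here to be the position of a row within its record) are all assumptions, and the author's actual database engine is not stated.

```python
import sqlite3

# Assumed column types and keys; the original post names only tables and columns.
SCHEMA = """
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY
);
CREATE TABLE marc_data (
    item_id INTEGER PRIMARY KEY REFERENCES item(item_id),
    data    BLOB NOT NULL          -- the raw MARC record
);
CREATE TABLE tag_data (
    item_id     INTEGER NOT NULL REFERENCES item(item_id),
    tag_order   INTEGER NOT NULL,  -- assumed: position of the row within the record
    tag         TEXT    NOT NULL,  -- e.g. '008', '245'
    indicator_1 TEXT,
    indicator_2 TEXT,
    subfield    TEXT,              -- NULL for control fields such as 008
    data        TEXT,
    PRIMARY KEY (item_id, tag_order)
);
"""

def create_schema(conn):
    """Create the three tables from the diagram in an open connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

One row in tag_data per subfield (or per control field) is what makes the per-tag queries later in this message possible.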

I then used MARC::Record (Hi Ed.) to quickly write a couple of Perl
scripts to generate some SQL and run the queries. My database now has
2002 (MARC) records in a somewhat usable structure. For instance, I can
now say

SELECT
	item_id
FROM
	tag_data
WHERE
	tag = '008' AND
	data LIKE '%fre__'

to get all of the IDs of the French records.
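
The LIKE pattern works because the three-letter language code occupies character positions 35-37 of the 40-character 008 field, leaving exactly two characters after it. The sketch below makes that position explicit with SUBSTR. It assumes SQLite via Python's sqlite3 module and a cut-down tag_data table; the helper names and the sample 008 values are made up for illustration.

```python
import sqlite3

def make_008(lang):
    """Build a made-up 40-character 008 field with `lang` at positions 35-37."""
    return "021211s2002    abc" + " " * 17 + lang + " d"

def items_in_language(conn, lang):
    """Return the item_ids whose 008 field carries the given language code."""
    # SUBSTR is 1-based, so 0-based MARC position 35 is character 36.
    rows = conn.execute(
        "SELECT item_id FROM tag_data "
        "WHERE tag = '008' AND substr(data, 36, 3) = ? "
        "ORDER BY item_id", (lang,))
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
# Cut-down table: only the columns this query needs.
conn.execute("CREATE TABLE tag_data (item_id INTEGER, tag TEXT, data TEXT)")
conn.executemany(
    "INSERT INTO tag_data VALUES (?, '008', ?)",
    [(1, make_008("eng")), (2, make_008("fre")), (3, make_008("fre"))])
french_ids = items_in_language(conn, "fre")
print(french_ids)
```

Passing the language code as a parameter means the same function serves any language, and the fixed offset documents which part of the 008 field is being tested.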

I'm now at the point where I need to create a data crosswalk (as Art
mentioned) so I can display the data in a human-readable way, as well as
facilitate easy searching. The Dublin Core crosswalk
(http://www.loc.gov/marc/marc2dc.html) on the loc.gov website (hat tip:
Art) seems like a great standard to follow. The problem I'm now facing
is that the data in a tag is frequently split across many subfields.
For instance:

SELECT
	item_id,
	subfield,
	data
FROM
	tag_data
WHERE
	tag = '245' -- Title
ORDER BY
	item_id

One of my results is:

item_id | subfield | data
-------------------------------------------------------------
...
2       | a        | Keeping Alberta competitive :
2       | b        | success through workplace learning. --
...

Now, it's totally doable to loop through the results (in Perl, ASP, etc.)
and concatenate the data together. This COULD, however, get tedious
depending on the number of fields I need to look at. This is probably
the wrong forum to ask, but would anyone have any suggestions on how to
proceed (or perhaps take some steps back and attack it differently)?
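
One way to keep the looping from getting tedious is to write it once as a function that takes the tag as a parameter. The sketch below is in Python rather than Perl and assumes SQLite, a cut-down tag_data table, and that tag_order gives the order of subfields within a record. The names CROSSWALK and joined_by_item are hypothetical, and the single mapping shown is illustrative only; the real mappings should come from the LoC crosswalk.

```python
import sqlite3
from itertools import groupby

# Illustrative only: check each mapping against the LoC crosswalk before use.
CROSSWALK = {"245": "title"}

def joined_by_item(conn, tag):
    """Return {item_id: subfield data joined in tag_order} for one MARC tag."""
    rows = conn.execute(
        "SELECT item_id, data FROM tag_data WHERE tag = ? "
        "ORDER BY item_id, tag_order", (tag,))
    # Rows arrive sorted by item_id, so groupby sees each item exactly once.
    return {item_id: " ".join(data.strip() for _, data in group)
            for item_id, group in groupby(rows, key=lambda r: r[0])}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag_data "
             "(item_id INTEGER, tag_order INTEGER, tag TEXT, subfield TEXT, data TEXT)")
conn.executemany("INSERT INTO tag_data VALUES (?, ?, ?, ?, ?)", [
    # Inserted out of order on purpose; tag_order values are made up.
    (2, 11, "245", "b", "success through workplace learning. --"),
    (2, 10, "245", "a", "Keeping Alberta competitive :"),
])
records = {CROSSWALK[tag]: joined_by_item(conn, tag) for tag in CROSSWALK}
print(records["title"][2])
```

Adding another element then means adding one entry to the mapping rather than writing another loop. Note that this joins every row for a tag into one string per item, which suits a non-repeatable field like 245; a repeatable field would need to be grouped by occurrence as well.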

Thanks again for all your help!

-Brian Cassidy (brian at nald dot ca)
