No subject

Wed May 26 14:59:14 CEST 2010

I would have to agree that switching to a MARC database is going to raise
the bar for new developers to work on the Koha code.  Many new developers
will come to Koha with no knowledge of MARC, and MARC can be daunting.

I'm hopeful that we can develop a cataloguing API that will alleviate some
of these problems.  It should be possible for us to develop some routines
that "hide" the MARC format of the data from the developers to a certain
extent.  As an overly simplistic example, we could have routines like:

getinfo(1563,'author')  returns Herman Melville  from the 100 a subfield
getinfo(1563,'title')   returns Moby Dick        from the 245 a subfield

setinfo(1563,'author', 'Melville, Herman')  updates author info 

Here the number 1563 would be the bibid of the item in question.
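
A rough sketch of how such routines might behave, written in Python for
brevity rather than in Koha's Perl.  The in-memory RECORDS store, the
FIELD_MAP table, and the record layout (tag -> list of subfield dicts) are
all assumptions made for illustration; the real API has not been designed.

```python
# Hypothetical in-memory store: bibid -> {tag: [{subfield code: value}, ...]}
RECORDS = {
    1563: {
        "100": [{"a": "Herman Melville"}],
        "245": [{"a": "Moby Dick"}],
    },
}

# Assumed mapping from developer-friendly names to (MARC tag, subfield code).
FIELD_MAP = {
    "author": ("100", "a"),
    "title": ("245", "a"),
}


def getinfo(bibid, name):
    """Return the first value found for the mapped subfield, or None."""
    tag, code = FIELD_MAP[name]
    for field in RECORDS[bibid].get(tag, []):
        if code in field:
            return field[code]
    return None


def setinfo(bibid, name, value):
    """Set the mapped subfield on the first occurrence of the tag."""
    tag, code = FIELD_MAP[name]
    fields = RECORDS[bibid].setdefault(tag, [])
    if not fields:
        fields.append({})  # create the field if the record lacks it
    fields[0][code] = value
```

The caller never sees a tag or subfield code; only FIELD_MAP knows that
'author' lives in the 100 a subfield.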

Steve Tonnesen

Here are some direct responses to your comments.  The numbered items
represent your comments (paraphrased by me in some cases, hopefully I
didn't mess them up) followed by my comments: 

1.  acqui.simple and marcimport.pl == MARC compatible

  acqui.simple and marcimport.pl will import MARC records and "squash" the
  data into the standard Koha database.  There can be considerable loss of
  information in this process.  In my mind, MARC compatible means that we
  can import, export, view, and edit bibliographic records in MARC format.
  Note that it doesn't mean that we _have_ to view and edit in MARC
  format, just that we _can_.

2.  Use MARC tables only for data that is not already represented by the
    Koha databases.

  I think this is a bad idea.  It would be too confusing to keep track of
  which information was stored where.

3.  Storing Koha-specific data in local MARC tags, just for the sake of
    saying it is all MARC, seems to me to be arbitrary and inefficient.

  My reason for storing Koha-specific tags in the MARC data is just so
  that existing libraries can switch to the new database without losing
  the functionality that they expect Koha to have.  Primarily this means
  tying individual MARC records together based on their "copyright"
  information as it is stored in the current biblio table.  Currently,
  Romeo and Juliet by Shakespeare could be in the library in seven
  different formats (different publications, video, audio, etc.), each of
  which has a separate biblioitems entry, but all of which share one
  biblio entry.  I proposed adding a local MARC field that would tie the
  separate MARC entries for all seven formats of this work together as
  well.
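
To make the local-field idea concrete, here is a small Python sketch of
grouping MARC records that share a work identifier.  The tag 990, the
subfield a, the "work-42" style identifiers, and the record layout are all
hypothetical; no local tag has actually been chosen.

```python
from collections import defaultdict

LOCAL_TAG = "990"   # hypothetical local tag, chosen arbitrarily
LOCAL_CODE = "a"    # hypothetical subfield holding the shared work id


def group_by_work(records):
    """Map each work id to the list of bibids that carry it."""
    groups = defaultdict(list)
    for bibid, fields in records.items():
        for field in fields.get(LOCAL_TAG, []):
            if LOCAL_CODE in field:
                groups[field[LOCAL_CODE]].append(bibid)
    return dict(groups)


# Two formats of the same work plus one unrelated record.
records = {
    101: {"245": [{"a": "Romeo and Juliet"}], "990": [{"a": "work-42"}]},
    102: {"245": [{"a": "Romeo and Juliet"}], "990": [{"a": "work-42"}]},
    103: {"245": [{"a": "Moby Dick"}], "990": [{"a": "work-7"}]},
}
works = group_by_work(records)
```

Each MARC record stays complete on its own; the local field only adds the
link that the shared biblio entry provides today.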

4.  You give some examples of what you think a search query would look
    like with the new MARC data and state, "As a programmer, I cannot see
    this being practical for my skill set to work on.  I really just want
    to be able to select columns from tables in a reasonable manner." 

  First off, searches will never be done directly on the MARC data storage
  tables.  That would be too slow.  I envision separate indexes being
  generated to facilitate searching.  It would be easy, for example, to
  create an index that contained the following information:

      bibid, title, author, dewey, isbn

  Then a search for bibid would be identical to the search that you gave
  in your example using the Koha database.  Note that you could potentially
  lose information stored in the MARC record if there is more than one
  author, for example.  These "search indexes" are still vaporware.  Paul
  and I are trying to work out the specifics of what the new backend
  database will look like.
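
Since the indexes do not exist yet, the following Python/sqlite3 sketch is
only one guess at what generating such an index could look like.  The
table name search_index, the record layout, and the choice of subfields
(245 a, 100 a, 082 a, 020 a) are assumptions for illustration.

```python
import sqlite3

# Assumed mapping from index column to (MARC tag, subfield code).
INDEX_COLUMNS = {
    "title": ("245", "a"),
    "author": ("100", "a"),
    "dewey": ("082", "a"),
    "isbn": ("020", "a"),
}


def first_value(fields, tag, code):
    """Return the first matching subfield value, or None if absent."""
    for field in fields.get(tag, []):
        if code in field:
            return field[code]
    return None


def build_index(conn, records):
    """(Re)build a flat search table from MARC-like records."""
    conn.execute("DROP TABLE IF EXISTS search_index")
    conn.execute(
        "CREATE TABLE search_index "
        "(bibid INTEGER, title TEXT, author TEXT, dewey TEXT, isbn TEXT)"
    )
    for bibid, fields in records.items():
        row = [bibid] + [
            first_value(fields, tag, code)
            for tag, code in INDEX_COLUMNS.values()
        ]
        conn.execute("INSERT INTO search_index VALUES (?, ?, ?, ?, ?)", row)
    conn.commit()


conn = sqlite3.connect(":memory:")
build_index(conn, {
    1563: {"100": [{"a": "Herman Melville"}], "245": [{"a": "Moby Dick"}]},
})
row = conn.execute(
    "SELECT title, author FROM search_index WHERE bibid = ?", (1563,)
).fetchone()
```

Only the first value of each subfield is kept, which is exactly the kind
of information loss mentioned above; the full record remains in the MARC
storage tables.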

5.  You mention MARC-XML and suggest that we not tie ourselves to the MARC
    format.

  We are not proposing that bibliographic data be stored in MARC format,
  only that the back end database be capable of storing any and all
  information that a MARC record can store (whether it is a conventional
  MARC record or the as-yet-unspecified MARC-XML format record).  The
  existing MARC format is just a flat file that uses control character
  separators and a directory index for the tags.  The MARC-XML
  specification will be the same format, but will use XML tags for
  separating the data instead of control characters.  We would not want to
  use either of these formats internally, although we would probably want
  to be capable of importing/exporting those formats.
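
For readers who have not seen the flat format, here is a simplified Python
sketch of it: a 24-character leader, a directory of 12-character entries
(tag, field length, start offset), then the field data separated by
control characters.  It assumes ASCII-only data and ignores most leader
details, so treat it as an illustration, not an import/export routine.

```python
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator
SF = "\x1f"  # subfield delimiter


def build_record(fields):
    """Serialize [(tag, data), ...] into a MARC-style flat record."""
    directory, body = "", ""
    for tag, data in fields:
        chunk = data + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(body):05d}"
        body += chunk
    directory += FT
    base = 24 + len(directory)       # where the field data begins
    total = base + len(body) + 1     # +1 for the record terminator
    leader = f"{total:05d}nam  22{base:05d}   4500"
    return leader + directory + body + RT


def parse_record(raw):
    """Split a flat record back into control and data fields."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]     # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]
        if tag < "010":              # control fields have no subfields
            fields.append((tag, data))
        else:
            indicators, *subs = data.split(SF)
            fields.append((tag, indicators, [(s[0], s[1:]) for s in subs]))
    return fields


raw = build_record([
    ("001", "1563"),
    ("100", "1 " + SF + "aMelville, Herman"),
    ("245", "10" + SF + "aMoby Dick"),
])
parsed = parse_record(raw)
```

Nothing in the record is addressable without walking the directory, which
is one reason not to use this layout as the internal storage format.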

> I don't want to sound harsh, but frankly, if this is the goal, I'd 
> like to know now so I can wish you good luck on it and look
> elsewhere for something that fits me better.  Thanks for taking
> the time to discuss it.

Switching to MARC is a big undertaking and it should be well thought out.
I appreciate your comments.

Steve.