[Koha-devel] Batch cleanup to the catalog

Cab Vinton bibliwho at gmail.com
Wed Aug 1 19:07:07 CEST 2018


Many thanks, Michael.

Any changes would definitely be a cataloger-driven process.

FWIW, we're mainly looking at extraneous OCLC-related fields in the
9XX range, but there are also two to three dozen additional fields that
are obsolete or otherwise not part of the standard MARC 21 format.

To be honest, I'd be surprised if many of these fields were indexed,
but I haven't taken a close look to confirm. (The catalog currently
uses 244 fields, many of them only a handful of times.)
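
A tally of how often each tag occurs across an export is one way to
confirm which of those fields are rarely used. This is a minimal
sketch, assuming the records have already been parsed into lists of
(tag, value) pairs; `tag_usage` and `rare_tags` are hypothetical
helpers for illustration, not part of Koha.

```python
from collections import Counter


def tag_usage(records):
    """Count how many times each MARC tag occurs across all records.

    Assumes each record is a list of (tag, value) tuples, a hypothetical
    simplified stand-in for a parsed MARC record.
    """
    counts = Counter()
    for record in records:
        counts.update(tag for tag, _ in record)
    return counts


def rare_tags(counts, threshold):
    """Return tags used fewer than `threshold` times, rarest first."""
    return sorted(
        (tag for tag, n in counts.items() if n < threshold),
        key=lambda tag: (counts[tag], tag),
    )


if __name__ == "__main__":
    sample = [
        [("245", "A title"), ("650", "Subject"), ("938", "Vendor data")],
        [("245", "Another title"), ("650", "Subject"), ("650", "Subject 2")],
    ]
    counts = tag_usage(sample)
    print(counts.most_common())
    print(rare_tags(counts, threshold=2))  # ['938']
```

The threshold is a judgment call for the catalogers; the list only
flags candidates for review and deletes nothing.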

This is a medium-term project, so I expect it will take the catalogers
a while to get through the planning phase, and that the cleanup itself
will be done in stages. The ultimate goals are to get rid of
unnecessary fields and to make sure they're not imported into the
catalog in the first place.
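
A single filter could serve both goals, cleaning existing records and
screening incoming ones. This is a minimal sketch, assuming records are
lists of (tag, value) pairs. `UNWANTED_PATTERNS`, `PROTECTED_TAGS`,
`tag_matches` and `clean_record` are hypothetical names, and the
patterns are placeholders rather than the catalogers' actual list. In
Koha itself the change would go through BRM or a script using the Koha
modules, so that the search index stays in step. One caution the sketch
encodes: not every 9XX field is disposable. Koha's default MARC 21
frameworks keep their own data in 942, 952 (items) and 999, so a
blanket 9XX rule must exempt them.

```python
# Placeholder patterns for fields marked for removal; "X" matches any
# character, so "9XX" covers every 9XX tag.
UNWANTED_PATTERNS = ("029", "9XX")

# Tags Koha relies on in its default MARC 21 frameworks: 942 (Koha
# record-level data), 952 (item records), 999 (internal record numbers).
PROTECTED_TAGS = frozenset({"942", "952", "999"})


def tag_matches(tag, pattern):
    """Return True if `tag` matches `pattern`, where 'X' matches any character."""
    return len(tag) == len(pattern) and all(
        p == "X" or p == t for t, p in zip(tag, pattern)
    )


def clean_record(record, unwanted=UNWANTED_PATTERNS, protected=PROTECTED_TAGS):
    """Return a new field list with unwanted fields dropped.

    `record` is a list of (tag, value) tuples, a simplified stand-in for
    a parsed MARC record. Protected tags are always kept.
    """
    return [
        (tag, value)
        for tag, value in record
        if tag in protected or not any(tag_matches(tag, p) for p in unwanted)
    ]


if __name__ == "__main__":
    record = [
        ("001", "12345"),
        ("029", "other system number"),
        ("245", "A title"),
        ("938", "vendor data"),
        ("952", "item data"),
        ("999", "record numbers"),
    ]
    print(clean_record(record))
    # [('001', '12345'), ('245', 'A title'), ('952', 'item data'), ('999', 'record numbers')]
```

Keeping the protected set separate from the removal patterns means a
broad rule such as "9XX" cannot silently wipe out holdings.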

Thanks again,

Cab Vinton
Plaistow Public Library


On Wed, Aug 1, 2018 at 11:28 AM, Michael Hafen
<michael.hafen at washk12.org> wrote:
> Since you have the CataloguingLog turned off, you should see a net reduction
> in the size of the database, both in Koha and in Zebra/Solr, though you
> probably won't notice the difference on disk in Koha, since the tables are
> InnoDB, which in most cases doesn't shrink its files when data is removed.
> I would expect you won't notice much difference in the frontend.
> Since you indicate the fields are hidden in the OPAC, the biggest difference
> will be in the work of your cataloguing staff. I don't know if you have
> discussed the idea with them; sometimes cataloguers prefer to have more
> information available (even if unused), sometimes they prefer less. More
> information could mean better record matching in searches and record merges.
> Less information could mean more efficient cataloguing, since the staff
> don't need to keep as much information at hand while cataloguing. I tend
> to prefer keeping the information and hiding it in the staff interface if
> it bothers someone, but that's my preference.
> As for doing such changes in the backend, there is one big drawback:
> the metadata / MARC fields need to be rebuilt and the search engine needs
> to be reindexed after any changes made by hand to the database.
> That is the reason I tend to do any big batch modifications by setting up a
> script that uses the Koha modules (the C4 / Koha API). That way Koha
> itself takes care of that for me.
> One final note: my guess is that the difference in size from removing
> those fields will be small, on the order of a few hundred megabytes at
> most.
>
> On Wed, Aug 1, 2018 at 7:00 AM Cab Vinton <bibliwho at gmail.com> wrote:
>>
>> Hi, All --
>>
>> Koha's Batch Record Modification (BRM) tool makes it very easy to make
>> large-scale changes to the catalog. And we know that many catalogers
>> like to have their records just so :-)
>>
>> Is there a cost in overhead, however, to making such changes?
>>
>> For example, if a catalog of 150,000 records contains 75,000 unwanted
>> MARC fields, would you delete them, even if they're not displayed in
>> the OPAC or otherwise interfering w/ functionality?
>>
>> In particular, I'm wondering whether using the BRM tool involves the
>> creation of new data so that there's no real net gain w/ respect to
>> the goal of having a "cleaner" database. (In our case, we've turned
>> off the CataloguingLog system preference.)
>>
>> Along the same lines, are there any advantages to doing large-scale
>> batch processing via the backend instead, i.e., as opposed to w/ the
>> built-in staff tools such as BRM? (I'm assuming there's no issue w/
>> using such tools to work w/ much smaller subsets of records.)
>>
>> Thanks in advance for any guidance.
>>
>> All best,
>>
>> Cab Vinton, Director
>> Plaistow Public Library
>> Plaistow, NH
>> _______________________________________________
>> Koha-devel mailing list
>> Koha-devel at lists.koha-community.org
>> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>> website : http://www.koha-community.org/
>> git : http://git.koha-community.org/
>> bugs : http://bugs.koha-community.org/
>
>
>
> --
> Michael Hafen
> Washington County School District Technology Department
> Systems Analyst
>

