[Koha-patches] [PATCH 3/3] Bug 6557: add script to update totalissues from stats

Jared Camins-Esakov jcamins at cpbibliography.com
Thu May 24 21:36:55 CEST 2012


NAME
       update_totalissues.pl

SYNOPSIS
         update_totalissues.pl --use-stats
         update_totalissues.pl --use-items
         update_totalissues.pl --commit=1000
         update_totalissues.pl --since='2012-01-01'
         update_totalissues.pl --interval=30d

DESCRIPTION
       This batch job populates bibliographic records' total issues count
       based on historical issue statistics.

       --help  Prints this help

       -v|--verbose
               Provide verbose log information (list every bib modified).

       --use-stats
               Use the data in the statistics table for populating total
               issues.

       --use-items
               Use items.issues data for populating total issues. Note that
               issues data from the items table does not respect the --since
               or --interval options, by definition. Also note that if both
               --use-stats and --use-items are specified, the count of biblios
               processed will be misleading.

       -s|--since=DATE
               Only process issues recorded in the statistics table since
               DATE.

       -i|--interval=S
               Only process issues recorded in the statistics table in the
               last N units of time. The interval should consist of a number
               with a one-letter unit suffix. The valid suffixes are h
               (hours), d (days), w (weeks), m (months), and y (years). The
               default unit is days.

       --incremental
               Add the number of issues found in the statistics table to the
               existing total issues count. Intended so that this script can
               be used as a cron job to update popularity information during
               low-usage periods.

       --commit=N
               Commit the results to the database after every N records are
               processed.

       --test  Only test the popularity population script.

=== TESTING PLAN ===

NOTE: in order to test this script, you will need to have some sort of
circulation data already existing in your Koha installation.

1) Disable UpdateTotalIssuesOnCirc
2) Run: misc/cronjobs/update_total_issues.pl --use-items -t -v
3) If you have total checkout data in your item records (i.e. anything
   in 952$l), you should see messages like "Processing bib 43 (1 issues)"
4) Choose one of the lines that shows more than 0 issues, and view the
   record with that biblionumber in the staff client, choosing the "Items"
   tab (moredetail.pl). Add up the "Total checkouts" listed for each item,
   and confirm it matches what the script reported
5) Run: misc/cronjobs/update_total_issues.pl --use-stats -t -v
6) If you have any circulation statistics in your database (i.e. any
   'issue' entries in your statistics table), you should see messages
   like "Processing bib 43 (1 issues)";
7) Choose one of the lines and view the record with that biblionumber in
   the staff client, choosing the "Items" tab (moredetail.pl). If you
   count the number of checkouts listed in each item's checkout history,
   the total should match what the script reported.
8) Check out an item
9) Run: misc/cronjobs/update_total_issues.pl --use-stats --since=1h -t -v
10) You should see one line reporting a single circ for the bib record
    associated with the item you just checked out (there may be more if
    you checked out any books in the hour prior to running these tests
11) If the results in steps 4, 7, and 10 match the predictions, the
    script worked
---
 misc/cronjobs/crontab.example       |    3 +
 misc/cronjobs/update_totalissues.pl |  272 +++++++++++++++++++++++++++++++++++
 2 files changed, 275 insertions(+), 0 deletions(-)
 create mode 100755 misc/cronjobs/update_totalissues.pl

diff --git a/misc/cronjobs/crontab.example b/misc/cronjobs/crontab.example
index 3c5030d..ed8f117 100644
--- a/misc/cronjobs/crontab.example
+++ b/misc/cronjobs/crontab.example
@@ -67,6 +67,9 @@ KOHA_CRON_PATH = /usr/share/koha/bin/cronjobs
 # Cancel expired holds
 0 1 * * *  $KOHA_CRON_PATH/holds/cancel_expired_holds.pl >/dev/null 2>&1
 
+# Update popularity counts for biblio records
+0 2 * * *  $KOHA_CRON_PATH/update_totalissues.pl --commit=1000 --use-stats --incremental --interval=1d >/dev/null 2>&1
+
 # ZEBRA INDEX UPDATES with -z option, incremental index updates throughout the day
 # for both authorities and bibs
 */10 * * * *  $KOHA_CRON_PATH/../migration_tools/rebuild_zebra.pl -b -a -z >/dev/null
diff --git a/misc/cronjobs/update_totalissues.pl b/misc/cronjobs/update_totalissues.pl
new file mode 100755
index 0000000..e621b9d
--- /dev/null
+++ b/misc/cronjobs/update_totalissues.pl
@@ -0,0 +1,272 @@
+#!/usr/bin/perl
+
+# Copyright 2012 C & P Bibliography Services
+#
+# This file is part of Koha.
+#
+# Koha is free software; you can redistribute it and/or modify it under the
+# terms of the GNU General Public License as published by the Free Software
+# Foundation; either version 2 of the License, or (at your option) any later
+# version.
+#
+# Koha is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
+# A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with Koha; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+use strict;
+use warnings;
+
+BEGIN {
+
+    # find Koha's Perl modules
+    # test carefully before changing this
+    use FindBin;
+    eval { require "$FindBin::Bin/../kohalib.pl" };
+}
+
+use Getopt::Long;
+use Pod::Usage;
+use C4::Context;
+use C4::Biblio;
+use DateTime;
+use DateTime::Format::MySQL;
+use Time::HiRes qw/time/;
+use POSIX qw/strftime ceil/;
+
+sub usage {
+    pod2usage( -verbose => 2 );
+    exit;
+}
+
+$| = 1;
+
+# command-line parameters
+my $verbose   = 0;
+my $test_only = 0;
+my $want_help = 0;
+my $since;
+my $interval;
+my $usestats    = 0;
+my $useitems    = 0;
+my $incremental = 0;
+my $commit      = 100;
+my $unit;
+
+my $result = GetOptions(
+    'v|verbose'    => \$verbose,
+    't|test'       => \$test_only,
+    's|since=s'    => \$since,
+    'i|interval=s' => \$interval,
+    'use-stats'    => \$usestats,
+    'use-items'    => \$useitems,
+    'incremental'  => \$incremental,
+    'c|commit=i'   => \$commit,
+    'h|help'       => \$want_help
+);
+
+binmode( STDOUT, ":utf8" );
+
+if ( defined $since && defined $interval ) {
+    print "The --since and --interval options are mutually exclusive.\n\n";
+    $want_help = 1;
+}
+
+if ( $usestats && $incremental ) {
+    print
+      "The --use-items and --incremental options are mutually exclusive.\n\n";
+    $want_help = 1;
+}
+
+unless ( $usestats || $useitems ) {
+    print "You must specify either --use-stats and/or --use-items.\n\n";
+    $want_help = 1;
+}
+
+if ( not $result or $want_help ) {
+    usage();
+}
+
+my $dbh = C4::Context->dbh;
+$dbh->{AutoCommit} = 0;
+
+my $num_bibs_processed = 0;
+
+my $starttime = time();
+
+process_items() if $useitems;
+process_stats() if $usestats;
+
+report();
+
+exit 0;
+
+sub process_items {
+    my $query =
+"SELECT items.biblionumber, SUM(items.issues) FROM items GROUP BY items.biblionumber;";
+    process_query($query);
+}
+
+sub process_stats {
+    if ($interval) {
+        my $dt = DateTime->now;
+
+        my %units = (
+            h => 'hours',
+            d => 'days',
+            w => 'weeks',
+            m => 'months',
+            y => 'years'
+        );
+
+        $interval =~ m/([0-9]*)([hdwmy]?)$/;
+        $unit = $2 || 'd';
+        $since = DateTime::Format::MySQL->format_datetime(
+            $dt->subtract( $units{$unit} => $1 ) );
+    }
+    my $limit = '';
+    $limit = " AND statistics.datetime >= ?" if ( $interval || $since );
+
+    my $query =
+"SELECT biblio.biblionumber, COUNT(statistics.itemnumber) FROM biblio LEFT JOIN items ON (biblio.biblionumber=items.itemnumber) LEFT JOIN statistics ON (items.itemnumber=statistics.itemnumber) WHERE statistics.type = 'issue' $limit GROUP BY biblio.biblionumber;";
+    process_query($query);
+
+    unless ($incremental) {
+        $query =
+"SELECT biblio.biblionumber, 0 FROM biblio LEFT JOIN items ON (biblio.biblionumber=items.itemnumber) LEFT JOIN statistics ON (items.itemnumber=statistics.itemnumber) WHERE statistics.itemnumber IS NULL GROUP BY biblio.biblionumber;";
+        process_query($query);
+
+        $query =
+"SELECT biblio.biblionumber, 0 FROM biblio LEFT JOIN items ON (biblio.biblionumber=items.itemnumber) WHERE items.itemnumber IS NULL GROUP BY biblio.biblionumber;";
+        process_query($query);
+    }
+
+    $dbh->commit();
+}
+
+sub process_query {
+    my $query = shift;
+    my $sth   = $dbh->prepare($query);
+
+    if ($since) {
+        $sth->execute($since);
+    }
+    else {
+        $sth->execute();
+    }
+
+    while ( my ( $biblionumber, $totalissues ) = $sth->fetchrow_array() ) {
+        $num_bibs_processed++;
+        print "Processing bib $biblionumber ($totalissues issues)\n"
+          if $verbose;
+        if ( not $test_only ) {
+            if ( $incremental && $totalissues > 0 ) {
+                UpdateTotalIssues( $biblionumber, $totalissues );
+            }
+            else {
+                UpdateTotalIssues( $biblionumber, 0, $totalissues );
+            }
+        }
+        if ( not $test_only and ( $num_bibs_processed % $commit ) == 0 ) {
+            print_progress_and_commit($num_bibs_processed);
+        }
+    }
+
+    $dbh->commit();
+}
+
+sub report {
+    my $endtime = time();
+    my $totaltime = ceil( ( $endtime - $starttime ) * 1000 );
+    $starttime = strftime( '%D %T', localtime($starttime) );
+    $endtime   = strftime( '%D %T', localtime($endtime) );
+
+    my $summary = <<_SUMMARY_;
+
+Update total issues count script report
+=======================================================
+Run started at:                         $starttime
+Run ended at:                           $endtime
+Total run time:                         $totaltime ms
+Number of bibs modified:                $num_bibs_processed
+_SUMMARY_
+    $summary .= "\n****  Ran in test mode only  ****\n" if $test_only;
+    print $summary;
+}
+
+sub print_progress_and_commit {
+    my $recs = shift;
+    $dbh->commit();
+    print "... processed $recs records\n";
+}
+
+=head1 NAME
+
+update_totalissues.pl
+
+=head1 SYNOPSIS
+
+  update_totalissues.pl --use-stats
+  update_totalissues.pl --use-items
+  update_totalissues.pl --commit=1000
+  update_totalissues.pl --since='2012-01-01'
+  update_totalissues.pl --interval=30d
+
+=head1 DESCRIPTION
+
+This batch job populates bibliographic records' total issues count based
+on historical issue statistics.
+
+=over 8
+
+=item B<--help>
+
+Prints this help
+
+=item B<-v|--verbose>
+
+Provide verbose log information (list every bib modified).
+
+=item B<--use-stats>
+
+Use the data in the statistics table for populating total issues.
+
+=item B<--use-items>
+
+Use items.issues data for populating total issues. Note that issues
+data from the items table does not respect the --since or --interval
+options, by definition. Also note that if both --use-stats and
+--use-items are specified, the count of biblios processed will be
+misleading.
+
+=item B<-s|--since=DATE>
+
+Only process issues recorded in the statistics table since DATE.
+
+=item B<-i|--interval=S>
+
+Only process issues recorded in the statistics table in the last N
+units of time. The interval should consist of a number with a one-letter
+unit suffix. The valid suffixes are h (hours), d (days), w (weeks),
+m (months), and y (years). The default unit is days.
+
+=item B<--incremental>
+
+Add the number of issues found in the statistics table to the existing
+total issues count. Intended so that this script can be used as a cron
+job to update popularity information during low-usage periods.
+
+=item B<--commit=N>
+
+Commit the results to the database after every N records are processed.
+
+=item B<--test>
+
+Only test the popularity population script.
+
+=back
+
+=cut
-- 
1.7.2.5



More information about the Koha-patches mailing list