[Koha-devel] Circulation history anonymisation

Thomas Dukleth kohadevel at agogme.com
Mon Dec 27 16:38:29 CET 2010


Reply inline:


Original subject: Re: [Koha-devel] OPAC Enhancement in Koha 3.4

1.  FEATURES BASED ON INDIVIDUAL CIRCULATION HISTORY.

On Sun, December 26, 2010 18:03, David Schuster wrote:
> These would be great developments -

The proposed features using correlations from circulation history would
produce some useful and interesting results, with the advantage of
drawing on large, passively acquired data sets.  However, passive
circulation or purchase history is a weak basis for informing patron title
choice relative to a proper recommendation system using active
recommendations and various metadata sets.

Despite the weakness of correlations from circulation history for the
purpose of recommendations, if someone is interested in developing such
features for Koha, then Koha should have them, provided they are properly
labelled in some form.  "Patrons who borrowed this also borrowed those"
would be an appropriate label for mere correlations from circulation
history.  "Patrons recommend those" would be an inappropriate label for
mere correlations from circulation history.
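
As a rough illustration of what "mere correlations from circulation
history" amounts to, the following Python sketch counts how many patrons
borrowed each pair of titles.  It is not Koha code: the function name
also_borrowed, the (patron_id, title_id) input shape, and the min_count
threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Label appropriate for passive co-borrowing data (not "Patrons recommend").
LABEL = "Patrons who borrowed this also borrowed those"


def also_borrowed(loans, min_count=2):
    """Return {title: [(other_title, shared_patron_count), ...]}.

    loans is an iterable of (patron_id, title_id) pairs (hypothetical
    shape).  Pairs of titles shared by fewer than min_count patrons are
    dropped, since very small counts say little and can identify people.
    """
    titles_by_patron = defaultdict(set)
    for patron, title in loans:
        titles_by_patron[patron].add(title)

    # Count, for every pair of titles, how many patrons borrowed both.
    pair_counts = Counter()
    for titles in titles_by_patron.values():
        for a, b in combinations(sorted(titles), 2):
            pair_counts[(a, b)] += 1

    related = defaultdict(list)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            related[a].append((b, n))
            related[b].append((a, n))

    # Most frequently co-borrowed titles first, ties broken by title id.
    for title in related:
        related[title].sort(key=lambda item: (-item[1], item[0]))
    return dict(related)
```

The counts say only that the same patrons borrowed both titles; nothing
in them expresses an opinion about either title, which is why the label
matters.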


2.  PRIVACY USING CIRCULATION HISTORY.

2.1.  CIRCULATION HISTORY ANONYMISATION.

> but then some of us have issues with
> patron privacy...  We have parents and teachers asking for a "list of what
> student X has checked out" we like to say this is not doable in Koha.

Koha will need to constantly improve privacy to stay ahead of Big Brother
and comply with privacy laws.  Just say no to data retention requests or
data retention laws which may be passed and consult appropriate legal
advocates if facing data retention mandates from government.

The Koha administration interface has a circulation data anonymisation
feature in Home > Tools > Patrons (anonymize, bulk-delete).  The tool
calls C4::Circulation::AnonymiseIssueHistory from tools/cleanborrowers.pl,
http://git.koha-community.org/gitweb/?p=koha.git;a=blob;f=tools/cleanborrowers.pl
.

In 2009, Paul Poulain submitted a patch to allow patrons to protect
the privacy of their own circulation history by setting the
AnonymousPatron system preference and calling
C4::Circulation::AnonymiseIssueHistory from a new script
opac/opac-privacy.pl,
http://lists.koha-community.org/pipermail/koha-patches/2009-May/003486.html
.

Both tools/cleanborrowers.pl and opac/opac-privacy.pl require some user to
run a script manually for circulation history anonymisation.  A
maintenance script run from cron is needed.  It would use a system
preference setting the period for which circulation history is kept,
which patrons could override with at least either 'no preservation
period' or 'preservation forever'.
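
The selection logic such a cron script would need can be sketched as
follows.  This is Python for illustration only, not a patch for Koha: the
constants, the loans_to_anonymise function, and the record fields are
hypothetical names, and the real script would read the period from a
system preference and the override from the patron record.

```python
from datetime import date, timedelta

# Hypothetical values for the per-patron override.
KEEP_FOREVER = "forever"   # 'preservation forever'
KEEP_NEVER = "never"       # 'no preservation period'
USE_DEFAULT = "default"    # fall back to the library-wide period


def loans_to_anonymise(returned_loans, patron_choice, default_keep_days, today):
    """Return the ids of returned loans whose history should be anonymised.

    returned_loans: list of dicts with 'loan_id', 'patron_id' and
                    'returned_on' (a datetime.date).
    patron_choice:  dict mapping patron_id to one of the constants above;
                    patrons not listed use the library-wide default.
    """
    cutoff = today - timedelta(days=default_keep_days)
    due = []
    for loan in returned_loans:
        choice = patron_choice.get(loan["patron_id"], USE_DEFAULT)
        if choice == KEEP_FOREVER:
            continue                       # patron opted to keep everything
        if choice == KEEP_NEVER or loan["returned_on"] <= cutoff:
            due.append(loan["loan_id"])    # anonymise now
    return due
```

The ids returned would then be handed to whatever routine performs the
anonymisation itself.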

Backup files containing circulation history which has not been anonymised
also need to be periodically erased and overwritten with multiple passes
including a 'random' overwrite.
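
A minimal sketch of a multiple pass overwrite, in Python, is given here
to make the idea concrete.  The function name overwrite_and_delete and
its parameters are assumptions for the example.  Note also that
overwriting a file in place may not reach every physical copy of the data
on journaling or copy-on-write file systems, or on solid state drives, so
this alone should not be relied upon.

```python
import os


def overwrite_and_delete(path, passes=3, chunk_size=1024 * 1024):
    """Overwrite a file in place several times, then remove it.

    Earlier passes write zero bytes; the final pass writes random bytes
    from os.urandom.  Each pass is flushed and synced before the next.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for i in range(passes):
            last_pass = (i == passes - 1)
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n) if last_pass else b"\x00" * n)
                remaining -= n
            f.flush()
            os.fsync(f.fileno())   # ask the OS to push this pass to disk
    os.remove(path)
```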

You should presume that Big Brother might seize a dump of the database at
some time in the future, against your protestations and perhaps without
your knowledge.  Effort should be taken to minimise the harm in case of
such a seizure.


2.2.  CIRCULATION HISTORY ANONYMISATION WITH CORRELATIONS TO TITLES.

> So if there was a way to hide the data but use it analytically that
> would be fantastic.

The suggested circulation history correlation features would need a
feature for preserving circulation history with correlations between
anonymised patrons and titles borrowed, instead of only anonymous
circulation history with no correlations.  A complete pseudonymous
circulation history would be equivalent to having no circulation history
anonymisation for anyone with access to a full dump of the database and
the source code.

Correlations between anonymised patrons and titles borrowed would be
preserved only by creating a new anonymous patron ID for holding
circulation history, as items are borrowed and/or each time patron
circulation history is anonymised.  Preserving correlations for works,
expressions, or manifestations would preserve anonymity better than
preserving correlations to individual item barcodes.  Beware that even
such anonymised correlations could be merely a thin veil of anonymity if
the data set is relatively small or the period between anonymisation
actions is relatively large.
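
The idea of a fresh anonymous ID per anonymisation action might be
sketched as below, in Python for illustration.  The function
anonymise_batch and the field names patron_id, biblio_id, and barcode are
hypothetical, with biblio_id standing in for a manifestation level
identifier.

```python
import secrets


def anonymise_batch(loans):
    """Replace patron ids with throwaway random ids for one batch.

    loans: list of dicts with 'patron_id', 'biblio_id' and 'barcode'.
    Within one call a patron always maps to the same random id, so
    correlations between titles survive.  The mapping is discarded when
    the function returns, so ids from different batches cannot be linked
    through it.  Item barcodes are dropped; only the title level id is
    kept.
    """
    pseudonyms = {}   # patron_id -> random id, never stored anywhere
    anonymised = []
    for loan in loans:
        patron = loan["patron_id"]
        if patron not in pseudonyms:
            pseudonyms[patron] = secrets.token_hex(16)
        anonymised.append({
            "anon_id": pseudonyms[patron],
            "biblio_id": loan["biblio_id"],
        })
    return anonymised
```

Because the IDs are random rather than derived from the patron ID, access
to the source code gives no means of reversing them.  The shorter the
period covered by each batch, the fewer loans share one anonymous ID.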

[...]


Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
+1 212-674-3783



