No subject


Wed Jan 4 10:30:06 CET 2012


I will open a separate bug)

2) smarter memoize with per-request and persistent cache

Existing memoize does not have correct cache invalidation under Plack
(--max-requests is a gross hack, and I would recommend running only an
anonymous OPAC with it ;-), so I will write an alternative memoize which
calls unmemoize on each request. This mimics CGI behaviour, provides a
performance improvement, and should be safe to run.
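The per-request idea can be sketched like this (a Python illustration of
the mechanism, not Koha's Perl code; all names here are hypothetical):
keep a registry of memoized caches and flush them all at each request
boundary, which gives the same fresh-start semantics CGI had.

```python
import functools

# Registry of all per-request caches, so one request hook can flush them
# (hypothetical names; the real implementation would use Perl's Memoize).
_request_caches = []

def memoize_per_request(fn):
    """Memoize fn, but let the cache be wiped at request boundaries."""
    cache = {}
    _request_caches.append(cache)

    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

def start_request():
    """Call at the start of every request: mimics CGI's fresh process."""
    for cache in _request_caches:
        cache.clear()
```

Within one request repeated calls hit the cache; after start_request()
the next call recomputes, so stale data cannot leak across requests.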

Existing memoize_memcached functions will become memoize_persistent,
which will be able to cache in memcached, shared memory (why run
memcached when you don't need to for small instances?), Redis, or
DB_File (Ian mentioned files). We could use Memoize::Expire for that
purpose to keep the in-memory cache under control.

My goal is to have pluggable caching back-ends without changing Koha
code (or selectable via a system preference!)
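A pluggable back-end boils down to one tiny get/set interface that a
persistent memoize is written against. A minimal Python sketch of that
shape (hypothetical names; real back-ends would be memcached, Redis,
shared memory or DB_File):

```python
# One in-memory back-end; a memcached/Redis/DB_File back-end would
# implement the same two methods, so callers never change.
class MemoryBackend:
    def __init__(self):
        self._d = {}

    def get(self, key):
        return self._d.get(key)   # None signals a cache miss here

    def set(self, key, value):
        self._d[key] = value

def memoize_persistent(backend):
    """Decorator factory: cache results in whichever back-end is
    configured (e.g. chosen by a system preference)."""
    def decorator(fn):
        def wrapper(*args):
            key = fn.__name__ + ":" + repr(args)
            cached = backend.get(key)
            if cached is None:
                cached = fn(*args)
                backend.set(key, cached)
            return cached
        return wrapper
    return decorator
```

Swapping back-ends then means passing a different object to
memoize_persistent, with no change to the memoized code itself.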

Q: for this I will try to submit a bug with the Perl memoize function,
together with the cache invalidation hooks needed to make it work under
Plack. OK?

I would really love to move to Redis (with which I have very good
experience, since I'm the original author of the Perl bindings ;-),
mostly because it has per-key cache invalidation, which memcached lacks
(memcached can only invalidate the whole cache at once). Redis's ability
to look up cache keys using globs (e.g. *item*12342*) might help a *lot*
with cache invalidation.
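The glob idea works like shell patterns in Redis's KEYS/SCAN commands.
A self-contained Python illustration over a plain dict (fnmatch stands
in for Redis pattern matching; keys and values here are made up):

```python
import fnmatch

cache = {
    "item:12342:title": "Some title",
    "item:12342:holds": 3,
    "item:99:title": "Another title",
}

def invalidate(pattern):
    """Delete every cached key matching a shell-style glob, the way a
    Redis SCAN with pattern *item*12342* could drop all cache entries
    for one item after it is edited."""
    for key in [k for k in cache if fnmatch.fnmatch(k, pattern)]:
        del cache[key]
```

One edit to item 12342 can then invalidate exactly its keys, instead of
throwing away the whole cache.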

memoize_persistent definitions should be accompanied by calls to
invalidation. I do agree that inserting the new value into the cache is
a better solution, so I will keep that in mind.

3) caching of full database tables

This point is subtly different from memoize: sometimes we benefit from
caching a whole data structure from the database (frameworks, itemtypes,
languages, system preferences), but in a way that allows us to retrieve
single items.

Another example is functions which would be memoizable but take some
kind of parameter ($opac, $user, $branch, $selected or so) which would
require memoizing the whole structure again and again.

This is how we are using our $cache variables now.
I propose to move all of them to Koha::Persistent so we can call
full_size on that class to get memory usage and enable correct
invalidation in a single place (per-request), or have proper
invalidation functions.
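The whole-table cache with single-item retrieval and one invalidation
point might look like this (a Python sketch of the shape, with
hypothetical names; the real code would live in Koha::Persistent):

```python
# Cache a whole database table in one load, but serve single rows by
# key; invalidation lives in exactly one place.
class TableCache:
    def __init__(self, loader):
        self._loader = loader   # callable that fetches the whole table
        self._rows = None

    def get(self, key):
        if self._rows is None:          # first access loads everything
            self._rows = self._loader()
        return self._rows.get(key)

    def invalidate(self):
        """Single place for correct invalidation (e.g. per-request)."""
        self._rows = None
```

Any number of single-item lookups costs one database query, and a
per-request hook only has to call invalidate() on each such cache.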

This is also the reason why the sql_cache function tries to create a
proper multi-level hash of cached values instead of just memoizing
$sth->fetchrow_hashref.
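The multi-level-hash idea, sketched in Python (column names below are
invented for illustration): build one nested structure keyed by the
lookup columns, so single rows are reachable without re-running or
re-memoizing the statement per parameter combination.

```python
# Turn a flat result set into a nested hash keyed by the given columns,
# e.g. tree[framework][tag] -> row, built from one pass over the rows.
def rows_to_multilevel(rows, keys):
    """rows: list of dicts; keys: column names forming the nesting."""
    tree = {}
    for row in rows:
        node = tree
        for k in keys[:-1]:
            node = node.setdefault(row[k], {})
        node[row[keys[-1]]] = row
    return tree
```

One cached tree then answers every (framework, tag) lookup, where
memoizing the row fetch would cache one entry per distinct call.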

With proper invalidation those values might be stored in shared memory
so that all plack threads have access to them.

4) performance patches

Not caching-related, but still important: things like missing indexes,
or my favorite example of this (so far):

Bug 7846 - get_batch_summary reimplements GROUP BY in perl code
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7846

(which needs a sign-off :-)
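To illustrate the theme of that bug (this is not Koha's actual code,
just a minimal sqlite example): grouping rows in application code
versus letting the database do it with GROUP BY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE labels (batch_id INTEGER)")
con.executemany("INSERT INTO labels VALUES (?)", [(1,), (1,), (2,)])

# GROUP BY reimplemented in application code (what the bug complains
# about): fetch every row and count by hand.
counts = {}
for (batch_id,) in con.execute("SELECT batch_id FROM labels"):
    counts[batch_id] = counts.get(batch_id, 0) + 1

# The same result in one query, done where it belongs: the database.
grouped = dict(con.execute(
    "SELECT batch_id, COUNT(*) FROM labels GROUP BY batch_id"))
```

Both produce the same answer, but the second ships one small result set
instead of every row.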

5) testing and statistics

I do intend to run this code in production. To be honest, Plack didn't
bring all the performance improvements I was hoping for, and I think
that we can fix that in the 3.10 release cycle (the search page is my
favorite one to test with).

To achieve that, I will collect statistical data about cache usage
(hits/misses). I have found it very valuable for development so far, and
it's always nice to have some idea of how the cache is performing.
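Hit/miss accounting is a thin wrapper around the cache lookup; a small
Python sketch of the idea (hypothetical names, not Koha's API):

```python
# Count hits and misses around a simple compute-through cache.
class CacheStats:
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self._cache = {}

    def get_or_compute(self, key, compute):
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = compute()
        return self._cache[key]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Logging hit_ratio() per request (or per cache) gives exactly the
"some idea how the cache is performing" mentioned above.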

I must say that developing under Plack is a joy: fast page load times
are nice and you get used to them quite quickly, so slow parts of the
code pop up even without DBI::Profile or NYTProf :-)

-- 
Dobrica Pavlinusic               2share!2flame            dpavlin at rot13.org
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin

