[Koha-devel] RFC: Koha::Persistent - for plack and general performance

Paul Poulain paul.poulain at biblibre.com
Thu Apr 5 13:20:06 CEST 2012


On 04/04/2012 17:48, Dobrica Pavlinusic wrote:
> I would like to propose a new module which would keep track of all
> persistent values for the rest of the Koha code.
Hi Dobrica,

> As opposed to C4::Cache*, which never got implemented in all code,
> I propose a new Koha::Persistent [1] module which provides caching methods
> designed in such a way as to reduce the code size in the modules that use it;
> see [2] for authorised_value and [3] for marc_subfield_structure
I agree that good cache handling is needed, and we need to think about
cache/persistence with Plack in mind.
How Koha currently works in CGI mode:
* a page is called / the Perl interpreter is loaded / the script is
compiled and run
* the script calls some subs that issue SQL queries
* if sub A needs something from authorized_values, an SQL query is
issued; then if sub B needs something from authorized_values, the same
SQL query may be issued again

The "ideal" situation:
* a page is called
* Plack already has the script compiled in a worker
* the script calls some subs that need authorized_values, which are
available without any SQL query (caching)

Your work with Koha::Persistent is a step in the right direction, but I
don't think it's the best we can do.

We discussed, a long time ago, splitting the .pm files into two parts:
Data Accessors and Business Logic. Everybody agreed that this is the way
we want to go, but for now it's only a theory; nothing has been done in
this direction.

The code could be rewritten this way:
Koha::DataAccess::authorized_values.pm is the ONLY way to retrieve
authorized values.
This package contains something like
 authorized_values->get_value($category, $authorized_value)
that returns the result of
 SELECT * FROM authorized_values WHERE category=? and authorized_value=?

get_value is memoized, which means its results are cached.
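To make this concrete, here is a minimal sketch of what such a data accessor could look like. The package name follows the proposal above, but the sub body and the use of C4::Context->dbh are only illustrative, not an actual Koha API:

```perl
package Koha::DataAccess::AuthorizedValues;

# Sketch only: the package layout and column names are illustrative.
use Modern::Perl;
use Memoize;
use C4::Context;

sub get_value {
    my ( $category, $authorized_value ) = @_;
    my $dbh = C4::Context->dbh;
    my $sth = $dbh->prepare(
        'SELECT * FROM authorized_values WHERE category=? AND authorized_value=?'
    );
    $sth->execute( $category, $authorized_value );
    return $sth->fetchrow_hashref;
}

# After this call, repeated invocations with the same arguments are
# served from an in-process cache instead of hitting the database.
memoize('get_value');

1;
```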

I've investigated Memoize and found two interesting things:
* there's a facility called "flush_cache":
>        "flush_cache(function)" will flush out the caches, discarding all the cached data.  The argument may be a
>        function name or a reference to a function.  For finer control over when data is discarded or expired, see the
>        documentation for "Memoize::Expire", included in this package.
> 
>        Note that if the cache is a tied hash, "flush_cache" will attempt to invoke the "CLEAR" method on the hash.  If
>        there is no "CLEAR" method, this will cause a run-time error.
> 
>        An alternative approach to cache flushing is to use the "HASH" option (see above) to request that "Memoize" use
>        a particular hash variable as its cache.  Then you can examine or modify the hash at any time in any way you
>        desire.  You may flush the cache by using "%hash = ()".

There's also a Memoize::Expire package that could let us define
expiration rules for memoized functions.
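For illustration, a sketch of how the two facilities could be combined, following the Memoize::Expire synopsis (the 600-second lifetime is an arbitrary example value):

```perl
use Memoize qw(memoize flush_cache);
use Memoize::Expire;

# Entries expire automatically after 10 minutes (arbitrary example).
tie my %cache => 'Memoize::Expire', LIFETIME => 600;
memoize( 'get_value', SCALAR_CACHE => [ HASH => \%cache ] );

# Manual invalidation, e.g. after an administrator edits a value:
%cache = ();                 # empty the hash we own, or
flush_cache('get_value');    # only works if the tied hash supports CLEAR
```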

Why is this not so interesting in CGI mode? Because in CGI mode
everything dies after each request. It means a memoized function will
probably be called only a few times, usually once, during a request;
then the script exits and everything that was memoized is gone.

memcached is a workaround, in my opinion: it puts the result in a
memcached server, so it doesn't matter that the CGI script dies, but the
cost of each lookup to the memcached server is quite high, limiting the
gain (chris_c has said many times that memcached is not a matter of
performance but of scalability).

=== Flushing the cache in a Plack environment ===
For now, by default, Plack workers have a lifetime of 30 requests. If
you run starman with 6 workers, it means, if I understand correctly,
that a change to cached data could be ignored for *at most* 180
requests. That's not a big deal, but:
* it's still a deal
* if we want to keep improving performance, a better goal would be to
serve 1000 requests before a worker dies
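For reference, the per-worker request budget is just a starman option, so serving 1000 requests per worker should be a one-flag change (the port and path below are illustrative):

```shell
# Restart each worker after 1000 requests instead of 30
starman --workers 6 --max-requests 1000 --listen :5000 /path/to/koha.psgi
```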

How could we address this problem? I haven't found how to "flush" a
worker in Plack; if there's a way to do it, I'll be happy to learn ;-)

If there is one, we could split Koha::DataAccess into two sub-parts:
parameters and non-parameters, like
Koha::DataAccess::Parameter::Authorized_value->get_...

Any change to one of those values would mean "hey, we must flush Plack
because we must flush all memoized values".
OTOH, if we do that *each time* a value is changed, it will be very
CPU-consuming; a better way would be to have something like:
* if something has been updated on the admin-home.pl page, raise a flag
that:
 - warns the user who tries to leave admin-home.pl without restarting
Plack
 - adds a link to restart Plack.
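As a sketch of how the restart could even be automated per-worker, assuming the server supports the optional psgix.harakiri PSGI extension; Koha::Config::cache_dirty is a hypothetical flag, not an existing function:

```perl
# PSGI middleware sketch. Koha::Config::cache_dirty() is hypothetical:
# imagine it returns true once admin data has been changed.
my $with_restart = sub {
    my $app = shift;
    sub {
        my $env = shift;
        my $res = $app->($env);
        # Ask the server to retire this worker after the response is
        # sent, so the next worker starts with an empty memoize cache.
        $env->{'psgix.harakiri.commit'} = 1
            if $env->{'psgix.harakiri'} && Koha::Config::cache_dirty();
        return $res;
    };
};
```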


The more I think about it, the more I like it: it would be a first step
toward the business logic / data access split, plus an interesting
boost in performance and scalability.

== SQL load ==
This discussion is also related to the "Performance issues with mysql
and Koha on 2 physical servers" thread.

== Conclusion ==
My preference strongly goes to a long-term solution, and
Koha::DataAccess::Authorized_value is more long-term than your
proposition, so I would favor it. If someone can and wants to make an
alternate long-term proposition, feel free!

If I get some positive feedback, I'll try to throw some code soon ;-)
(but with the 3.8 release in 2 weeks, that will probably be in May!)
-- 
Paul POULAIN
http://www.biblibre.com
Expert en Logiciels Libres pour l'info-doc
Tel : (33) 4 91 81 35 08

