[Koha-devel] RFC: Koha::Persistent - for plack and general performance
Paul Poulain
paul.poulain at biblibre.com
Thu Apr 5 13:20:06 CEST 2012
On 04/04/2012 17:48, Dobrica Pavlinusic wrote:
> I would like to propose new module which would keep track of all
> persistant values for rest of Koha code.
Hi Dobrica,
> As opposed to C4::Cache* which never got implemented in all code,
> I propose new Koha::Persistant[1] module which provides methods for caching
> designed in such a way to reduce code size in modules which use it,
> see [2] for authorised_value and [3] for marc_subfield_structure
I agree that good cache handling is needed, and that we need to think about
cache/persistence with Plack in mind.
How Koha currently works in CGI mode:
* a page is called / the Perl compiler is loaded / the script is compiled and run
* the script calls some subs that make SQL calls
* if sub A needs something in authorized_values, an SQL request is
issued; then if sub B needs something in authorized_values, the
same SQL request may be issued again
The "ideal" situation:
* a page is called
* Plack already has it loaded in one thread
* the script calls a sub that needs authorized_values, which are
available without any SQL request (caching)
Your work with Koha::Persistent is a step in the right direction, but I
don't think it's the best we could have.
A long time ago, we discussed splitting the .pm files into 2 parts: Data
Accessors and Business Logic. Everybody agreed it's the way we want to
go, but for now it's only a theory; nothing has been done in this
direction.
The code could be rewritten this way:
Koha::DataAccess::authorized_values.pm is the ONLY way to retrieve
authorized values.
This package contains something like
authorized_values->get_value($category, $authorized_value)
that returns the result of
SELECT * FROM authorized_values WHERE category=? and authorized_value=?
get_value is memoized, which means its results are cached.
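As a minimal sketch of the idea (not actual Koha code), a memoized accessor could look like the following. The sub name get_value comes from the description above; the %FAKE_TABLE hash and the $lookups counter are assumptions standing in for the real SELECT, so the example runs without a database.

```perl
use strict;
use warnings;
use Memoize;

# Stand-in for the authorized_values table (assumption: a real accessor
# would run the SELECT shown above instead of reading this hash)
my %FAKE_TABLE = (
    'LOC|MAIN' => { category => 'LOC', authorized_value => 'MAIN', lib => 'Main library' },
);

# Counts how many times the "database" is really consulted
my $lookups = 0;

sub get_value {
    my ($category, $authorized_value) = @_;
    $lookups++;
    return $FAKE_TABLE{"$category|$authorized_value"};
}

# From now on, repeated calls with the same arguments reuse the first result
memoize('get_value');

my $first  = get_value('LOC', 'MAIN');   # hits the stand-in table
my $second = get_value('LOC', 'MAIN');   # answered from the Memoize cache

print "lib: $second->{lib}, lookups: $lookups\n";
```

Under CGI the cache would be rebuilt on every request; under a persistent process it survives between requests.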
I've investigated Memoize, and have found 2 interesting things:
* there's a facility called "flush_cache":
> "flush_cache(function)" will flush out the caches, discarding all the cached data. The argument may be a
> function name or a reference to a function. For finer control over when data is discarded or expired, see the
> documentation for "Memoize::Expire", included in this package.
>
> Note that if the cache is a tied hash, "flush_cache" will attempt to invoke the "CLEAR" method on the hash. If
> there is no "CLEAR" method, this will cause a run-time error.
>
> An alternative approach to cache flushing is to use the "HASH" option (see above) to request that "Memoize" use
> a particular hash variable as its cache. Then you can examine or modify the hash at any time in any way you
> desire. You may flush the cache by using "%hash = ()".
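The two flushing approaches described in that documentation can be sketched as follows. This is only an illustration: get_value is a hypothetical accessor and $lookups stands in for an SQL round trip.

```perl
use strict;
use warnings;
use Memoize qw(memoize flush_cache);

# Counts how many times the underlying sub really runs
my $lookups = 0;

# Hypothetical accessor; the counter stands in for an SQL round trip
sub get_value {
    my ($category, $authorized_value) = @_;
    $lookups++;
    return "$category/$authorized_value";
}

# Ask Memoize to keep scalar-context results in a hash we can inspect
my %cache;
memoize('get_value', SCALAR_CACHE => [ HASH => \%cache ]);

my $v1 = get_value('LOC', 'MAIN');   # computed, then cached
my $v2 = get_value('LOC', 'MAIN');   # served from %cache

flush_cache('get_value');            # e.g. after an authorized value is edited
my $v3 = get_value('LOC', 'MAIN');   # computed again

%cache = ();                         # same effect, through the hash itself
my $v4 = get_value('LOC', 'MAIN');   # computed once more

print "lookups: $lookups\n";
```

Note that either flush only affects the process that performs it; other running processes keep their own caches.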
* there's also a Memoize::Expire package that could let us define
expiration rules for memoized functions.
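A hedged sketch of such an expiration rule, following the tie-based usage shown in the Memoize::Expire documentation. The get_lib sub and the 60-second lifetime are arbitrary assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Memoize;
use Memoize::Expire;

# Counts how many times the underlying sub really runs
my $lookups = 0;

# Hypothetical accessor returning a plain string in place of a database row
sub get_lib {
    my ($category, $authorized_value) = @_;
    $lookups++;
    return "description of $category/$authorized_value";
}

# Cached entries expire 60 seconds after they are stored (arbitrary value)
tie my %cache => 'Memoize::Expire', LIFETIME => 60;
memoize('get_lib', SCALAR_CACHE => [ HASH => \%cache ]);

my $first  = get_lib('LOC', 'MAIN');   # computed and stored with a timestamp
my $second = get_lib('LOC', 'MAIN');   # still fresh: served from the cache

print "$second\n";
```

With a rule like this, a changed value would be picked up once the lifetime has elapsed, without any explicit flush.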
Why is this not so interesting in CGI mode? Because CGI mode makes
everything die after each request. It means a memoized function will
probably be called only a few times, usually once, on each
request; then the script dies and everything that was memoized is gone.
In my opinion, memcached is a workaround: it puts the result in a
memcached server, so it doesn't matter that the CGI dies, but the
price of reaching it is very high, which limits the gain (chris_c has said
many times that memcached is not a matter of performance, but of
scalability).
=== Flushing the cache in a Plack environment ===
For now, by default, each Plack thread serves 30 pages before it dies. If you run
starman with 6 threads, it means, if I understand correctly, that a change in
cached data could be ignored for *at most* 6 x 30 = 180 pages. That's not a big
deal, but:
* it's still a deal
* if we want to keep improving performance, the best goal would be to
be able to serve 1000 pages before dying
How could we address this problem? I haven't found how to "flush a
thread" in Plack; if there's a way to do it, I'll be happy to learn ;-)
If there is one, we could split Koha::DataAccess into 2 sub-parts:
parameters and non-parameters. Like
Koha::DataAccess::Parameter::Authorized_value->get_...
Any change in one of these values would mean "hey, we must flush Plack because
we must flush all memoized values".
OTOH, if we do that *each time* a value is changed, that will be
very CPU-consuming. A better way would be to have something like:
* if something has been updated on the admin-home.pl page, raise a flag that:
- warns the user who tries to leave admin-home.pl without restarting
Plack
- adds a link to restart Plack.
The more I think of it, the more I like it: that would be a first step
in the direction of business logic / data access, plus an interesting
boost in performance and scalability.
== SQL load ==
This discussion is also related to the thread "Performance issues with mysql and
Koha on 2 physical servers".
== Conclusion ==
My strong preference is for a long-term solution, and
Koha::DataAccess::Authorized_value is more long-term than your
proposal, so I would favor it. If someone can/wants to put forward an
alternative long-term proposal, feel free!
If I get some positive feedback, I'll try to put together some code soon ;-)
(but with the 3.8 release in 2 weeks, that will probably be in May!)
--
Paul POULAIN
http://www.biblibre.com
Free software experts for information and documentation
Tel : (33) 4 91 81 35 08