[Koha-devel] RFC: Koha::Persistent - for plack and general performance

Chris Cormack chris at bigballofwax.co.nz
Fri Apr 6 10:14:58 CEST 2012


On 5 April 2012 23:20, Paul Poulain <paul.poulain at biblibre.com> wrote:
> Le 04/04/2012 17:48, Dobrica Pavlinusic a écrit :
>> I would like to propose a new module which would keep track of all
>> persistent values for the rest of the Koha code.
> Hi Dobrica,
>
>> As opposed to C4::Cache*, which never got implemented across the code,
>> I propose a new Koha::Persistent[1] module which provides caching methods
>> designed to reduce code size in the modules that use them;
>> see [2] for authorised_value and [3] for marc_subfield_structure.
> I agree that good cache handling is needed, and we need to think about
> cache/persistence with Plack in mind.
> How Koha currently works in CGI mode:
> * a page is called / the Perl interpreter is loaded / the script is
> compiled and run
> * the script calls some subs that issue SQL queries
> * if sub A needs something in authorized_values, an SQL query is
> issued; then if sub B needs something in authorized_values, maybe the
> same SQL query is issued again
>
> The "ideal" situation:
> * a page is called
> * Plack already has the compiled script loaded in a worker thread
> * the script calls some subs that need authorized_values, which are
> available without any SQL query (caching)
>
> Your work with Koha::Persistent is a step in the right direction, but I
> don't think it's the best we can do.
>
> We discussed, a long time ago, splitting the .pm files into two parts:
> Data Access and Business Logic. Everybody agreed it's the way we want to
> go, but for now it's only a theory; nothing has been done in this
> direction.
>
> The code could be rewritten this way:
> Koha::DataAccess::authorized_values.pm is the ONLY way to retrieve
> authorized values.
> This package contains something like
>  authorized_values->get_value($category, $authorized_value)
> that returns the result of
>  SELECT * FROM authorized_values WHERE category=? and authorized_value=?
>
> get_value is memoized, which means its results are cached.
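A minimal sketch of how such a memoized accessor behaves. The SQL lookup is faked with a counter so the example runs anywhere; the real accessor would issue the SELECT above, and the names here are illustrative, not Koha's actual API:

```perl
use strict;
use warnings;
use Memoize;

# Stand-in for the real SQL lookup: the counter records how many
# times the "database" is actually queried.
my $sql_calls = 0;
sub get_value {
    my ($category, $authorised_value) = @_;
    $sql_calls++;
    return "$category/$authorised_value";   # placeholder for the row
}

memoize('get_value');

# Three identical lookups, but only one real query.
get_value('LOC', 'STACKS') for 1 .. 3;
print "sql_calls=$sql_calls\n";             # prints sql_calls=1
```

Every call after the first with the same arguments is answered from the in-process cache, which is exactly why this pays off under Plack but barely helps in CGI mode.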
>
> I've investigated Memoize, and found two interesting things:
> * there's a facility called "flush_cache":
>>        "flush_cache(function)" will flush out the caches, discarding all the cached data.  The argument may be a
>>        function name or a reference to a function.  For finer control over when data is discarded or expired, see the
>>        documentation for "Memoize::Expire", included in this package.
>>
>>        Note that if the cache is a tied hash, "flush_cache" will attempt to invoke the "CLEAR" method on the hash.  If
>>        there is no "CLEAR" method, this will cause a run-time error.
>>
>>        An alternative approach to cache flushing is to use the "HASH" option (see above) to request that "Memoize" use
>>        a particular hash variable as its cache.  Then you can examine or modify the hash at any time in any way you
>>        desire.  You may flush the cache by using "%hash = ()".
>
> There's also a Memoize::Expire package that could let us define
> expiration rules for memoized functions.
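The flush_cache behaviour quoted above fits in a few lines (a toy function stands in for the real accessor; the commented-out lines show the time-based Memoize::Expire variant the docs describe):

```perl
use strict;
use warnings;
use Memoize qw(memoize flush_cache);

my $real_calls = 0;
sub lookup { $real_calls++; return uc $_[0] }

memoize('lookup');

lookup('books');          # real call
lookup('books');          # served from the cache
flush_cache('lookup');    # discard all cached data
lookup('books');          # cache is empty again: real call
print "real_calls=$real_calls\n";   # prints real_calls=2

# Time-based expiry instead of manual flushing (Memoize::Expire):
#   tie my %cache => 'Memoize::Expire', LIFETIME => 600;   # seconds
#   memoize('lookup', SCALAR_CACHE => [HASH => \%cache]);
```

Note that flush_cache only clears the cache inside the current process, which matters for the multi-worker discussion below.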
>
> Why is this not so interesting in CGI mode? Because in CGI mode
> everything dies after each request. It means a memoized function will
> probably be called only a few times, usually once, per request; then
> the script dies and everything that was memoized is gone.
>
> To me, memcached is a workaround: it puts the result in a memcached
> server, so it doesn't matter that the CGI process dies, but the cost
> of each cache access is high, limiting the gain (chris_c has said
> many times that memcached is not a matter of performance, but of
> scalability).
>
I disagree, it is not a workaround: centralised caching is vital,
whether it be memcached, Redis, or true globals. What we cannot have
is a situation where different threads are using different values.
Also, the Memoize::Memcached module has an issue where it works twice
as hard as it needs to; Dobrica has a patch for this.
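A sketch of the shared-cache discipline this implies: the accessor goes through a cache object with a memcached-style get/set interface, so every worker pointed at the same backend sees the same values. The DemoCache package here is a stand-in so the example runs without a memcached server; in production the object would be e.g. Cache::Memcached, and the key format and function names are illustrative:

```perl
use strict;
use warnings;

# Minimal in-process stand-in with Cache::Memcached's get/set shape.
package DemoCache;
sub new { my %h; bless \%h, shift }
sub get { my ($self, $key) = @_; return $self->{$key} }
sub set { my ($self, $key, $val) = @_; $self->{$key} = $val }

package main;

my $cache     = DemoCache->new;
my $sql_calls = 0;

sub get_authorised_value {
    my ($category, $code) = @_;
    my $key = "av:$category:$code";
    my $hit = $cache->get($key);
    return $hit if defined $hit;
    $sql_calls++;                      # real code would query MySQL here
    my $row = "$category/$code";       # placeholder for the fetched row
    $cache->set($key, $row);
    return $row;
}

get_authorised_value('LOC', 'STACKS') for 1 .. 5;
print "sql_calls=$sql_calls\n";        # prints sql_calls=1
```

Because the cache sits behind one interface, swapping the hash for a shared backend changes nothing in the calling code, and all workers then read and invalidate the same keys.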

> === Flushing the cache in a Plack environment ===
> For now, by default, Plack threads have a lifetime of 30 requests. If
> you run starman with 6 threads it means, if I understand correctly,
> that a change to cached data could be ignored for *at most* 180
> requests. That's not a big deal, but:

It is a big deal, because it's 6 threads, each serving 30 requests,
and those threads may be returning different values unless they are
using a shared cache (or shared memory).

> * it's still an issue
> * if we want to continue improving performance, the best goal would be
> to be able to serve 1000 requests before a thread dies
>
> How could we address this problem? I haven't found how to "flush a
> thread" in Plack; if there's a way to do it, I'll be happy to learn ;-)
>
> If there is one, we could split Koha::DataAccess into two sub-parts:
> parameters and non-parameters, like
> Koha::DataAccess::Parameter::Authorized_value->get_...
>
> Any change to one of those values would mean "hey, we must flush Plack
> because we must flush all memoized values".
> OTOH, if we do that *each time* a value is changed, it will be very
> CPU consuming; a better way would be to have something like:
> * if something has been updated on the admin-home.pl page, raise a flag
> that:
>  - warns the user who tries to leave admin-home.pl without restarting
> Plack
>  - adds a link to restart Plack.
>
>
> The more I think about it, the more I like it: that would be a first
> step towards the business logic / data access split, plus an
> interesting boost in performance and scalability.

I dislike this; we should solve the problem properly, not introduce
hacks to get around the fact that we have done a bad job of
engineering. That's just technical debt we will have to pay back.

Having worked on projects with huge load and persistence, I have
learnt that proper caching and cache invalidation are vitally
important, and that getting them right at the start is a lot easier
than trying to retrofit them at the end.
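The invalidation discipline in question can be sketched in a few lines: every write path deletes the cache entry it touches, so a stale value can never outlive the update. A plain hash stands in for the shared cache and database, and the key names are illustrative:

```perl
use strict;
use warnings;

# Invalidate-on-write: writers delete the cache entry they touch,
# so the next read repopulates it from the database.
my %cache;
my %db = ( 'LOC:STACKS' => 'Stacks' );

sub read_value {
    my ($key) = @_;
    return $cache{$key} //= $db{$key};   # cache miss falls through to DB
}

sub write_value {
    my ($key, $val) = @_;
    $db{$key} = $val;
    delete $cache{$key};                 # invalidate immediately
}

read_value('LOC:STACKS');                # warms the cache
write_value('LOC:STACKS', 'Basement stacks');
print read_value('LOC:STACKS'), "\n";    # prints Basement stacks
```

With a shared backend, the delete is visible to every worker at once, which is what makes this safe under persistence where per-process flushing is not.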

>
> == SQL load ==
> This discussion is also related to the "Performance issues with mysql
> and Koha on 2 physical servers" thread.
>
> == Conclusion ==
> My preference strongly goes to a long-term solution, and
> Koha::DataAccess::Authorized_value is more long-term than your
> proposition, so I would favour it. If someone can/wants to throw out
> an alternative long-term proposition, feel free!
>
I think Dobrica is on the right path: we should build a proper caching
framework first, one that won't cause data inconsistencies under
persistence; your Koha::DataAccess::Authorized_value subroutine could
then use that.

Chris

> If I get some positive feedback, I'll try to throw out some code soon ;-)
> (but with the 3.8 release in 2 weeks, that will probably be in May!)
> --
> Paul POULAIN
> http://www.biblibre.com
> Expert en Logiciels Libres pour l'info-doc
> Tel : (33) 4 91 81 35 08
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.koha-community.org
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : http://www.koha-community.org/
> git : http://git.koha-community.org/
> bugs : http://bugs.koha-community.org/
