[Koha-devel] RFC: Koha::Persistent - for plack and general performance

Ian Walls koha.sekjal at gmail.com
Fri Apr 6 12:29:09 CEST 2012


Would it be practical and efficient to cache shared data to a file (or a
series of files)?  My understanding is that reading files is faster than
doing SQL queries, but not as fast as accessing memcached.  But the file
would have the advantage of not being per-thread.  I don't think
configuration data like sysprefs, frameworks or issuing rules would
change frequently enough for us to worry about concurrent writes,
but I suppose that should also be factored in...
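As a concrete illustration of this idea, here is a minimal sketch in Python (not Koha's Perl, just for brevity) of a per-process cache backed by a shared file, using the file's mtime as the staleness check; the file name and JSON format are assumptions:

```python
import json
import os
import tempfile


class FileCache:
    """Per-process cache backed by a shared file.

    Each worker keeps a parsed copy in memory and re-reads the file
    only when its mtime changes, so all workers converge on the same
    data without querying SQL on every request.
    """

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._data = None

    def get(self):
        mtime = os.stat(self.path).st_mtime
        if self._data is None or mtime != self._mtime:
            # First read, or the file changed on disk: reload it.
            with open(self.path) as f:
                self._data = json.load(f)
            self._mtime = mtime
        return self._data


if __name__ == "__main__":
    # Hypothetical usage: whichever process changes a syspref rewrites
    # the file; every worker picks the change up on its next read.
    path = os.path.join(tempfile.mkdtemp(), "sysprefs.json")
    with open(path, "w") as f:
        json.dump({"OPACBaseURL": "http://example.org"}, f)
    cache = FileCache(path)
    print(cache.get()["OPACBaseURL"])
```

One caveat worth noting: mtime granularity on some filesystems is a full second, so two writes within the same second could leave a worker serving a stale copy.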


-Ian

On Fri, Apr 6, 2012 at 06:23, Paul Poulain <paul.poulain at biblibre.com> wrote:

> On 06/04/2012 10:14, Chris Cormack wrote:
> >> To me, memcached is a workaround: it puts the result in a
> >> memcached server, so it survives the CGI process dying, but the
> >> price of reaching the page is still very high, limiting the gain (chris_c has said
> >> many times that memcached is not a matter of performance, but of
> >> scalability)
> >>
> > I disagree, it is not a workaround, centralised caching is vital.
> I agree with this.
>
> > Whether it be memcached, reddis, or true globals. What we cannot have
> > is the situation where different threads are using different values.
> I agree with this. My main concern is to define our goal properly.
> What do we want?
> I think it's:
>  * caching techniques as efficient as possible, to reduce the SQL
> overhead; we can probably also use caching to reduce the XSLT
> processing overhead (for libraries that have XSLT activated)
>  * handling the multi-threading problem
>  * centralized code in Koha, for better modularity
> Do we agree on those goals?
>
> >> === Flushing the cache in a Plack environment ===
> >> For now, by default, Plack threads serve 30 pages each. If you run
> >> starman with 6 threads, it means, if I understand correctly, that a change
> >> to cached data could be ignored for *at most* 180 pages. That's not a big
> >> deal, but:
> >
> > It is a big deal, because it's 6 threads, each serving 30 requests, and
> > those threads may be returning different values unless they are using a
> > shared cache (or shared memory).
> My English was probably unclear here. I just meant that 180 is not a big
> number (for a library getting 100,000 hits per day), but it's a number we
> have to deal with.
>
> >> The more I think of it, the more I like it: that would be a first step
> >> toward separating business logic from data access, plus an interesting
> >> boost in performance and scalability.
> >
> > I dislike this; we should solve the problem properly, not introduce
> > hacks to get around the fact that we have done a bad job of engineering.
> > That's just technical debt we have to pay back.
> OK, you dislike it, and I understand your argument. What would be a
> proper technique, then?
>
> > Having worked on projects with huge load and persistence, I have
> > learnt that proper caching and cache invalidation are vitally important, and
> > getting them right at the start is a lot easier than trying to do it at
> > the end.
> Agreed !
>
> > I think Dobrika is on the right path; we should build a proper caching
> > framework first, one that won't cause data inconsistencies under
> > persistence. Your Koha::DataAccess::Authorized_value subroutine could
> > then use that.
> Maybe there's something I'm missing, but I think Dobrica's (with a c,
> you're right, I don't know why I always write it with a k!) proposal
> doesn't manage cache invalidation. It appears to me to be shared memory (our
> $_cache) that contains $_cache->{$key} = $value, but I don't see a way
> to reset the cache. Am I missing something?
> My feeling is that Dobrica's proposal has exactly the problem you're
> describing (as does mine)
>
>
> As you're experienced in caching, a question: I think cache
> invalidation can be done with 2 different techniques:
> * when a change makes the cache invalid, a message is sent to every
> cache saying "hey, the cache has been reset". The cache is cleared, and the
> next time a piece of data is requested it won't be in the cache.
> * when a change makes the cache invalid, it just sets a flag saying "the
> cache is invalid"; every time a piece of data is requested, the cache handler
> checks whether the cache must be reset, and if it must, resets it and
> reloads the real data.
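The second technique above can be sketched roughly like this, in Python for brevity; the plain dict stands in for a shared store such as memcached, and all names are illustrative:

```python
class VersionedCache:
    """Lazy invalidation: writers bump a shared version counter instead
    of clearing every worker's cache. On each read, a worker compares
    its cached version against the shared one and reloads only when it
    is stale."""

    def __init__(self, shared, loader):
        self.shared = shared    # dict standing in for a shared store
        self.loader = loader    # fetches the real data (e.g. from SQL)
        self._version = None
        self._data = None

    def get(self):
        current = self.shared.get("cache_version", 0)
        if self._data is None or current != self._version:
            # Our copy is stale (or empty): reload the real data.
            self._data = self.loader()
            self._version = current
        return self._data


def invalidate(shared):
    # Just flag the cache stale; readers reload lazily on next access.
    shared["cache_version"] = shared.get("cache_version", 0) + 1
```

The appeal of this lazy scheme is that the writer never needs to know how many worker threads exist; each worker notices the bumped version on its own next read.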
>
> Am I right in saying that the 1st technique is the best, and is what the
> ->flush_cache() method of (Memoize::)Memcached achieves?
>
> Another question: I see that the memcached server address can be a Unix
> socket such as /var/sock/memcached. Could that be faster than a full
> IP address (for libraries that run the memcached server on the
> same machine as the Koha server)?
> --
> Paul POULAIN
> http://www.biblibre.com
> Expert en Logiciels Libres pour l'info-doc
> Tel : (33) 4 91 81 35 08
> _______________________________________________
> Koha-devel mailing list
> Koha-devel at lists.koha-community.org
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : http://www.koha-community.org/
> git : http://git.koha-community.org/
> bugs : http://bugs.koha-community.org/
>

