[Koha-devel] Low Zebra performance for persistent/re-used connections

David Cook dcook at prosentient.com.au
Mon Dec 17 08:06:43 CET 2018


Hi all,

I'm writing a custom script that uses C4::Context->Zconn to create a Zebra
connection and then sends a large number of queries over it.

If the GATEWAY_INTERFACE environment variable is not defined, that method
returns the same connection each time it is called. (Note that I'm working
with Unix sockets and not TCP sockets.)

When re-using the same connection, my script uses 100% CPU.

When using new connections for every request, my script uses 3% CPU.

When re-using the same connection, Zebra seems pretty fast for the first
1000-2000 queries (I didn't capture exact numbers), but performance then
degrades rapidly. By around 3000 queries, Zebra is only processing about 4-6
queries per second.

When using new connections, Zebra is handling about 93 queries per second.

Zebra forks a process for every connection, so I would expect the overhead
of creating a new process for every request to be greater. In practice,
throughput is far better (about 93 versus 4-6 queries per second) when
forcing a new connection for each request than when re-using the connection.
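
A comparison like the one above could be measured with a small harness along
these lines. This is only a sketch in Python, not the script described in
this message: measure_throughput, StubConnection, and the window parameter
are hypothetical names, and the stub merely stands in for a real Zebra
connection. It reports queries per second for each block of queries, so any
slowdown over the life of a re-used connection would show up as falling
numbers.

```python
import time


class StubConnection:
    """Hypothetical stand-in for a real Zebra connection (assumption)."""

    opened = 0  # counts how many connections have been created

    def __init__(self):
        StubConnection.opened += 1

    def search(self, query):
        return 0  # a real client would send the query and return a hit count

    def close(self):
        pass


def measure_throughput(connect, queries, reuse, window=1000):
    """Run the queries and return queries/second for each full window.

    connect -- zero-argument callable returning an object with
               search(query) and close() methods
    reuse   -- True: one shared connection; False: a new one per query
    """
    rates = []
    shared = connect() if reuse else None
    done = 0
    window_start = time.perf_counter()
    for query in queries:
        conn = shared if shared is not None else connect()
        conn.search(query)
        if shared is None:
            conn.close()  # per-query connections are closed immediately
        done += 1
        if done % window == 0:
            now = time.perf_counter()
            elapsed = now - window_start
            rates.append(window / elapsed if elapsed > 0 else float("inf"))
            window_start = now
    if shared is not None:
        shared.close()
    return rates


if __name__ == "__main__":
    queries = ["query %d" % i for i in range(5000)]
    for reuse in (True, False):
        rates = measure_throughput(StubConnection, queries, reuse)
        print("reuse=%s" % reuse, ["%.0f" % r for r in rates])
```

With a real client substituted for the stub, running both modes over the
same query list would give per-window figures to compare against the rough
numbers quoted here.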

It looks like Zebra doesn't expect a single connection/worker process to
handle this many requests. Or maybe it's an issue with the ZOOM library. I
don't know enough about the internals of either to say.

So this could affect any command line tool that sends a high volume of
Zebra queries. While re-using connections makes sense in theory, it seems to
actually cause problems when talking to Zebra.

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia

Office: 02 9212 0899
Direct: 02 8005 0595
