[Koha-bugs] [Bug 19884] Improve performance of GetItem

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Thu Jan 4 15:28:20 CET 2018


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19884

--- Comment #29 from David Gustafsson <glasklas at gmail.com> ---
Thanks, I looked through the code. My general feeling is that it would be better
if an optimized, lower-level interface for fetching the data still resided under
the Koha::Objects namespace, since the table metadata is available there, and
having to maintain essentially the same metadata in multiple places creates a
possible source of bugs. I could have a look at how feasible it would be to
introduce a general search_unblessed for Koha::Objects (as suggested), using the
available metadata and introducing some new metadata, for search conditions for
example. In my opinion this might still be overkill, since items are probably
the only Koha entity in need of such heavy optimization. Creating a special case
(with admittedly pretty ugly code) communicates more clearly that this really is
a hack, and not something one would/should generally use.

Regarding performance and the DBMS cache: I did initially benchmark this with
production data, but decided to provide a benchmarking script requiring only
one biblio with items, so it would be easier for someone else to reproduce.
There might be a difference in results, but not one depending on the DBMS
cache. I think the SQL execution constitutes a very small proportion of the
execution time (when running un-cached queries with production data, not
repeating the same query). 1.84s was spent by MySQL executing the query
(DBI::st::execute) out of a total of 462s, so 0.4% of the total time is spent
executing SQL; the rest is data-mangling in Perl (if I'm not missing
something). There was, however, a 30x difference in the results I posted
earlier in this thread, and that may be an artificial result (since I'm only
getting around 10x when running with production data).
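The proportion quoted above comes straight from the profiler numbers; a quick
check of the arithmetic (numbers from this comment, nothing else assumed):

```python
# Share of total runtime spent in MySQL query execution.
sql_time = 1.84     # seconds in DBI::st::execute
total_time = 462.0  # total seconds for the run
share_pct = round(100 * sql_time / total_time, 1)
print(share_pct)  # -> 0.4 (percent)
```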

I checked the average number of items per biblio for us, and it's about 1.4:

SELECT AVG(item_count) FROM (
    SELECT biblionumber, COUNT(itemnumber) AS item_count
    FROM items
    GROUP BY biblionumber
) AS item_counts;

So loading items in batches of 10 is really not representative of production
data.

Running this script
(https://gist.github.com/gnucifer/84b5ede7b06f0f4103400f8c89714f52) with 2000
items, in batches of 2, against our production data takes 256s without the
patch and 49s with it, so about a 5x difference. In batches of 1 the difference
should be around 8-10x.

I might explore the option of instead working with biblios in batches, and
loading all the items for a batch of biblios at a time. Then the horrible
DBIx performance should not matter much at all, since there is only one items
search query for every x number of biblios.
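The batching idea could look roughly like the sketch below. This is not Koha
code; fetch_items is a hypothetical stand-in for a single
"SELECT ... FROM items WHERE biblionumber IN (...)" query, and the field names
are only illustrative:

```python
def fetch_items_in_batches(biblionumbers, fetch_items, batch_size=100):
    """Group items by biblionumber using one query per batch of biblios.

    fetch_items(ids) stands in for a single SQL query such as
    SELECT * FROM items WHERE biblionumber IN (...), returning row dicts.
    """
    items_by_biblio = {}
    for i in range(0, len(biblionumbers), batch_size):
        batch = biblionumbers[i:i + batch_size]
        # One query per batch of biblios, instead of one per biblio.
        for item in fetch_items(batch):
            items_by_biblio.setdefault(item["biblionumber"], []).append(item)
    return items_by_biblio
```

The point is just that the per-query DBIx overhead is amortized over a whole
batch of biblios rather than paid per biblio (or per item).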

-- 
You are receiving this mail because:
You are watching all bug changes.

