[Koha-zebra] Re: Import Speed
Sebastian Hammer
quinn at indexdata.com
Thu Mar 2 22:42:04 CET 2006
Joshua,
Done right, a first-time update of 5000 records ought to take less than
a minute, so there is definitely room for improvement.
The big question in my mind is whether the network interface as it
stands is suitable for bulk updates. We might need Adam's input on
that. The primary problem is not so much the XML as the fact that we are
updating records one at a time, which is OK if that's what you mean to
do, but it's terrible if you mean to update things in bulk.
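
To put rough numbers on that gap, here is a back-of-the-envelope sketch
in Perl using the 5000-record benchmark quoted below. The 60-second
target is just my reading of "less than a minute", and the helper name
per_record_ms is made up for illustration.

```perl
use strict;
use warnings;

# Figures from this thread; the target is an assumed upper bound.
my $records       = 5000;
my $measured_secs = 7727.84231996536;   # benchmark without the search
my $target_secs   = 60;                 # "less than a minute"

# Average cost of one record in milliseconds.
sub per_record_ms {
    my ($total_secs, $count) = @_;
    return 1000 * $total_secs / $count;
}

my $measured_ms = per_record_ms($measured_secs, $records);
my $target_ms   = per_record_ms($target_secs, $records);
my $gap         = $measured_secs / $target_secs;

printf "measured: %.0f ms/record\n", $measured_ms;
printf "target:   %.0f ms/record\n", $target_ms;
printf "gap:      %.0fx\n", $gap;
```

That works out to roughly 1.5 seconds per record against a budget of
12 ms per record, which is the kind of per-request overhead you would
expect to pay when every record is its own round trip.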
--Seb
Joshua Ferraro wrote:
>On Thu, Mar 02, 2006 at 04:40:16PM +0000, Mike Taylor wrote:
>
>
>>>Date: Thu, 2 Mar 2006 07:44:22 -0800
>>>From: Joshua Ferraro <jmf at liblime.com>
>>>
>>>
>>>
>>There's your culprit, then. You're spending 39751 of your 40604
>>seconds doing needless searches, and 853 seconds (14 minutes) doing
>>the actual updates. Rip out the searches and you should get a 47-fold
>>speed increase.
>>
>>Why are you doing the search? So far as I can see, it's just a probe to
>>see whether the connection is still alive. But you don't need to do
>>that: just go ahead and submit the update request, you'll find out
>>soon enough if the connection's dead and you can re-forge it then if
>>necessary.
>>
>>
>Here's what the connection manager looks like now:
>
>    if (defined($context->{"Zconn"})) {
>        $Zconn = $context->{"Zconn"};
>        return $context->{"Zconn"};
>    } else {
>        $context->{"Zconn"} = &new_Zconn();
>        return $context->{"Zconn"};
>    }
>So ... no search ... if one is defined it just returns it and if
>it's not alive I assume the app will just crash (no fault tolerance
>built into the script).
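
Mike's suggestion (submit the update, and only re-forge the connection
if it turns out to be dead) could be layered on top of that connection
manager roughly as follows. This is a minimal sketch, not the Koha code:
new_Zconn() is stubbed out here, and with_reconnect() and the %context
hash are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

my %context;                 # stand-in for the Koha context hash
my $connections_made = 0;

# Hypothetical stub: the real new_Zconn() would open a connection to Zebra.
sub new_Zconn {
    $connections_made++;
    return { id => $connections_made };
}

# Return the cached connection, creating it on first use.
sub Zconn {
    $context{Zconn} = new_Zconn() unless defined $context{Zconn};
    return $context{Zconn};
}

# Run $update with the cached connection. If it dies, drop the cached
# connection, open a new one and retry exactly once.
sub with_reconnect {
    my ($update) = @_;
    my $result;
    my $ok = eval { $result = $update->(Zconn()); 1 };
    return $result if $ok;
    delete $context{Zconn};
    return $update->(Zconn());
}

# Usage: simulate an update whose first attempt hits a dead connection.
my $attempts = 0;
my $used = with_reconnect(sub {
    my ($conn) = @_;
    die "connection lost\n" if ++$attempts == 1;
    return $conn->{id};
});
print "succeeded on connection $used after $attempts attempts\n";
```

No probe search is needed: the failure of the update itself is the
signal to reconnect, and a second failure still propagates to the caller.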
>
>And here's the new benchmark for those 5000 records:
>
>5000 MARC records imported in 7727.84231996536 seconds
>
>dprofpp tmon.out
>Exporter::export_ok_tags has -1 unstacked calls in outer
>AutoLoader::AUTOLOAD has -1 unstacked calls in outer
>Exporter::Heavy::heavy_export has 12 unstacked calls in outer
>bytes::AUTOLOAD has -1 unstacked calls in outer
>Exporter::Heavy::heavy_export_ok_tags has 1 unstacked calls in outer
>POSIX::__ANON__ has 1 unstacked calls in outer
>POSIX::load_imports has 1 unstacked calls in outer
>Exporter::export has -12 unstacked calls in outer
>utf8::AUTOLOAD has -1 unstacked calls in outer
>utf8::SWASHNEW has 1 unstacked calls in outer
>Storable::thaw has 1 unstacked calls in outer
>bytes::length has 1 unstacked calls in outer
>POSIX::AUTOLOAD has -2 unstacked calls in outer
>Total Elapsed Time = 6617.861 Seconds
> User+System Time = 706.1013 Seconds
>Exclusive Times
>%Time ExclSec CumulS #Calls sec/call Csec/c Name
> 21.4 151.3 817.46 103492 0.0001 0.0008 MARC::Charset::marc8_to_utf8
> 18.0 127.3 416.36 126313 0.0000 0.0000 MARC::Charset::Table::get_code
> 17.1 121.0 121.08 126295 0.0000 0.0000 Storable::mretrieve
> 10.9 77.27 0.000 126295 0.0000 0.0000 Storable::thaw
> 10.1 71.52 71.521 126313 0.0000 0.0000 SDBM_File::FETCH
> 8.42 59.48 117.80 252590 0.0000 0.0000 Class::Accessor::__ANON__
> 8.26 58.31 58.317 252590 0.0000 0.0000 Class::Accessor::get
> 7.21 50.88 467.25 126313 0.0000 0.0000 MARC::Charset::Table::lookup_by_marc8
> 6.15 43.39 97.718 126295 0.0000 0.0000 MARC::Charset::Code::char_value
> 4.87 34.35 34.354 126295 0.0000 0.0000 MARC::Charset::_process_escape
> 2.71 19.10 19.101 126313 0.0000 0.0000 MARC::Charset::Table::db
> 2.26 15.98 30.245 728288 0.0000 0.0000 MARC::Record::field
> 2.10 14.79 14.794 802346 0.0000 0.0000 MARC::Field::tag
> 1.94 13.69 857.27 25241 0.0005 0.0340 MARC::File::XML::record
> 1.44 10.15 11.456 714137 0.0000 0.0000 MARC::Field::subfields
>
>So it's definitely better without the search, but there is still
>the question of XML ... being able to import raw marc (which would
>only take a few seconds) would be really nice ...
>
>
>
>>(Mind you, 14 minutes still seems very slow for 5000 poxy records. I
>>think there are bulk-update cache issues going on here as well.)
>>
>>
>
>
>
--
Sebastian Hammer, Index Data
quinn at indexdata.com www.indexdata.com
Ph: (603) 209-6853