[Koha-zebra] Re: Import Speed

Sebastian Hammer quinn at indexdata.com
Thu Mar 2 22:42:04 CET 2006


Joshua,

Done right, a first-time update of 5000 records ought to take less than 
a minute, so there is definitely room for improvement.

The big question in my mind is whether the network interface as it 
stands is suitable for bulk updates... we might need Adam's input on 
that. The primary problem is not so much the XML as the fact that we are 
updating records one at a time, which is fine if a single update is what 
you mean to do, but terrible if you mean to update things in bulk.
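
For what it's worth, even staying on the network path, reusing a single 
connection and deferring the commit until the end of the run might 
already help quite a bit. Here is a rough, untested sketch, assuming the 
ZOOM-Perl extended-services interface and a hypothetical @records array 
of already-serialized records:

    use ZOOM;

    # One connection for the whole batch (host, port and database
    # name are made up here).
    my $conn = new ZOOM::Connection("localhost:9999", 0,
                                    databaseName => "biblios");

    # Push each record through as an extended-services update
    # package on the same connection ...
    foreach my $rec (@records) {
        my $p = $conn->package();
        $p->option(action => "specialUpdate");   # insert-or-replace
        $p->option(record => $rec);
        $p->send("update");
        $p->destroy();
    }

    # ... and commit once at the end instead of once per record.
    my $c = $conn->package();
    $c->send("commit");
    $c->destroy();

Whether the extended-services layer is the right place for genuinely 
bulk loads at all is the part I would want Adam's opinion on.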

--Seb

Joshua Ferraro wrote:

>On Thu, Mar 02, 2006 at 04:40:16PM +0000, Mike Taylor wrote:
>  
>
>>>Date: Thu, 2 Mar 2006 07:44:22 -0800
>>>From: Joshua Ferraro <jmf at liblime.com>
>>>
>>>      
>>>
>>There's your culprit, then.  You're spending 39751 of your 40604
>>seconds doing needless searches, and 853 seconds (14 minutes) doing
>>the actual updates.  Rip out the searches and you should get a 47-fold
>>speed increase.
>>
>>Why are you doing the search?  So far as I can see, it's just a probe to
>>see whether the connection is still alive.  But you don't need to do
>>that: just go ahead and submit the update request, you'll find out
>>soon enough if the connection's dead and you can re-forge it then if
>>necessary.
>>    
>>
>Here's what the connection manager looks like now:
>
>        if (defined($context->{"Zconn"})) {
>                $Zconn = $context->{"Zconn"};
>                return $context->{"Zconn"};
>        } else {
>                $context->{"Zconn"} = &new_Zconn();
>                return $context->{"Zconn"};
>        }
>So ... no search ... if one is defined it just returns it, and if
>it's not alive I assume the app will just crash (there's no fault
>tolerance built into the script).
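>
>If we wanted some fault tolerance without bringing back the probe
>search, something along these lines might do the trick. This is just
>a sketch: the with_Zconn wrapper name is made up, and the real test
>would depend on how ZOOM reports a dead connection.
>
>        # Run an operation against the cached connection; if it dies,
>        # re-forge the connection once and retry, with no probe search.
>        sub with_Zconn {
>                my ($operation) = @_;  # coderef taking a ZOOM connection
>                $context->{"Zconn"} = &new_Zconn()
>                        unless defined($context->{"Zconn"});
>                my $result = eval { $operation->($context->{"Zconn"}) };
>                if ($@) {
>                        $context->{"Zconn"} = &new_Zconn();
>                        $result = $operation->($context->{"Zconn"});
>                }
>                return $result;
>        }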
>
>And here's the new benchmark for those 5000 records:
>
>5000 MARC records imported in 7727.84231996536 seconds
>
>dprofpp tmon.out
>Exporter::export_ok_tags has -1 unstacked calls in outer
>AutoLoader::AUTOLOAD has -1 unstacked calls in outer
>Exporter::Heavy::heavy_export has 12 unstacked calls in outer
>bytes::AUTOLOAD has -1 unstacked calls in outer
>Exporter::Heavy::heavy_export_ok_tags has 1 unstacked calls in outer
>POSIX::__ANON__ has 1 unstacked calls in outer
>POSIX::load_imports has 1 unstacked calls in outer
>Exporter::export has -12 unstacked calls in outer
>utf8::AUTOLOAD has -1 unstacked calls in outer
>utf8::SWASHNEW has 1 unstacked calls in outer
>Storable::thaw has 1 unstacked calls in outer
>bytes::length has 1 unstacked calls in outer
>POSIX::AUTOLOAD has -2 unstacked calls in outer
>Total Elapsed Time = 6617.861 Seconds
>  User+System Time = 706.1013 Seconds
>Exclusive Times
>%Time ExclSec CumulS #Calls sec/call Csec/c  Name
> 21.4   151.3 817.46 103492   0.0001 0.0008  MARC::Charset::marc8_to_utf8
> 18.0   127.3 416.36 126313   0.0000 0.0000  MARC::Charset::Table::get_code
> 17.1   121.0 121.08 126295   0.0000 0.0000  Storable::mretrieve
> 10.9   77.27  0.000 126295   0.0000 0.0000  Storable::thaw
> 10.1   71.52 71.521 126313   0.0000 0.0000  SDBM_File::FETCH
> 8.42   59.48 117.80 252590   0.0000 0.0000  Class::Accessor::__ANON__
> 8.26   58.31 58.317 252590   0.0000 0.0000  Class::Accessor::get
> 7.21   50.88 467.25 126313   0.0000 0.0000  MARC::Charset::Table::lookup_by_marc8
> 6.15   43.39 97.718 126295   0.0000 0.0000  MARC::Charset::Code::char_value
> 4.87   34.35 34.354 126295   0.0000 0.0000  MARC::Charset::_process_escape
> 2.71   19.10 19.101 126313   0.0000 0.0000  MARC::Charset::Table::db
> 2.26   15.98 30.245 728288   0.0000 0.0000  MARC::Record::field
> 2.10   14.79 14.794 802346   0.0000 0.0000  MARC::Field::tag
> 1.94   13.69 857.27  25241   0.0005 0.0340  MARC::File::XML::record
> 1.44   10.15 11.456 714137   0.0000 0.0000  MARC::Field::subfields
>
>So it's definitely better without the search, but there is still
>the question of the XML serialization ... being able to import raw
>MARC (which would only take a few seconds) would be really nice ...
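>
>For example (a rough sketch only, with made-up paths and file names,
>and assuming the Zebra config accepted raw ISO2709 records), the whole
>batch could be dumped to disk and handed to zebraidx in one go,
>skipping the XML serialization and the MARC8-to-UTF-8 conversion that
>dominate the profile above:
>
>    use MARC::Batch;
>    use File::Path qw(mkpath);
>
>    my $dump_dir = "/tmp/marc-dump";   # hypothetical staging directory
>    mkpath($dump_dir);
>
>    # Write the raw records back out untouched: no XML, no per-record
>    # character-set conversion.
>    my $batch = MARC::Batch->new('USMARC', 'records.mrc');
>    my $count = 0;
>    while (my $record = $batch->next()) {
>        my $file = sprintf("%s/rec%05d.mrc", $dump_dir, ++$count);
>        open(my $fh, '>', $file) or die "can't write $file: $!";
>        binmode($fh);
>        print {$fh} $record->as_usmarc();
>        close($fh);
>    }
>
>    # Then index the whole directory in one pass and commit once
>    # (config path is made up).
>    system("zebraidx", "-c", "/etc/koha/zebra.cfg", "update", $dump_dir) == 0
>        or die "zebraidx update failed";
>    system("zebraidx", "-c", "/etc/koha/zebra.cfg", "commit") == 0
>        or die "zebraidx commit failed";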
>
>  
>
>>(Mind you, 14 minutes still seems very slow for 5000 poxy records.  I
>>think there are bulk-update cache issues going on here as well.)
>>    
>>

-- 
Sebastian Hammer, Index Data
quinn at indexdata.com   www.indexdata.com
Ph: (603) 209-6853





