[Koha-zebra] Re: Import Speed

Sebastian Hammer quinn at indexdata.com
Fri Mar 3 16:49:17 CET 2006


Joshua Ferraro wrote:

>On Fri, Mar 03, 2006 at 09:04:48AM +0000, Mike Taylor wrote:
>  
>
>>Hmm.  Well, compared with the previous truly astonishing time of 40604
>>seconds, that's a better than fivefold improvement, which is not a bad
>>start.  But, still -- more than one second a record, we still have
>>_plenty_ of scope for improvement here.
>>
>>How busy is your disk now?
>>    
>>
>It's a remote machine ... do you have suggestions for a utility that
>measures disc usage on the fly?
>
>  
>
>>>So it's definitely better without the search, but there is still the
>>>question of XML ... being able to import raw marc (which would only
>>>take a few seconds) would be really nice ...
>>>      
>>>
>>I agree with Seb that the XML is unlikely to be the culprit here: the
>>actual indexing is the only thing I can think of that would show the
>>pattern you see of taking longer as the database grows.
>>    
>>
>OK ... but if you look back at that benchmark, the majority of our
>time is now spent converting from MARC21 to MARCXML (it seems the
>most processor-intensive part of this is the conversion from MARC-8
>encoding to UTF-8). So even if Zebra is quite fast indexing XML,
>we still have quite a bit of overhead getting the records into
>XML. I suppose I should do a test where I pre-process the records
>(convert from MARC to XML) and _then_ import. Whadya think?
>  
>
If that really is the case, we should probably look more aggressively 
into enabling Zebra to import MARC directly via the network interface. I 
don't know what the issues are, but it must be doable.

That being said, I find it nearly incomprehensible that a mapping from
MARC to MARCXML or MARC-8 to UTF-8 should be as demanding as these
numbers indicate.
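
One way to check where the time really goes is to time each stage on
its own before touching the import. What follows is only a rough
sketch in Python, not anything taken from Koha or Zebra: the names
time_stages, decode_stage and to_xml_stage are made up for
illustration, and the two stage functions are trivial placeholders
standing in for the real MARC-8 to UTF-8 and MARC to MARCXML steps.

```python
import time


def time_stages(records, stages):
    """Pass each record through the stages in order and
    accumulate wall-clock seconds spent in each stage."""
    totals = {name: 0.0 for name, _ in stages}
    for item in records:
        for name, func in stages:
            start = time.perf_counter()
            item = func(item)  # output of one stage feeds the next
            totals[name] += time.perf_counter() - start
    return totals


# Hypothetical placeholder: stands in for the MARC-8 -> UTF-8 step.
def decode_stage(raw):
    return raw.decode("latin-1")


# Hypothetical placeholder: stands in for the MARC -> MARCXML step.
def to_xml_stage(text):
    return "<record>" + text + "</record>"


if __name__ == "__main__":
    sample = [b"00000nam a2200000 a 4500"] * 1000
    stages = [("decode", decode_stage), ("to_xml", to_xml_stage)]
    totals = time_stages(sample, stages)
    for name, seconds in totals.items():
        per_record_ms = seconds / len(sample) * 1000
        print(f"{name}: {seconds:.4f}s total, {per_record_ms:.4f} ms/record")
```

Swapping the real conversion calls in for the placeholders would show
whether the conversion by itself accounts for anything close to a
second per record.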

--Seb

>Cheers,
>
>  
>

-- 
Sebastian Hammer, Index Data
quinn at indexdata.com   www.indexdata.com
Ph: (603) 209-6853