[Koha-zebra] Re: Import Speed
Sebastian Hammer
quinn at indexdata.com
Fri Mar 3 16:49:17 CET 2006
Joshua Ferraro wrote:
>On Fri, Mar 03, 2006 at 09:04:48AM +0000, Mike Taylor wrote:
>
>
>>Hmm. Well, compared with the previous truly astonishing time of 40604
>>seconds, that's a better than fivefold improvement, which is not a bad
>>start. But, still -- more than one second a record, we still have
>>_plenty_ of scope for improvement here.
>>
>>How busy is your disk now?
>>
>>
>It's a remote machine ... do you have suggestions for a utility that
>measures disc usage on the fly?
>
>
>
>>>So it's definitely better without the search, but there is still the
>>>question of XML ... being able to import raw MARC (which would only
>>>take a few seconds) would be really nice ...
>>>
>>>
>>I agree with Seb that the XML is unlikely to be the culprit here: the
>>actual indexing is the only thing I can think of that would show the
>>pattern you see of taking longer as the database grows.
>>
>>
>OK ... but if you look back at that benchmark, the majority of our
>time is now spent converting from MARC21 to MARCXML (it seems the
>most processor-intensive part of this is the conversion from MARC-8
>encoding to UTF-8). So even if Zebra is quite fast indexing XML,
>we still have quite a bit of overhead getting the records into
>XML. I suppose I should do a test where I pre-process the records
>(convert from MARC to XML) and _then_ import. Whadya think?
>
>
If that really is the case, we should probably look more aggressively
into enabling Zebra to import MARC directly via the network interface. I
don't know what the issues are, but it must be doable.

That being said, I find it nearly incomprehensible that a mapping from
MARC to MARCXML or MARC-8 to UTF-8 should be as demanding as these
numbers indicate.
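
One way to settle where the time goes would be the pre-process-then-import test proposed above, with each phase timed on its own. What follows is only a rough sketch of such a harness in Python, not anything Koha or Zebra ships: time_phases, per_record and the two callables convert and load are hypothetical names, and the stand-ins in the demo do no real MARC-to-MARCXML conversion and no real Zebra update.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_phases(
    records: Iterable[bytes],
    convert: Callable[[bytes], str],
    load: Callable[[str], None],
) -> Tuple[float, float, int]:
    """Convert every record first, then load them, timing each phase separately.

    Returns (convert_seconds, load_seconds, record_count).
    """
    # Phase 1: conversion only (hypothetically MARC-8/MARC21 to UTF-8 MARCXML).
    start = time.perf_counter()
    converted: List[str] = [convert(raw) for raw in records]
    convert_seconds = time.perf_counter() - start

    # Phase 2: import only (hypothetically handing each record to the indexer).
    start = time.perf_counter()
    for xml in converted:
        load(xml)
    load_seconds = time.perf_counter() - start

    return convert_seconds, load_seconds, len(converted)


def per_record(seconds: float, count: int) -> float:
    """Average seconds per record; 0.0 when there are no records."""
    return seconds / count if count else 0.0


if __name__ == "__main__":
    # Stand-ins only: a real run would plug in the actual conversion
    # routine and the actual import call in place of these two lambdas.
    sample = [f"record {i}".encode("ascii") for i in range(1000)]
    store: List[str] = []
    conv_s, load_s, n = time_phases(
        sample,
        lambda raw: raw.decode("ascii"),
        store.append,
    )
    print(f"{n} records")
    print(f"convert: {per_record(conv_s, n):.6f} s/record")
    print(f"load:    {per_record(load_s, n):.6f} s/record")
```

If the per-record conversion figure turns out to dominate, that would support looking at direct MARC import; if the load figure dominates and grows with the size of the database, the indexing is the more likely suspect.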
--Seb
>Cheers,
>
>
>
--
Sebastian Hammer, Index Data
quinn at indexdata.com www.indexdata.com
Ph: (603) 209-6853