[Koha-zebra] Differently index same field numbers from different record types

Thomas Dukleth kohazebra at agogme.com
Fri Aug 4 12:23:37 CEST 2006


This issue revisits an indexing problem related to the one which
appeared in the thread "[Zebralist] how to index everything ?".

Sebastian Hammer wrote:
> Hi Paul,
>
> I don't know if this helps, but if you add the line 'xpath enable' to
> your .abs file, Zebra will build additional index structures to enable
> searches like:
>
> Z> find @attr 1=/*/title someterm
>
> What is supported is a subset of the XPATH spec, but I *think* you can do:
>
> Z> find  @attr 1=/*/datafield[@tag='245'] someterm
>
> In other words, XPATH-statements are used to select elements for
> searching, as an alternative to numerical USE attributes.
>
> Performance is not quite as good as for the regular indexes, so it's not
> something you want to do a lot in production on a 10M record database...
> but it's fine for smaller applications.

Unlike the issue presented in the earlier thread, this issue requires high
performance.

Sebastian Hammer wrote:
> Hmm. You can speed things up by having a specialized tag index.
>
> something like
>
> xelem /record/datafield/@tag tag
>
> in your abs file.
>
> then you can query something like
>
> Z> find  @and @attr 1=tag '245' @attr 1=/*/datafield/subfield[code='9']
> someterm
>
> to speed things up a bit.
>
>
> You could also define an index for each combination of tag/subfields,
> but that might be an administration nightmare.

Sebastian Hammer wrote:
> That wouldn't work out of the box. But this 'should work' (haven't tried
> it):
>
> Z> find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm

Maybe we will need an administration nightmare to have the system function
as needed.


1.  INDEXING PROBLEM.

We need to be able to index fields with the same field number differently
when they come from different record types.  How can different indexing for
the same field number be accomplished without storing the records in
separate databases?

Record type can be distinguished by the value of 000/06 (leader position
06), but I am uncertain whether that will suffice in all circumstances,
particularly where we actually do want to search across multiple record
types because the records are related.
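
As a rough illustration of distinguishing record type by 000/06, here is a
minimal Python sketch, assuming MARC 21 leader codes only (UNIMARC has its
own code lists, not covered here).  The function name and the returned
labels are hypothetical and not part of Zebra or Koha.

```python
# Hypothetical helper: classify a MARC 21 record by leader position 06.
# Code lists follow the MARC 21 formats; UNIMARC codes differ.
BIBLIOGRAPHIC_CODES = set("acdefgijkmoprt")
AUTHORITY_CODES = set("z")
HOLDINGS_CODES = set("uvxy")


def marc21_record_kind(leader: str) -> str:
    """Return 'bibliographic', 'authority', 'holdings', or 'unknown'."""
    if len(leader) < 7:
        raise ValueError("leader too short to contain position 06")
    code = leader[6]  # positions are counted from 0, so 06 is index 6
    if code in BIBLIOGRAPHIC_CODES:
        return "bibliographic"
    if code in AUTHORITY_CODES:
        return "authority"
    if code in HOLDINGS_CODES:
        return "holdings"
    return "unknown"
```

A classification like this only tells the record types apart.  It does not
by itself answer how to search across related records of different types.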


2.  MARC CONFLICT EXAMPLES.

I have not inspected the formats well enough to consider all the cases
risking false results if the record types are not distinguished well.

2.1.  FIXED LENGTH FIELD CASE.

I have always believed that the basic fixed length data elements fields
need local use field analogues with appropriate values to ease searching
because record type and even bibliographic level within a record type
change the meaning of fixed length data elements.  MARC 21 008 and
UNIMARC 100 have this variance problem.

Supplementary local use fields might be a reasonable choice for solving
other problems in the case of the fixed length data elements fields.


2.2.  A MARC 21 CASE.

If we have MARC 21 bibliographic records with 500, general note, and also
MARC 21 authority records with 500, see also from tracing--personal name,
how can we index them differently?


2.3.  A UNIMARC CASE.

If we have UNIMARC bibliographic records with 200, title and statement of
responsibility, and also UNIMARC authorities records with 200,
heading--personal name, how can we index them differently?
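
The conflicts in 2.2 and 2.3 amount to needing an index lookup keyed on
record type as well as field number.  A minimal Python sketch of that idea
follows.  The index names and the function are made up for illustration
and do not correspond to any existing Zebra or Koha configuration.

```python
# Hypothetical mapping: (MARC flavour, record kind, tag) -> index name.
# The index names are invented labels for this sketch only.
INDEX_BY_TYPE_AND_TAG = {
    ("marc21", "bibliographic", "500"): "note-general",
    ("marc21", "authority", "500"): "see-also-personal-name",
    ("unimarc", "bibliographic", "200"): "title",
    ("unimarc", "authority", "200"): "heading-personal-name",
}


def index_for(flavour, record_kind, tag):
    """Return the index name for a field, or None if no rule is defined."""
    return INDEX_BY_TYPE_AND_TAG.get((flavour, record_kind, tag))
```

Keying on the tag alone would send both kinds of 500 (or 200) to the same
index, which is the false-results risk described above.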


3.  XML META RECORDS EXAMPLES.

We have been considering using XML meta-records to overcome the problem of
needing to index related records together.

The records may have a structure like the following simplified possibility.

<collection>
    <bibliographic_record>
        <related_authority_records>
        </related_authority_records>
        <related_holdings_records>
        </related_holdings_records>
    </bibliographic_record>
</collection>
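
For illustration, a meta-record of that shape could be assembled with
Python's standard xml.etree.ElementTree, as sketched below.  The function
name is hypothetical, and placing the bibliographic <record> directly
inside <bibliographic_record> is an assumption about the final structure.

```python
import xml.etree.ElementTree as ET


def build_meta_record(bib, authorities, holdings):
    """Wrap one bibliographic record and its related records (sketch only).

    bib is an Element; authorities and holdings are lists of Elements.
    """
    collection = ET.Element("collection")
    wrapper = ET.SubElement(collection, "bibliographic_record")
    wrapper.append(bib)  # the bibliographic <record> itself
    auth_wrapper = ET.SubElement(wrapper, "related_authority_records")
    auth_wrapper.extend(authorities)
    hold_wrapper = ET.SubElement(wrapper, "related_holdings_records")
    hold_wrapper.extend(holdings)
    return collection
```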


3.1.  PATH ELEMENT DIFFERENCE.

How can we use a path element difference or even an attribute difference
to have fields of different record types indexed differently?

<collection id="1">
    <bibliographic_record>
        <record>
            <datafield tag="500" ind1=" " ind2=" ">

<collection id="1">
    <bibliographic_record>
        <record>
            <related_authority_records>
                <record>
                    <datafield tag="500" ind1="1" ind2=" ">
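
To show that the two 500 fields really are distinguishable by path alone,
here is a minimal Python sketch using the limited XPath support in
xml.etree.ElementTree.  It does not show how Zebra itself would index the
paths.  The subfield contents are invented sample data, and the nesting
follows the fragments above.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the structure follows the fragments above.
META_RECORD = """
<collection id="1">
  <bibliographic_record>
    <record>
      <datafield tag="500" ind1=" " ind2=" ">
        <subfield code="a">General note text</subfield>
      </datafield>
      <related_authority_records>
        <record>
          <datafield tag="500" ind1="1" ind2=" ">
            <subfield code="a">Personal name tracing</subfield>
          </datafield>
        </record>
      </related_authority_records>
    </record>
  </bibliographic_record>
</collection>
"""

# Same tag, different paths: the extra path elements separate the cases.
BIB_500 = "./bibliographic_record/record/datafield[@tag='500']/subfield"
AUTH_500 = ("./bibliographic_record/record/related_authority_records"
            "/record/datafield[@tag='500']/subfield")


def texts(root, path):
    """Return the text of every element matching path under root."""
    return [el.text for el in root.findall(path)]


root = ET.fromstring(META_RECORD.strip())
bib_notes = texts(root, BIB_500)
auth_tracings = texts(root, AUTH_500)
```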


3.2.  PATH MINOR ATTRIBUTE DIFFERENCE.

How can we use a minor attribute difference on a path element to have
fields of different record types indexed differently?

<record type="Bibliographic">
    <datafield tag="500" ind1=" " ind2=" ">

<record type="Authority">
    <datafield tag="500" ind1="1" ind2=" ">
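
The attribute variant can be checked the same way.  The Python sketch below
selects 500 fields by the type attribute of the enclosing record, again
with xml.etree.ElementTree and not with Zebra.  The <collection> wrapper,
the subfield contents, and the function name are assumptions for the
example.

```python
import xml.etree.ElementTree as ET

# Invented sample data wrapped in a <collection> so both records share a root.
SAMPLE = """
<collection>
  <record type="Bibliographic">
    <datafield tag="500" ind1=" " ind2=" ">
      <subfield code="a">General note text</subfield>
    </datafield>
  </record>
  <record type="Authority">
    <datafield tag="500" ind1="1" ind2=" ">
      <subfield code="a">Personal name tracing</subfield>
    </datafield>
  </record>
</collection>
"""


def fields_for(root, record_type, tag):
    """Return datafields with the given tag inside records of one type."""
    path = f"./record[@type='{record_type}']/datafield[@tag='{tag}']"
    return root.findall(path)


sample_root = ET.fromstring(SAMPLE.strip())
bib_fields = fields_for(sample_root, "Bibliographic", "500")
auth_fields = fields_for(sample_root, "Authority", "500")
```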


Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
212-674-3783






