[Koha-devel] Dates and Zebra

David Cook dcook at prosentient.com.au
Tue Oct 6 03:31:08 CEST 2015


Hey all:

 

I was just adding an alternate patch for
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14861, and I
started thinking a bit about date indexes in Zebra, and how we're not using
them as effectively as we could.

 

For MARC21, we only actually have one date index and that's for
"Date-of-acquisition", even though we store lots of other ISO-formatted
dates such as "datelastseen", "replacementpricedate", "datelastborrowed".
Technically "Date/time-last-modified" is in ISO format, although we'd have
to test how Zebra handles the time element of the string, and we might want
to transform it into the extended ISO format (e.g. YYYY-MM-DD hh:mm:ss)
which is used elsewhere. "date-entered-on-file" could also be transformed,
although it has the Y2K issue (a two-digit year).

 

That said, I don't think these fields are particularly popular. 

 

I noticed that UNIMARC indexes "items.onloan", which MARC21 refers to as
952$q. That might be an interesting one. UNIMARC indexes that as a "date
index". Provided that the indexes are up-to-date, you could quickly search
for all records with items due on a particular date, and after bug 14861 is
pushed, you could also search for date ranges.

 

Anyway, just a thought. 

 

I'm not sure how many people are familiar with ccl.properties,
biblio-zebra-indexdefs.xsl, bib1.att, default.idx, and the rest of the Zebra
configuration files, but I've included an explanation of the MARC21 Zebra
configuration at the end of this email.

 

--

Bib1.att defines mappings between numbers and index names. The index names
are used during indexing (and optionally during searching/retrieval).
The numbers are used just for searching/retrieval.

 

For example:

 

att 4    Title

 

ccl.properties creates mappings between user-defined "CCL qualifiers" and
the Bib1.att "use" attribute numbers. It can also create additional aliases
for "CCL qualifiers".

 

For example:

 

Title 1=4

ti Title

 

In this case, "Title" is the "CCL qualifier" and "ti" is an alias for
"Title". 1 stands for "Use attribute" and 4 stands for "att 4" from
Bib1.att.
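
To make the chain from alias to qualifier to use attribute to index name concrete, here is a rough Python sketch. It is only an illustration: parse_bib1_att, parse_ccl_properties, and resolve are hypothetical helpers, not part of Zebra, YAZ, or Koha, and they only understand the simple line shapes shown above (the real file formats support more syntax than this).

```python
# Hypothetical, simplified parsers for illustration only.
# The sample data is copied from the examples in this email.
BIB1_ATT = """
att 4    Title
"""

CCL_PROPERTIES = """
Title 1=4
ti Title
"""


def parse_bib1_att(text):
    """Return {use attribute number: index name} from 'att <n> <name>' lines."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "att":
            mapping[int(parts[1])] = parts[2]
    return mapping


def parse_ccl_properties(text):
    """Return (qualifiers, aliases).

    qualifiers maps a CCL qualifier to its attribute pairs, e.g. {"1": "4"}.
    aliases maps an alias to the qualifier it stands for.
    """
    qualifiers, aliases = {}, {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        name, rest = parts[0], parts[1:]
        if all("=" in item for item in rest):
            # A line such as "Title 1=4" defines a qualifier.
            qualifiers[name] = dict(item.split("=", 1) for item in rest)
        else:
            # A line such as "ti Title" defines an alias.
            aliases[name] = rest[0]
    return qualifiers, aliases


def resolve(name, qualifiers, aliases, bib1):
    """Follow alias -> qualifier -> use attribute -> index name."""
    qualifier = aliases.get(name, name)
    use = int(qualifiers[qualifier]["1"])  # "1" is the "Use attribute" type
    return qualifier, use, bib1[use]


qualifiers, aliases = parse_ccl_properties(CCL_PROPERTIES)
bib1 = parse_bib1_att(BIB1_ATT)
print(resolve("ti", qualifiers, aliases, bib1))
```

With the sample lines above, resolving "ti" follows the alias to "Title", then "1=4" to use attribute 4, then "att 4" back to the "Title" index name.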

 

biblio-zebra-indexdefs.xsl parses MARCXML records and transforms them into a
format that Zebra uses for indexing. It uses the "index names" from the
Bib1.att file for this purpose. 

 

Here's a snippet from that file for MARC21:

 

  <xslo:template mode="index_subfields" match="marc:datafield[@tag='245']">
    <xslo:for-each select="marc:subfield">
      <xslo:if test="contains('a', @code)">
        <z:index name="Title-cover:w Title-cover:p Title-cover:s Title:w Title:p Title:s">
          <xslo:value-of select="."/>
        </z:index>
      </xslo:if>
    </xslo:for-each>
  </xslo:template>

 

As we can see, field '245' subfield 'a' gets stored in the "Title-cover" and
"Title" indexes. But they also have those ":w", ":p", and ":s" bits added.
Those refer to "register types".

 

"w" is the word register, "p" is the phrase register, and "s" is the sort
register. There are others which you can find in "default.idx". The others
include "n" for numeric, "d" for date, and "y" for year.
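
As a quick illustration of how a z:index name attribute breaks down into index names and register types, here is a small Python sketch. split_index_names is a hypothetical helper, and the REGISTER_TYPES table only lists the register letters mentioned in this email; "default.idx" is the authoritative list.

```python
# Register letters as described in this email (not an exhaustive list).
REGISTER_TYPES = {
    "w": "word",
    "p": "phrase",
    "s": "sort",
    "n": "numeric",
    "d": "date",
    "y": "year",
}


def split_index_names(name_attr):
    """Group 'Index:register' tokens by index name.

    Unknown register letters are kept as-is rather than guessed at.
    """
    grouped = {}
    for token in name_attr.split():
        index, _, register = token.partition(":")
        grouped.setdefault(index, []).append(REGISTER_TYPES.get(register, register))
    return grouped


name_attr = "Title-cover:w Title-cover:p Title-cover:s Title:w Title:p Title:s"
for index, registers in split_index_names(name_attr).items():
    print(index, registers)
```

For the 245$a snippet above, both "Title-cover" and "Title" end up in the word, phrase, and sort registers.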

 

These "registers" are mapped to Bib-1 "Structure" attributes. These mappings
can be found at
http://www.indexdata.com/zebra/doc/querymodel-zebra.html#querymodel-pqf-apt-mapping.

 

So if you want to search the "date" register for a particular index, you'll
need to form a CCL query that uses "st-date-normalized", since it's mapped to
"4=5" (i.e. structure attribute number 5), which is the structure that is
mapped to the "date" register in Zebra. Typically, the default structure in
Zebra is "phrase", although Koha uses some query building "magic" to often
use the "word-list" structure, which is mapped to both the "word" and
"phrase" registers.

 

Thus, "acqdate, st-date-normalized = 2015-09-01" will search the
"Date-of-acquisition" index (since "acqdate" is just an alias for the
"Date-of-acquisition" CCL qualifier, which is mapped to the
"Date-of-acquisition" index, and st-date-normalized is mapped to the "date"
register).

 

The patch for 14861 will also make it so that you can search ISO date ranges
such as: "acqdate, st-date-normalized = 2015-09-01 - 2015-09-30" or
"acqdate, st-date-normalized = 2014 - 2015" or even "acqdate,
st-date-normalized = 2014-01 - 2014-03". Note that the whitespace between
the digits and the hyphen is significant.
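
If you're generating these queries from a script, a small helper can keep the whitespace around the range hyphen consistent. The following Python is only a sketch: date_ccl is a hypothetical function (not part of Koha), and the pattern it accepts (YYYY, YYYY-MM, or YYYY-MM-DD) is an assumption based on the examples above. It only checks the shape of the dates, not whether they are real calendar dates.

```python
import re

# Assumed shapes, taken from the examples in this email:
# YYYY, YYYY-MM, or YYYY-MM-DD.
ISO_PARTIAL = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def date_ccl(qualifier, start, end=None):
    """Build a CCL date query, or a date range query if 'end' is given."""
    for value in (start, end):
        if value is not None and not ISO_PARTIAL.fullmatch(value):
            raise ValueError(f"not an ISO-style date: {value!r}")
    # The spaces around the hyphen are what mark this as a range.
    term = start if end is None else f"{start} - {end}"
    return f"{qualifier}, st-date-normalized = {term}"


print(date_ccl("acqdate", "2015-09-01"))
print(date_ccl("acqdate", "2015-09-01", "2015-09-30"))
print(date_ccl("acqdate", "2014", "2015"))
```

The three calls reproduce the single-date query and two of the range queries quoted above.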

 

For more fun with Zebra, consider reading some of the content at
http://wiki.koha-community.org/wiki/Troubleshooting_Zebra.

 

 

 

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

 
