[Koha-bugs] [Bug 9828] Zebra indexes useless subfields in UNIMARC 6XX

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Thu Oct 9 21:31:42 CEST 2014


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=9828

Paul Poulain <paul.poulain at biblibre.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #30961|0                           |1
        is obsolete|                            |
  Attachment #30962|0                           |1
        is obsolete|                            |
  Attachment #31113|0                           |1
        is obsolete|                            |

--- Comment #31 from Paul Poulain <paul.poulain at biblibre.com> ---
Created attachment 32111
  -->
http://bugs.koha-community.org/bugzilla3/attachment.cgi?id=32111&action=edit
Bug 9828: More specific indexing of UNIMARC 6XX fields

[New commit on 18 Aug 2014 : rebased, and DOM indexing only]

Issues to fix :
Most of 6XX may contain a $2 that identifies the system used for indexing. It
should not be indexed.
In French libraries, $2 contains "rameau". So searching books about the music
composer "Rameau" retreive thousands of records!
For some 6XX fiels, other subfields should not be indexed, for example dates of
persons and family, or adresses.
In Unimarc guide, 600$t,601$t,602$t are said to exist but to be "not used". I
keep them indexed.

Additionnally, subject indexing could be improved by using specific indexes for
each 6XX if possible :
In ccl.properties :
- su-to, su-geo and su-ut are defined as aliases of Subject.
- a specific index is defined, but not used in record.abs :
Subject-name-personal, alias su-na
We can use these indexes and create new specific indexes by using existing bib1
attributes.

We could also index $j,$x,$y,$z subdivision in specific indexes.

This patch does the following changes :
1) For all 6XX : Not indexing $2 (LSCH, Rameau...), $3 and $5
2) Suppressing the indexing of some specific subfields, depending on the field:
600 : Personal name used as a subject // see Marc21 600
not indexing c (additional elements),f (dates),p (address/affiliation)
602 : Family name used as a subject // see Marc21 600 3X
not indexing f (dates)
616 : Trademark
not indexing c,f
3) For all 6XX : index $j,$x,$y,$z in several indexes in addition to the
specfific index for their 6XX field:
4) Define in ccl.properties some specific indexes :
Subject-name-conference 1=1073 => alias su-conf
Subject-name-corporate 1=1074 => alias su-corp
Subject-genre-form 1=1075 => alias su-genre and su-form
Subject-geographical 1=1076 => alias su-geo
Subject-chronological 1=1077 => alias su-chrono
Subject-title 1=1078 => alias su-ut and su-ti
Subject-topical 1=1079 => alias su-to
5) Adding new aliases in Search.pm :
su-chrono, su-form, su-genre, su-corp, su-conf, su-ti
6) Using these new indexes in for
600 : Subject and Subject-Personal-Name ; all subfields except subdivisions in
Personal-name
601 : Subject, Subject-name-conference and Subject-name-corporate and
Subject-name-conf ; all subfields except subdivisions in Corporate-name and
Conference-name
602 : same as 600 but could be improved later
604 : Subject and Subject-title ; $a in Subject-Personal-Name ; all subfields
except subdivisions in Name-and-Title
605 : Subject and Subject-title
606 : Subject and Subject-topical
607 : Subject and Subject-geographical ; all subfields except subdivisions in
Name-geographic
608 : Subject and Subject-genre-form

To test :

A. In a UNIMARC-DOM indexing environment
1) Apply the patch
2) Rebuild zebra
3) Create a record A with some values in critical fields, for example:
- the string "test9828" in 600$c 600$f 600$p, 602$f, 616$c, 616$f, 606$2,600$2
- the string "subform" in 600$j
4) Create a record B with the string "subgeo" in 606$y
5) Create a record C with the string "subdate" in 606$z
6) try to search "su:test9828". You should have no results
7) try to search "su-genre:subform". You should have 1 result : record A
8) try to search "su-geo:subgeo". You should have 1 result : record B
9) try to search "su-chrono:subdate". You should have 1 result : record C
10) on existing records, try su-ut, su-to, su-na, su-form, su-corp, su-geo
indexes, and see it results are relevant

Indexing of subjects could maybe be improved later

Signed-off-by: Nick Clemens <nick at quecheelibrary.org>

All seems to work as expected, I am not super-familiar with UNIMARC but I
wonder if in su-corp and su-conf the subdivisions might be useful (e.g.
France-Gendarmie / Staatsbibliothek-Berlin)

Signed-off-by: Paul Poulain <paul.poulain at biblibre.com>

-- 
You are receiving this mail because:
You are watching all bug changes.


More information about the Koha-bugs mailing list