[Koha-bugs] [Bug 9828] Zebra indexes useless subfields in UNIMARC 6XX

bugzilla-daemon at bugs.koha-community.org
Mon Aug 18 23:02:05 CEST 2014


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=9828

--- Comment #14 from mathieu saby <mathsabypro at gmail.com> ---

[New commit on 18 Aug 2014 : rebased, and DOM indexing only]

Issues to fix:
Most 6XX fields may contain a $2 that identifies the system used for indexing.
It should not be indexed.
In French libraries, $2 contains "rameau", so searching for books about the
composer "Rameau" retrieves thousands of records!
For some 6XX fields, other subfields should not be indexed either, for example
dates of persons and families, or addresses.
In the UNIMARC guide, 600$t, 601$t and 602$t are said to exist but to be "not
used". I keep them indexed.
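
The false hits caused by $2 can be reproduced with a toy model. This is only an illustrative sketch, not Zebra or Koha code: the record layout, the sample headings and the index_subject helper are all made up for the example.

```python
# Toy model of a subject index; not Zebra, just an illustration.
# A record is a list of (tag, subfield_code, value) tuples (hypothetical layout).
records = {
    "book_about_rameau": [
        ("600", "a", "Rameau"),
        ("600", "b", "Jean-Philippe"),
        ("600", "2", "rameau"),
    ],
    "book_about_gardening": [
        ("606", "a", "Jardinage"),
        ("606", "2", "rameau"),
    ],
}


def index_subject(records, skip_codes=frozenset()):
    """Map each lowercased word to the ids of records holding it in a 6XX subfield."""
    index = {}
    for rec_id, subfields in records.items():
        for tag, code, value in subfields:
            # Ignore non-subject fields and any subfield code we chose to skip.
            if not tag.startswith("6") or code in skip_codes:
                continue
            for word in value.lower().split():
                index.setdefault(word, set()).add(rec_id)
    return index


naive = index_subject(records)                       # $2 is indexed
filtered = index_subject(records, skip_codes={"2"})  # $2 is skipped

print(sorted(naive["rameau"]))     # both records match
print(sorted(filtered["rameau"]))  # only the book about the composer matches
```

With $2 indexed, every record catalogued with the same vocabulary matches the word, whatever its actual subject.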

Additionally, subject indexing could be improved by using specific indexes for
each 6XX field where possible.
In ccl.properties:
- su-to, su-geo and su-ut are defined as aliases of Subject.
- a specific index is defined, but not used in record.abs:
Subject-name-personal, alias su-na
We can use these indexes and create new specific indexes by using existing bib1
attributes.

We could also index the $j, $x, $y and $z subdivisions in specific indexes.
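
As a rough sketch of that idea, the routing of subdivisions could be modelled as a lookup table. The $j, $y and $z targets follow the test plan further down in this comment. The $x target is an assumption based on $x being the topical subdivision in UNIMARC, and feeding the general Subject index as well is also assumed. The function name is hypothetical.

```python
# Assumed routing of UNIMARC 6XX subdivision subfields to specific indexes.
# $j, $y and $z follow the test plan; $x -> Subject-topical is an assumption.
SUBDIVISION_INDEXES = {
    "j": "Subject-genre-form",     # form subdivision
    "x": "Subject-topical",        # topical subdivision (assumed target)
    "y": "Subject-geographical",   # geographical subdivision
    "z": "Subject-chronological",  # chronological subdivision
}


def indexes_for_subdivision(code):
    """Return the index names fed by a subfield, general Subject index first."""
    specific = SUBDIVISION_INDEXES.get(code)
    return ["Subject"] + ([specific] if specific else [])


print(indexes_for_subdivision("y"))  # general index plus the geographical one
print(indexes_for_subdivision("a"))  # not a subdivision: general index only
```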

This patch makes the following changes:
1) For all 6XX: do not index $2 (LCSH, Rameau...), $3 and $5
2) Stop indexing some specific subfields, depending on the field:
600: Personal name used as a subject // see MARC21 600
not indexing c (additional elements), f (dates), p (address/affiliation)
602: Family name used as a subject // see MARC21 600 3X
not indexing f (dates)
616: Trademark
not indexing c, f
3) For all 6XX: index $j, $x, $y, $z in several indexes in addition to the
specific index for their 6XX field
4) Define in ccl.properties some specific indexes :
Subject-name-conference 1=1073 => alias su-conf
Subject-name-corporate 1=1074 => alias su-corp
Subject-genre-form 1=1075 => alias su-genre and su-form
Subject-geographical 1=1076 => alias su-geo
Subject-chronological 1=1077 => alias su-chrono
Subject-title 1=1078 => alias su-ut and su-ti
Subject-topical 1=1079 => alias su-to
5) Adding new aliases in Search.pm :
su-chrono, su-form, su-genre, su-corp, su-conf, su-ti
6) Using these new indexes for:
600 : Subject and Subject-Personal-Name ; all subfields except subdivisions in
Personal-name
601 : Subject, Subject-name-conference and Subject-name-corporate and
Subject-name-conf ; all subfields except subdivisions in Corporate-name and
Conference-name
602 : same as 600 but could be improved later
604 : Subject and Subject-title ; $a in Subject-Personal-Name ; all subfields
except subdivisions in Name-and-Title
605 : Subject and Subject-title
606 : Subject and Subject-topical
607 : Subject and Subject-geographical ; all subfields except subdivisions in
Name-geographic
608 : Subject and Subject-genre-form
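
Points 1), 2), 4) and 5) can be summarised as two lookup tables. The following is a sketch for illustration only: the values are copied from the lists above, but the function names and the data layout are hypothetical and do not reflect how record.abs, the DOM configuration or ccl.properties are actually written.

```python
# Subfields the patch stops indexing, as listed in points 1) and 2).
EXCLUDED_EVERYWHERE = {"2", "3", "5"}
EXCLUDED_BY_FIELD = {
    "600": {"c", "f", "p"},
    "602": {"f"},
    "616": {"c", "f"},
}

# Index name -> (bib-1 use attribute, aliases), as listed in point 4).
SPECIFIC_INDEXES = {
    "Subject-name-conference": (1073, ["su-conf"]),
    "Subject-name-corporate": (1074, ["su-corp"]),
    "Subject-genre-form": (1075, ["su-genre", "su-form"]),
    "Subject-geographical": (1076, ["su-geo"]),
    "Subject-chronological": (1077, ["su-chrono"]),
    "Subject-title": (1078, ["su-ut", "su-ti"]),
    "Subject-topical": (1079, ["su-to"]),
}

# Reverse lookup built from the table above: alias -> index name.
ALIAS_TO_INDEX = {
    alias: name
    for name, (_, aliases) in SPECIFIC_INDEXES.items()
    for alias in aliases
}


def is_indexed(tag, code):
    """Return True if subfield `code` of 6XX field `tag` is still indexed."""
    if not (len(tag) == 3 and tag.startswith("6")):
        raise ValueError(f"not a 6XX field: {tag}")
    return code not in EXCLUDED_EVERYWHERE | EXCLUDED_BY_FIELD.get(tag, set())


def resolve_alias(alias):
    """Return (index name, bib-1 use attribute) for a search alias."""
    name = ALIAS_TO_INDEX[alias]
    return name, SPECIFIC_INDEXES[name][0]


print(is_indexed("600", "f"))    # dates in 600 are no longer indexed
print(is_indexed("606", "a"))    # the heading itself still is
print(resolve_alias("su-form"))  # alias resolves to Subject-genre-form
```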

To test :

A. In a UNIMARC-DOM indexing environment
1) Apply the patch
2) Rebuild zebra
3) Create a record A with some values in critical fields, for example:
- the string "test9828" in 600$c 600$f 600$p, 602$f, 616$c, 616$f, 606$2,600$2
- the string "subform" in 600$j
4) Create a record B with the string "subgeo" in 606$y
5) Create a record C with the string "subdate" in 606$z
6) Try to search "su:test9828". You should have no results
7) Try to search "su-genre:subform". You should have 1 result: record A
8) Try to search "su-geo:subgeo". You should have 1 result: record B
9) Try to search "su-chrono:subdate". You should have 1 result: record C
10) On existing records, try the su-ut, su-to, su-na, su-form, su-corp and
su-geo indexes, and see if the results are relevant
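
Steps 6) to 9) could also be written down as a small checklist that is run against any search function. This is a sketch under assumptions: `search` is a hypothetical callable that takes a query string and returns record identifiers, and fake_search is only a stand-in used to exercise the checker, not a real Koha call.

```python
# Expected results from steps 6) to 9) of the test plan.
EXPECTED = [
    ("su:test9828", []),
    ("su-genre:subform", ["A"]),
    ("su-geo:subgeo", ["B"]),
    ("su-chrono:subdate", ["C"]),
]


def check_plan(search):
    """Run each query through `search` and return the list of mismatches."""
    failures = []
    for query, expected in EXPECTED:
        got = sorted(search(query))
        if got != expected:
            failures.append((query, expected, got))
    return failures


def fake_search(query):
    """Stand-in for a real catalogue search, returning canned answers."""
    canned = {
        "su-genre:subform": ["A"],
        "su-geo:subgeo": ["B"],
        "su-chrono:subdate": ["C"],
    }
    return canned.get(query, [])


print(check_plan(fake_search))  # an empty list means every step passed
```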

Indexing of subjects could maybe be improved later
