[Koha-bugs] [Bug 13706] Deduping authorities script (dedup_authorities.pl)

bugzilla-daemon at bugs.koha-community.org bugzilla-daemon at bugs.koha-community.org
Fri Oct 27 02:00:07 CEST 2023


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13706

--- Comment #18 from David Nind <david at davidnind.com> ---
It seems to work with my limited testing - if this isn't sufficient, please
change the status.

The only thing I found confusing was the "Deleted XX authorities" message shown
when the script is run - I'm assuming there is some logic here that I don't
quite get.

Basic testing notes (using KTD):

1. Go to Authorities
2. Duplicate an existing personal name authority that is used in a record
3. Note the original and duplicate authority number
4. Run the script: misc/maintenance/dedup_authorities.pl -v -a PERSO_NAME -m date -c
5. Check the authorities in the staff interface - one is deleted and one is
kept

I did some other testing, notes as follows.

FYI 
===

Number of terms for authority types - from the script and search results in the
staff interface:

Chronological (CHRON_TERM): Script: 0 ; Search result: 0
Corporate Name (CORPO_NAME): Script: 88 ; Search result: 88
Genre/Form Term (GENRE/FORM): Script: 49 ; Search result: 49
Geographic Name (GEOGR_NAME): Script: 142 ; Search result: 142
Meeting Name (MEETI_NAME): Script: 3 ; Search result: 3
Personal Name (PERSO_NAME): Script: 650 ; Search result: 650
Topical Term (TOPIC_TERM): Script: 663 ; Search result: 663
Uniform Title (UNIF_TITLE): Script: 111 ; Search result: 111

Default/No authority type selected: Script: 1706 ; Search result: 1706
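
As a quick cross-check of the figures above, the per-type counts can be summed and compared with the "no authority type selected" total. This is only an arithmetic sketch using the numbers listed in this comment; the dictionary name is made up for illustration.

```python
# Per-type authority counts as listed above (script and search agreed on each).
per_type = {
    "CHRON_TERM": 0,
    "CORPO_NAME": 88,
    "GENRE/FORM": 49,
    "GEOGR_NAME": 142,
    "MEETI_NAME": 3,
    "PERSO_NAME": 650,
    "TOPIC_TERM": 663,
    "UNIF_TITLE": 111,
}

total = sum(per_type.values())
print(total)  # 1706, the same as the default/no authority type figure
```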

Script result - showing authority records by type:

kohadev-koha@kohadevbox:koha(bz13706)$ misc/maintenance/dedup_authorities.pl -v -m date
RUNNING IN TEST MODE, NO CHANGES WILL BE MADE
Fetching authtypecodes...
Fetching authtypecodes done.
Deduping authtype '' 
Fetching authorities for ''... 0 authorities found
End of deduping for authtype ''
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'CHRON_TERM' 
Fetching authorities for 'CHRON_TERM'... 0 authorities found
End of deduping for authtype 'CHRON_TERM'
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'CORPO_NAME' 
Fetching authorities for 'CORPO_NAME'... 88 authorities found
End of deduping for authtype 'CORPO_NAME'
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'GENRE/FORM' 
Fetching authorities for 'GENRE/FORM'... 49 authorities found
End of deduping for authtype 'GENRE/FORM'
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'GEOGR_NAME' 
Fetching authorities for 'GEOGR_NAME'... 142 authorities found
Progression for authtype 'GEOGR_NAME': 100/142 (70.42%)
End of deduping for authtype 'GEOGR_NAME'
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'MEETI_NAME' 
Fetching authorities for 'MEETI_NAME'... 3 authorities found
End of deduping for authtype 'MEETI_NAME'
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'PERSO_NAME' 
Fetching authorities for 'PERSO_NAME'... 650 authorities found
    Malformed authority record, no heading at misc/maintenance/dedup_authorities.pl line 172.
Progression for authtype 'PERSO_NAME': 100/650 (15.38%)
Progression for authtype 'PERSO_NAME': 200/650 (30.77%)
Progression for authtype 'PERSO_NAME': 300/650 (46.15%)
Progression for authtype 'PERSO_NAME': 400/650 (61.54%)
Progression for authtype 'PERSO_NAME': 500/650 (76.92%)
    Malformed authority record, blank heading at misc/maintenance/dedup_authorities.pl line 176.
End of deduping for authtype 'PERSO_NAME'
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'TOPIC_TERM' 
Fetching authorities for 'TOPIC_TERM'... 663 authorities found
Progression for authtype 'TOPIC_TERM': 100/663 (15.08%)
Progression for authtype 'TOPIC_TERM': 200/663 (30.17%)
Progression for authtype 'TOPIC_TERM': 300/663 (45.25%)
Progression for authtype 'TOPIC_TERM': 400/663 (60.33%)
Progression for authtype 'TOPIC_TERM': 500/663 (75.41%)
End of deduping for authtype 'TOPIC_TERM'
Updated 0 biblios
Deleted 0 authorities
Deduping authtype 'UNIF_TITLE' 
Fetching authorities for 'UNIF_TITLE'... 111 authorities found
End of deduping for authtype 'UNIF_TITLE'
Updated 0 biblios
Deleted 0 authorities
No biblios to update


Testing using Genre/Form Term
=============================

Summary
~~~~~~~

Total before script run: 49 terms

Manually went through results in staff interface to identify duplicates:

Comedy films. 982 (deleted), 1586 (kept)
Feature films. 625 (deleted), 650 (deleted), 654 (deleted), 822 (deleted), 984
(kept), 987 (deleted)
Fiction films. 823 (deleted), 985 (kept), 988 (deleted)
Foreign films. 626 (kept), 988 (deleted)
Historical fiction. 1018 (kept), 1019 (deleted)
Video recordings for the hearing impaired. 986 (kept), 989 (deleted)
Summary: 17 terms where there should only be 6, so 11 should be deleted

Expected number of terms to be deleted: 11 (result would be 38 terms left)

Results from running the script - shows 14 deleted (3 shown as deleted twice):
987, 822, 625, 625, 650, 654, 655, 988, 823, 823, 982, 982, 989, 1019

The end result is the same; I'm not sure why the output shows it this way.

Search results after in the staff interface: 38 
Script results after: 38
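
The difference between the 14 reported deletions and the 11 expected ones can be tallied from the IDs listed above. This is a minimal sketch that only counts what the script printed; it does not explain why the script reports some authorities twice, and the variable names are made up for illustration.

```python
from collections import Counter

# Authority IDs in the order the GENRE/FORM run reported them as deleted.
reported = [987, 822, 625, 625, 650, 654, 655, 988, 823, 823, 982, 982, 989, 1019]

counts = Counter(reported)
repeated = sorted(authid for authid, n in counts.items() if n > 1)

print(len(reported))     # 14 "Deleting" lines printed
print(len(counts))       # 11 distinct authorities
print(repeated)          # [625, 823, 982]
print(49 - len(counts))  # 38 terms left, matching the search results
```

So the "Deleted 14 authorities" total appears to count reported deletions rather than distinct authority records.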

Dummy run
~~~~~~~~~

kohadev-koha@kohadevbox:koha(bz13706)$ misc/maintenance/dedup_authorities.pl -v -a GENRE/FORM -m date
RUNNING IN TEST MODE, NO CHANGES WILL BE MADE
Fetching authtypecodes...
Fetching authtypecodes done.
Deduping authtype 'GENRE/FORM' 
Fetching authorities for 'GENRE/FORM'... 49 authorities found
End of deduping for authtype 'GENRE/FORM'
Updated 0 biblios
Deleted 0 authorities
No biblios to update

Actual run
~~~~~~~~~~

kohadev-koha@kohadevbox:koha(bz13706)$ misc/maintenance/dedup_authorities.pl -v -a GENRE/FORM -m date -c
Fetching authtypecodes...
Fetching authtypecodes done.
Deduping authtype 'GENRE/FORM' 
Fetching authorities for 'GENRE/FORM'... 49 authorities found
    Updated 0 biblios
    Deleting 987
    Updated 0 biblios
    Deleting 822
    Updated 0 biblios
    Deleting 625
    Updated 0 biblios
    Deleting 625
    Updated 0 biblios
    Deleting 650
    Updated 0 biblios
    Deleting 654
    Updated 0 biblios
    Deleting 655
    Updated 0 biblios
    Deleting 988
    Updated 0 biblios
    Deleting 823
    Updated 0 biblios
    Deleting 823
    Updated 0 biblios
    Deleting 982
    Updated 0 biblios
    Deleting 982
    Updated 0 biblios
    Deleting 989
    Updated 0 biblios
    Deleting 1019
End of deduping for authtype 'GENRE/FORM'
Updated 0 biblios
Deleted 14 authorities
No biblios to update

After
~~~~~

kohadev-koha@kohadevbox:koha(bz13706)$ misc/maintenance/dedup_authorities.pl -v -a GENRE/FORM -m date
RUNNING IN TEST MODE, NO CHANGES WILL BE MADE
Fetching authtypecodes...
Fetching authtypecodes done.
Deduping authtype 'GENRE/FORM' 
Fetching authorities for 'GENRE/FORM'... 38 authorities found
End of deduping for authtype 'GENRE/FORM'
Updated 0 biblios
Deleted 0 authorities
No biblios to update


Testing using Meeting Name authorities
======================================

Summary
~~~~~~~

1. Duplicated the Beagle Expedition (1831-1836) authority (155)
2. Added the duplicated authority to another record
3. Result expected - one authority deleted, one kept - records using authority
updated with the kept authority

Total before script run: 4 terms

Manually went through results in staff interface to identify duplicates:

Beagle Expedition (1831-1836): 155 (deleted), 1708 (kept)
Summary: 2 terms where there should only be 1, so 1 should be deleted

Expected number of terms to be deleted: 1 (result would be 3 terms left)

Results from running the script - shows 2 deleted (1 shown as deleted twice):
155, 155

The end result is the same; I'm not sure why the output shows it this way.

Search results after in the staff interface: 3
Script results after: 3

Dummy run
~~~~~~~~~

kohadev-koha@kohadevbox:koha(bz13706)$ misc/maintenance/dedup_authorities.pl -v -a MEETI_NAME -m date
RUNNING IN TEST MODE, NO CHANGES WILL BE MADE
Fetching authtypecodes...
Fetching authtypecodes done.
Deduping authtype 'MEETI_NAME' 
Fetching authorities for 'MEETI_NAME'... 4 authorities found
End of deduping for authtype 'MEETI_NAME'
Updated 0 biblios
Deleted 0 authorities
No biblios to update

Actual run
~~~~~~~~~~

kohadev-koha@kohadevbox:koha(bz13706)$ misc/maintenance/dedup_authorities.pl -v -a MEETI_NAME -m date -c
Fetching authtypecodes...
Fetching authtypecodes done.
Deduping authtype 'MEETI_NAME' 
Fetching authorities for 'MEETI_NAME'... 4 authorities found
    Updated 1 biblios
    Deleting 155
    Updated 0 biblios
    Deleting 155
End of deduping for authtype 'MEETI_NAME'
Updated 1 biblios
Deleted 2 authorities
No biblios to update
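
The per-merge lines in the actual run above can be paired up to see what each reported deletion did. This is a rough sketch, assuming each indented "Updated N biblios" line belongs to the "Deleting" line that follows it, which is inferred from the output order rather than from the script's source. The function name is hypothetical.

```python
import re

# Lines copied from the MEETI_NAME actual run output above.
LOG = """\
    Updated 1 biblios
    Deleting 155
    Updated 0 biblios
    Deleting 155
End of deduping for authtype 'MEETI_NAME'
Updated 1 biblios
Deleted 2 authorities
"""

def pair_updates_with_deletions(log):
    """Return (authid, biblios_updated) for each 'Deleting' line, in order."""
    pairs = []
    pending = None  # count from the most recent 'Updated N biblios' line
    for line in log.splitlines():
        text = line.strip()
        updated = re.fullmatch(r"Updated (\d+) biblios", text)
        deleting = re.fullmatch(r"Deleting (\d+)", text)
        if updated:
            pending = int(updated.group(1))
        elif deleting:
            pairs.append((int(deleting.group(1)), pending))
            pending = None
    return pairs

pairs = pair_updates_with_deletions(LOG)
print(pairs)  # [(155, 1), (155, 0)]
```

Read this way, the first report for 155 is the one that updated the linked record, and the repeated report updated nothing further.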

After
~~~~~

kohadev-koha@kohadevbox:koha(bz13706)$ misc/maintenance/dedup_authorities.pl -v -a MEETI_NAME -m date
RUNNING IN TEST MODE, NO CHANGES WILL BE MADE
Fetching authtypecodes...
Fetching authtypecodes done.
Deduping authtype 'MEETI_NAME' 
Fetching authorities for 'MEETI_NAME'... 3 authorities found
End of deduping for authtype 'MEETI_NAME'
Updated 0 biblios
Deleted 0 authorities
No biblios to update
