[Koha-bugs] [Bug 17661] Differences in field ending (whitespace, punctuation) cause duplicate facets

bugzilla-daemon at bugs.koha-community.org
Thu Aug 27 02:13:19 CEST 2020


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=17661

David Cook <dcook at prosentient.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcook at prosentient.com.au

--- Comment #16 from David Cook <dcook at prosentient.com.au> ---
This is interesting.

I'm not surprised that "_get_facets_data_from_record" works in an undesirable
way, but I am surprised that this would also be necessary for
"_get_facet_from_result_set" when using Zebra facets, since Zebra already
returns the facet terms in a normalized format.

For example:

Z> elements zebra::facet::author:p
Z> show 1
Sent presentRequest (1+1).
Records: 1
Record type: XML
<record xmlns="http://www.indexdata.com/zebra/">
  <facet type="p" index="author">
    <term coccur="21" occur="11">berryman faye</term>
    <term coccur="11" occur="11">dale rae</term>
    <term coccur="11" occur="11">dale rae 1945</term>
    <term coccur="11" occur="11">fitzroy programs</term>
    <term coccur="12" occur="11">o carroll philip</term>
    <term coccur="10" occur="10">o carroll philip 1945</term>
    <term coccur="6" occur="6">powell jonathon</term>
    <term coccur="6" occur="6">powell jonathon illustrator</term>
    <term coccur="9" occur="6">reynolds kate e</term>
    <term coccur="12" occur="6">shapiro lawrence e</term>
    <term coccur="6" occur="5">carter emily</term>
    <term coccur="5" occur="3">caramagna joe</term>
    <term coccur="6" occur="3">schaefer charles e</term>
    <term coccur="4" occur="2">digeronimo theresa foy</term>
    <term coccur="4" occur="2">hughes edward e</term>
    <term coccur="4" occur="2">kazdin alan e</term>
    <term coccur="4" occur="2">reeve christine e</term>
    <term coccur="4" occur="2">renton n e nicholas edwin 1931</term>
    <term coccur="4" occur="2">snell martha e</term>
    <term coccur="4" occur="1">attwood tony</term>
  </facet>
</record>
nextResultSetPosition = 2
Elapsed: 0.030227
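
For anyone who wants to inspect such a facet record outside of yaz-client, here
is a minimal Python sketch that pulls the terms out of the XML. The function
name "parse_facet_terms" and the tuple layout are my own for illustration and
are not part of Koha or Zebra; the sample is a shortened copy of the record
above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <record> element in the yaz-client output.
ZEBRA_NS = {"z": "http://www.indexdata.com/zebra/"}


def parse_facet_terms(xml_text):
    """Return (index, term, occur, coccur) tuples from a Zebra facet record.

    Hypothetical helper for illustration only; not a Koha function.
    """
    root = ET.fromstring(xml_text)
    results = []
    for facet in root.findall("z:facet", ZEBRA_NS):
        index = facet.get("index")
        for term in facet.findall("z:term", ZEBRA_NS):
            results.append(
                (index, term.text, int(term.get("occur")), int(term.get("coccur")))
            )
    return results


# Shortened copy of the facet record shown above.
sample = """<record xmlns="http://www.indexdata.com/zebra/">
  <facet type="p" index="author">
    <term coccur="21" occur="11">berryman faye</term>
    <term coccur="11" occur="11">dale rae</term>
    <term coccur="11" occur="11">dale rae 1945</term>
  </facet>
</record>"""

terms = parse_facet_terms(sample)
for index, term, occur, coccur in terms:
    print(f"{index}: {term!r} occur={occur} coccur={coccur}")
```

Note that the terms come back lowercased and without punctuation, which is the
normalization I'm referring to.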

Now, in this case there are some apparent "duplicates" where the author's date
is included in some records but not in others. I would argue that this is an
authority data issue, though, since "John Smith", "John Smith 1945", and
"John Smith 1995" are all different authors.
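
To illustrate the distinction, here is a rough Python sketch of the kind of
normalization that would produce terms like the ones above. This is only an
approximation I wrote for illustration and is NOT Zebra's actual rule set;
"normalize_heading" and "group_by_normalized" are hypothetical names. Headings
that differ only in trailing whitespace or punctuation collapse to one term,
while a heading that carries a date stays separate.

```python
import re
from collections import defaultdict


def normalize_heading(value):
    """Approximate a phrase-index normalization (assumption, not Zebra's rules)."""
    lowered = value.lower()
    # Replace anything that is not a word character or whitespace with a space.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", no_punct).strip()


def group_by_normalized(headings):
    """Group raw headings under their normalized form, keeping input order."""
    groups = defaultdict(list)
    for heading in headings:
        groups[normalize_heading(heading)].append(heading)
    return dict(groups)


# Made-up raw headings, chosen to mirror the terms in the facet output.
headings = ["Dale, Rae", "Dale, Rae.", "Dale, Rae ", "Dale, Rae, 1945-"]
groups = group_by_normalized(headings)

for term, raw in groups.items():
    print(f"{term!r} <- {raw}")
```

With this approximation the first three headings end up under one term and the
dated heading under another, so only the date variants would remain as
separate facets.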

I don't know how Elasticsearch handles its facets, so I can't comment there.


