[Koha-bugs] [Bug 12478] Elasticsearch support for Koha

bugzilla-daemon at bugs.koha-community.org
Mon Aug 31 07:20:20 CEST 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=12478

--- Comment #91 from Robin Sheat <robin at catalyst.net.nz> ---
(In reply to Jonathan Druart from comment #81)
> Well, it's defined yes, but does not work at all (the marc21 mappings are
> used) :)
> It is caused by some errors in the sql file. Patch's coming.

Ah, ta.

> 
> Note the following:
> MariaDB [koha_es_unimarc]>  insert into search_field (name, type) select
> distinct mapping, type from elasticsearch_mapping;
> Query OK, 73 rows affected, 57 warnings (0.05 sec)
> Records: 73  Duplicates: 0  Warnings: 57
> 
> MariaDB [koha_es_unimarc]> show warnings;
> +---------+------+--------------------------------------------+
> | Level   | Code | Message                                    |
> +---------+------+--------------------------------------------+
> | Warning | 1265 | Data truncated for column 'type' at row 1  |

Hmm, I remember that, but I'm not 100% sure it mattered. Could be wrong though.

> Yes of course, but I am not a real tester, I am a developer, and it would be
> useful to share info on specific data.
> I am fine with using the sandbox DB, if that's OK with you.

Fair point. Let me see if I can tidy the database up a bit so I can upload it somewhere.

Here it is:

http://elasticsearch.koha.catalystdemo.net.nz/files/koha_es_marc21.sql.bz2

It's not the best data, but it's good enough for messing about with.

> > > 2/ The number of tests provided is very low.
> > Yes, I've been meaning to go back and add a pile more.
> Ok, I'll leave that to you :)

Oh, you don't have to. I don't mind if you go and write them all for me :)

> Patch is coming.
> Patch is coming.
> Patch is coming.
> Patch is coming.

Thanks!

> 
> Yes it has:
> "title":[["Dollhouse"],["Seasons one & two."]]
> 
> 245$a Dollhouse
> 490$a Seasons one & two.
> 
> But 245$a should be used for sorting :)

Yes, that's something I'm trying to fix at the moment :)
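For reference, here is a minimal sketch of the behaviour Jonathan describes. It assumes a record has already been reduced to a plain dict of tag/subfield values, which is a hypothetical shape and not Koha's or Catmandu's real data structure. Both 245$a and 490$a stay in the searchable `title` field. A separate sort key is taken only from the first 245$a, skipping the nonfiling characters given in the 245 second indicator. The `title__sort` field name is made up for illustration.

```python
import re

def build_title_fields(record):
    """Build title index fields from a hypothetical record shape like
    {"245": {"ind2": "0", "a": [...]}, "490": {"a": [...]}}."""
    titles_245 = record.get("245", {}).get("a", [])
    series_490 = record.get("490", {}).get("a", [])
    # Searchable title: every title-ish subfield, 245$a first
    doc = {"title": titles_245 + series_490}
    if titles_245:
        main = titles_245[0]
        # 245 second indicator = number of nonfiling characters ("The " -> 4)
        ind2 = record["245"].get("ind2", "0")
        skip = int(ind2) if ind2.isdigit() else 0
        key = main[skip:]
        # Drop trailing ISBD punctuation such as " /", " :" or "."
        key = re.sub(r"[\s/:;,.]+$", "", key)
        # Hypothetical sort-only field built from 245$a alone
        doc["title__sort"] = key.casefold()
    return doc
```

With this split, "Seasons one & two." is still searchable but can no longer affect sort order.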

> The item is a "Visual Materials" which has an itemtype.notforloan flag set.

Good to know, I've not tested that case yet.

> Ouch, not sure how I could find that easily.

Probably easiest to construct a case manually.

> It comes from the 008
> > "Pictura murală*" has "pubdate":"||||" (/_search?q=_id:39&pretty)
> 008 090409|||||||||xx |||||||||||||| ||und||
> > The Korean Go Association's learn to play go  "pubdate":"uuuu"
> 008 971030muuuu9999nyua          000 0 eng 
> 
> But the index should not contain an invalid date.

Hmm. I don't know if we can put validation into the fixer rules. I'll have to
explore that further. Alternatively, telling ES that this field must be a
number could cause bad data to be rejected, but I'm not sure whether it would
reject just the bad value or the whole record.
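On the question of what ES rejects: by default, a value that can't be parsed into a numeric field makes Elasticsearch reject the entire document. Numeric and date field mappings also accept an `ignore_malformed` parameter. With it set, the rest of the document is indexed and only the bad value is left unindexed. That value still appears in `_source`, though, so it isn't a full substitute for cleaning the data. The sketch below builds such a mapping body in Python. The `pubdate` field name comes from the index above. The typeless `mappings`/`properties` layout is the modern one, and older Elasticsearch versions nest `properties` under a mapping type name, so treat the exact layout as an assumption.

```python
import json

def pubdate_mapping():
    """Mapping body declaring pubdate as an integer that tolerates bad values."""
    return {
        "mappings": {
            "properties": {
                "pubdate": {
                    "type": "integer",
                    # Skip unparseable values like "||||" or "uuuu" instead of
                    # rejecting the whole document at index time.
                    "ignore_malformed": True,
                }
            }
        }
    }

print(json.dumps(pubdate_mapping(), indent=2))
```

Partial dates such as `19uu` would be skipped the same way, which may or may not be what we want.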

Do you happen to know how zebra handles that?

> For Solr (you can find the code on the BibLibre repo at
> https://git.biblibre.com/biblibre/koha_biblibre/commits/dev/solr Browse
> C4/Search/), we used a system of plugins. And there is a Date plugin
> (https://git.biblibre.com/biblibre/koha_biblibre/blob/
> bd38ce1811289fcfbd75a37ec99fc4cd3c5d37f4/C4/Search/Plugins/Date.pm) which
> does this job.
> A plugin can be linked to a mapping.

We probably can't directly reuse that; at present we're using Catmandu to do
the data conversion and, for the most part, the interfacing with ES. But it's
possible I can hook something in somewhere.
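Something along the lines of the BibLibre plugin approach might look like the minimal sketch below. It adds per-mapping normalisation hooks that run on values before they go to ES. Everything here is hypothetical (`PLUGINS`, `date_plugin`, `apply_plugins`), and it is not Catmandu's fixer API. The date plugin reads 008/07-10 (Date1) and keeps it only if all four characters are digits. So the `||||` and `uuuu` records above would get no pubdate at all, rather than an invalid one.

```python
import re

def date_plugin(field_008):
    """Return Date1 (008/07-10) as an int, or None if it is not four digits."""
    date1 = field_008[7:11]
    return int(date1) if re.fullmatch(r"[0-9]{4}", date1) else None

# Hypothetical registry linking a mapping name to a normalisation plugin
PLUGINS = {"pubdate": date_plugin}

def apply_plugins(raw):
    """Run each mapping's plugin (if any), dropping values it rejects."""
    doc = {}
    for mapping, value in raw.items():
        plugin = PLUGINS.get(mapping)
        result = plugin(value) if plugin else value
        if result is not None:
            doc[mapping] = result
    return doc
```

MARC 21 allows `u` for unknown digits (e.g. `19uu`). This sketch simply discards those, though a real plugin might map them to a range instead.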

> Just a note: I know nobody has ever had a look at the Solr code, but it has
> been used in production by several (4 or 5) customers for more than 4 years
> now. And I have already run into all the issues and problems you will encounter.

I'm sure I'll encounter some exciting new ones :)

> I will try to find some time and propose something here, if you
> want some help.

Sure, anything is welcome.

(In reply to Jonathan Druart from comment #90)
> Something else, there is a sort issue in the facets:
> 
> [Some entries]
>  Zeitoun, Ariel,
>  Ó Cadhain, Máirtín.
>  Ślez, Ts..
> 
> Ó should be after O, not after Z.

Line 573 of opac/opac-search.pl does a sort with cmp, which isn't Unicode-aware.
I'm putting that in the not-my-problem bin as it's in upstream code :)
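For whoever picks this up upstream: plain code-point comparison puts every accented capital after `Z`, because `Ó` is U+00D3 and `Ś` is U+015A. The Python sketch below builds a rough diacritic-insensitive sort key. It uses NFD decomposition and drops the combining marks. That's an approximation, not real collation. In Perl, the core `Unicode::Collate` module implements the full Unicode Collation Algorithm and would be the more correct fix.

```python
import unicodedata

def facet_sort_key(label):
    """Fold case and drop diacritics so 'Ó' sorts with 'O', 'Ś' with 'S'."""
    decomposed = unicodedata.normalize("NFD", label)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Original label as tie-breaker keeps the order deterministic
    return (base.casefold(), label)

facets = ["Zeitoun, Ariel,", "Ó Cadhain, Máirtín.", "Ślez, Ts.."]
print(sorted(facets))                       # code-point order: Zeitoun first
print(sorted(facets, key=facet_sort_key))   # by base letter: O, S, Z
```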
