[Koha-bugs] [Bug 12478] Elasticsearch support for Koha

bugzilla-daemon at bugs.koha-community.org
Fri Aug 28 13:29:22 CEST 2015


http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=12478

--- Comment #81 from Jonathan Druart <jonathan.druart at bugs.koha-community.org> ---
(In reply to Robin Sheat from comment #80)
> (In reply to Jonathan Druart from comment #79)
> > The first problem I got was to find a MARC21 DB (since the UNIMARC mappings
> > are not defined, I cannot test with an UNIMARC DB).
> 
> The UNIMARC mappings should be defined, though not tested.

Well, they are defined, yes, but they do not work at all (the MARC21 mappings
are used instead) :)
This is caused by some errors in the SQL file. A patch is coming.

Note the following:
MariaDB [koha_es_unimarc]>  insert into search_field (name, type) select
distinct mapping, type from elasticsearch_mapping;
Query OK, 73 rows affected, 57 warnings (0.05 sec)
Records: 73  Duplicates: 0  Warnings: 57

MariaDB [koha_es_unimarc]> show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1265 | Data truncated for column 'type' at row 1  |

and 72 others.

> > I have used the one created for the sandboxes
> > (http://git.koha-community.org/gitweb/?p=contrib/global.git;a=blob;f=sandbox/
> > sql/sandbox1.sql.gz;h=19268bccb43b2a33d5644b7d86cbb1abb323016b;hb=HEAD). But
> > there are only 436 biblios, it's not enough to test some stuffs (facets for
> > instance).
> > Or maybe you can share your DB?
> 
> I could, but I think we'll get more useful results from different databases.

Yes, of course, but I am a developer rather than a real tester, and it would
be useful to share information about specific data.
I am fine with using the sandbox DB, if that is OK with you.

> > Here some notes:
> > 
> > 1/ Add deps to C4/Installer/PerlDependencies.pm
> 
> Yeah, I'm mostly waiting for things to settle (which they have now.)
> 
> > 2/ The number of tests provided is very low.
> 
> Yes, I've been meaning to go back and add a pile more.

OK, I will leave that to you :)

> > 6/ Verbose does not work as expected, it could be fixed with
> 
> Oops. TODOed.

Patch is coming.


> > 7/ perl -e "use
> > Pod::Checker;podchecker('misc/search_tools/rebuild_elastic_search.pl')";
> > *** WARNING: empty section in previous paragraph at line 36 in file
> > misc/search_tools/rebuild_elastic_search.pl
> > *** ERROR: =over on line 38 without closing =back at line EOF in file
> > misc/search_tools/rebuild_elastic_search.pl
> 
> TODOed.

Patch is coming.


> > 8/ 2 occurrences of "Solr" reintroduced in installer/data/mysql/sysprefs.sql
> > and koha-tmpl/intranet-tmpl/prog/en/modules/admin/preferences/admin.pref
> 
> Must have come about when merging. TODOed.

Patch is coming.

> > 9/ Test!

> > c. Search for 'harry', sort by title AZ (screenshot
> > opac_search_for_harry_sort_by_title.png)
> > - The 'Show more' link is displayed even if only 2 entries are available
> > for a facet
> 
> Thought I'd fixed that, I'll have to have a look again.

Patch is coming.

> > - The order is still different ("The discovery of heaven" should be sorted
> > either before Dollhouse (if "the" is a stopword) or after "Hareios*")
> 
> Dollhouse probably has another title field that's actually being used, as
> noted above.

Yes, it has:
"title":[["Dollhouse"],["Seasons one & two."]]
245$a Dollhouse
490$a Seasons one & two.

But 245$a should be used for sorting :)
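
To make the expected ordering concrete, here is a minimal Python sketch of
building the sort key from the first title value only. It rests on
assumptions that are not confirmed for this branch: that the indexed title
values are available as a flat list of strings, and that the 245$a value
comes first. The name sort_title and the stopword list are hypothetical and
are not Koha code.

```python
from typing import Sequence


def sort_title(titles: Sequence[str],
               stopwords: Sequence[str] = ("the", "a", "an")) -> str:
    """Build a sort key from the first title value only.

    Assumption: titles[0] holds the 245$a value; later entries (such as
    490$a) are ignored for sorting.
    """
    first = titles[0] if titles else ""
    words = first.split()
    # Drop one leading stopword, but never reduce the title to nothing.
    if len(words) > 1 and words[0].lower() in stopwords:
        words = words[1:]
    return " ".join(words).lower()
```

With the stopword list enabled, "The discovery of heaven" yields the key
"discovery of heaven", which sorts before "dollhouse". With an empty
stopword list it yields "the discovery of heaven", which sorts after it.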

> > - The availability is wrong for ES (The item for Dollhouse is not for loan)
> 
> Why is it not for loan? Is it by policy, because there are no items, or
> because all items are issued?

The item has the "Visual Materials" item type, which has the
itemtype.notforloan flag set.

> > d. Search for Books (limit by item type in the adv search), sort by pubdate
> > (screenshot limit_by_book_sort_by_pubdate.png)
> > - "Return to the last advanced search" link is not displayed
> 
> I wonder how it knows to show that...
> 
> I can't actually find that string in my checkout at all.

Yes, sorry, it was introduced by Bug 13307: Create a link to the last advanced
search in search result page (OPAC).
That bug is not in your branch yet.

> > - The item types facet contains several entries, which does not make sense
> 
> Curious. Are there situations where you have a biblio-level itemtype that
> differs from the item-level item type, or where one biblio might have
> multiple items with different item types? At the moment, I think they're all
> being thrown into one facet pot.

It comes from biblioitems.itemtype=2WEEK.
I am not sure the data I used is correct...

> > - The number of results highly differ (395 vs 364)
> 
> Probably due to biblio-vs-item itemtype selection not being supported yet.
> If you can find it giving you a record that plain shouldn't match though,
> that'd be interesting.

Ouch, I am not sure how I could find that easily.

> > - The order is still completely different. I had a look in the index and
> > found:
> > "Pictura murală*" has "pubdate":"||||" (/_search?q=_id:39&pretty)
> > The Korean Go Association's learn to play go  "pubdate":"uuuu"
> > (/_search?q=_id:155&pretty)
> > Where do these values come from? Shouldn't it be a date, or at least an
> > integer?
> 
> Could be the mapping is funny/broken for that. My test system has things
> like:
> 
> "pubdate":"1998"
> 
> though, which implies that it's correct. The actual mapping comes from:
> 
> INSERT INTO `elasticsearch_mapping` (`indexname`, `mapping`, `facet`,
> `suggestible`, `type`, `marc21`, `unimarc`, `normarc`) VALUES
> ('biblios','pubdate',FALSE,FALSE,'','008_/7-10','100a_/9-12','008_/7-10');

It comes from the 008
> "Pictura murală*" has "pubdate":"||||" (/_search?q=_id:39&pretty)
008 090409|||||||||xx |||||||||||||| ||und||
> The Korean Go Association's learn to play go  "pubdate":"uuuu"
008 971030muuuu9999nyua          000 0 eng 

But the index should not contain an invalid date.

For Solr (you can find the code in the BibLibre repo at
https://git.biblibre.com/biblibre/koha_biblibre/commits/dev/solr, browse
C4/Search/), we used a plugin system. There is a Date plugin
(https://git.biblibre.com/biblibre/koha_biblibre/blob/bd38ce1811289fcfbd75a37ec99fc4cd3c5d37f4/C4/Search/Plugins/Date.pm)
which does this job.
A plugin can be linked to a mapping.
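
A minimal sketch of that kind of check, written in Python for illustration
and not taken from the Date plugin: it reads positions 7-10 of the 008 field
(the range used by the MARC21 pubdate mapping quoted above) and keeps the
value only when it is four ASCII digits. The name pubdate_from_008 is
hypothetical, and returning None for unusable values is an assumption about
what the indexer should do with them.

```python
import re
from typing import Optional


def pubdate_from_008(field_008: str) -> Optional[int]:
    """Return the 008/07-10 year as an int, or None if it is not 4 digits."""
    year = field_008[7:11]  # positions 7-10 inclusive, 0-based
    if re.fullmatch(r"[0-9]{4}", year):
        return int(year)
    # Fill characters such as "||||" or "uuuu" are not indexed as a date.
    return None
```

With the two 008 values shown above, both "||||" and "uuuu" give None, so
neither record would get a pubdate in the index. A field carrying "1998" in
those positions would give 1998.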

Just a note: I know nobody has ever had a look at the Solr code, but it has
been used in production by several (4 or 5) customers for more than 4 years
now. I have already run into all the issues and problems you will encounter.

> > It's not easy to know what is indexed where. Did you have a look at the
> > indexes configuration page the Solr stuff had?
> > It provided an interface to configure the different mappings, it was very
> > useful.
> 
> I haven't yet got to the point where I have the time to make an interface.
> At the moment it's all configured in elasticsearch_mapping.sql, which is
> somewhat human readable/editable. After loading the data into a table, it
> rewrites all those tables into a form that'll be more conducive for having a
> GUI on top of, but is less human readable.
> 
> BTW, if you add
> 
> <trace_to>Stderr</trace_to>
> 
> to the <elasticsearch> block, it'll dump all the chatter with ES out to
> stderr, which is useful for seeing what exactly is going on. I warn you,
> there is a lot there though.

I will try to find some time and propose something here, if you want some
help.


