[Koha-bugs] [Bug 12478] Elasticsearch support for Koha

Wed Oct 7 08:03:35 CEST 2015

http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=12478

--- Comment #145 from Robin Sheat <robin at catalyst.net.nz> ---
(In reply to Jonathan Druart from comment #141)
> Did you have time to look into the terms aggregations?
> Or is it maybe what you are already using?

Yeah, I'm aware of them. Unfortunately, the facets API I used only became
deprecated after I'd implemented it, and I haven't got around to going back and
reworking it. I don't think it'll be a big change; it may be sufficient to just
switch the ES type we're using. Anyway, I've put it on the "deal with later" pile.
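For context: Elasticsearch 1.0 deprecated the old `facets` API in favour of aggregations, and a `terms` aggregation is the usual replacement for a terms facet. The minimal Python sketch below builds such a search body as a plain dict. It is not Koha's query builder; the function name and the `author__facet`-style field names are hypothetical.

```python
def build_facet_query(query_string, facet_fields, size=10):
    """Build an ES search body that uses terms aggregations instead of facets.

    Field names passed in are hypothetical examples, not Koha's real index names.
    """
    return {
        "query": {"query_string": {"query": query_string}},
        "aggregations": {
            # One named terms aggregation per facet field, named after the field.
            field: {"terms": {"field": field, "size": size}}
            for field in facet_fields
        },
    }


body = build_facet_query("title:perl", ["author__facet", "subject__facet"])
```

The aggregation results come back under `aggregations` in the response, keyed by the names chosen here, so the switch mostly changes where the facet counts are read from.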

(In reply to Jonathan Druart from comment #142)
> On a unimarc installation, there are no mappings at all for authorities.
> If an unimarc user follows this bug report...

Working out the MARC21 mappings was tedious enough :)

(In reply to Jonathan Druart from comment #143)
> Maybe I have not drunk enough tea this morning but...
> I am trying to improve the mappings area to have a consistent interface to
> manage them.
> The idea is to 1) move the elastic_mapping.sql to an elastic_mapping.json
> file (easier to modify and read), 2) provide methods to
> serialize/unserialize mappings, then 3) introduce a backup/import/reset
> mappings feature, and finally 4) make it easier to improve the mappings so
> we have a good basis for using ES.
> 
> I have managed to create a json file from the sql file, the structure is
> something like:
> 
> {
>   biblio => {
>     title => {
>       label    => 'Title',
>       type     => 'string',
>       mappings => [
>         {
>           suggestible => 1,
>           facet       => 1,
>           marc21      => '245a',
>           unimarc     => '200a',
>           normarc     => '245a',
>         },
>       ],
>     },
>   },
> }
> 
> And I have some questions :)
> - Do you agree with the idea?

Well... I don't know. I'm not really a fan of that structure: it's not ideal,
and it's a bit more limited. For instance, you can't have more than one title
in it, though that's not really the big issue. Mostly it's just a very
denormalised view of the data: better for editing by hand, as with the SQL
file, but not so good for a computer to use. That's why the SQL file has the
data in that form and then normalises it in the database.
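To make the normalisation point concrete, here is a minimal Python sketch (not Koha's actual loader) that flattens the nested structure quoted above into search-field rows, deduplicated MARC-map rows, and join pairs. The function name, the column names, and the choice to key MARC maps on (index, MARC flavour, subfield) are assumptions for illustration.

```python
def normalise(mappings):
    """Flatten {index: {field: {...}}} into three 'tables' of plain rows.

    Column names are illustrative only, not Koha's real schema.
    """
    search_fields = []   # one row per search field
    marc_map_ids = {}    # (index, marc_type, marc_field) -> id, deduplicated
    joins = []           # (search_field_id, marc_map_id) pairs
    for index_name, fields in mappings.items():
        for name, spec in fields.items():
            field_id = len(search_fields) + 1
            search_fields.append({
                "id": field_id,
                "name": name,
                "label": spec.get("label"),
                "type": spec.get("type"),  # None means "not decided yet"
            })
            for mapping in spec.get("mappings", []):
                for marc_type in ("marc21", "unimarc", "normarc"):
                    marc_field = mapping.get(marc_type)
                    if marc_field is None:
                        continue
                    key = (index_name, marc_type, marc_field)
                    # Reuse the existing MARC-map row if this key was seen before.
                    map_id = marc_map_ids.setdefault(key, len(marc_map_ids) + 1)
                    joins.append((field_id, map_id))
    marc_maps = [
        {"id": i, "index_name": k[0], "marc_type": k[1], "marc_field": k[2]}
        for k, i in marc_map_ids.items()
    ]
    return search_fields, marc_maps, joins
```

Two search fields that both map MARC21 `100a` end up sharing one MARC-map row and get two join rows, which is exactly what the nested form cannot express without repeating the MARC details.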

> - Don't you think the index_name should be a column of the search_fields
> table?

Yes, it should be kept with search_field.name as it's effectively more
information needed to describe where something gets stored.

> - Some of the fields don't have a type; should we assign "string" as the
> default value?

I'd rather not, because that implies they've consciously been made strings.
Ideally, as time goes on, people will decide that this is a date, and this is
an... IP address or something, and add those as types while putting in the
logic to handle them. So if a type is unspecified, it gets treated as a string
by default, but it really means "we haven't decided yet."
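The distinction between "stored as unspecified" and "treated as a string" can be kept explicit in code. This is a minimal sketch under the assumption that an unset type is stored as `None` (NULL); the helper names and the `DEFAULT_TYPE` constant are hypothetical.

```python
from typing import Optional

DEFAULT_TYPE = "string"  # what an undecided field is handled as


def effective_type(declared: Optional[str]) -> str:
    """Type to index with: an unset type falls back to string at use time."""
    return declared if declared is not None else DEFAULT_TYPE


def is_undecided(declared: Optional[str]) -> bool:
    """True when nobody has consciously chosen a type for this field yet."""
    return declared is None
```

Because the fallback happens only when the type is read, the stored value still records that no decision has been made, so a later pass can find every undecided field.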

> - wordings: 'sortable' and 'facetable' sound more appropriate than 'sort'
> and 'facet'

Hmm. I don't really mind either way. My thinking was that "facet" and "sort"
were easier to type, but I broke the consistency with "suggestible" because
"suggest" seemed weird. I don't object to any of them changing.

> - (/me is clearing his throat) I think that all the mappings of a field
> should be removed if the field is removed. In other words, there is a 1-n
> relationship between search_field and search_marc_map, which means that the
> join table (search_marc_to_field) is not needed and we could simplify the
> structure by removing it.

I had a good reason for doing many-to-many. Let me see if I can remember it...

Oh wait, I documented it:

-- This joins the two search tables together. We can have any combination:
-- one marc field could have many search fields (maybe you want one value
-- to go to 'author' and 'corporate-author') and many marc fields could go
-- to one search field (e.g. all the various author fields going into
-- 'author'.)

If you remove the many-to-many relationship, you end up with
duplication/denormalisation. My thinking behind the UI is that you might have,
say, a list of all the fields and, under each, the set of MARC fields that
map to it. Or perhaps the inverse. I hadn't really thought about it too much,
but a properly normalised relational structure gives us the maximum amount of
flexibility. The one improvement I'd make to the structure in this respect is
that the sort, facet and suggest flags should really be at the join level. I
have a feeling I considered that and then decided it risked becoming too
fiddly, but it would get more power out of the structure. At the moment you'd
have to duplicate the MARC details if you want different values for those
three, which isn't ideal.
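Here is a rough Python sketch of the join-level variant described above, in which the sort/facet/suggest flags sit on each join row instead of on the MARC map. It is a hypothetical in-memory model, not Koha's schema. The class, the helper, and the MARC21 subfields chosen (100a, 110a, 700a) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarcToField:
    """Hypothetical join row: the flags live here, not on the MARC map."""
    marc_map_id: int
    search_field: str
    sort: bool = False
    facet: bool = False
    suggestible: bool = False


# MARC-map rows, each stored exactly once.
marc_maps = {1: ("marc21", "100a"), 2: ("marc21", "110a"), 3: ("marc21", "700a")}

joins = [
    MarcToField(1, "author", sort=True, facet=True),  # many MARC fields -> 'author'
    MarcToField(2, "author", facet=True),
    MarcToField(3, "author"),
    MarcToField(2, "corporate-author", sort=True),    # one MARC field -> two search fields
]


def marc_fields_for(search_field, flag=None):
    """MARC fields feeding a search field, optionally only those with `flag` set."""
    return [
        marc_maps[j.marc_map_id]
        for j in joins
        if j.search_field == search_field and (flag is None or getattr(j, flag))
    ]
```

Here 110a is faceted for 'author' but sorted for 'corporate-author' while still being a single MARC-map row. With the flags on the MARC map instead, that would need two duplicated rows.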

(In reply to Jonathan Druart from comment #144)
> Ha, something else: the biblionumber should be a field of the biblios index.

Oh, it's embedded as the ID on the ES record. There's no point duplicating it
as its own field, but it's reasonable to copy it out as a post-process step and
put it into a biblionumber field. We don't have a reliable
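The post-process step could look something like the following minimal Python sketch. It assumes each hit has the standard ES `_id` and `_source` keys and that `_id` was set to the numeric biblionumber at index time. The function name is hypothetical.

```python
def add_biblionumber(hit):
    """Return a copy of the hit's source with a biblionumber field added.

    Assumes '_id' holds the biblionumber as a numeric string (ES IDs are strings).
    """
    record = dict(hit.get("_source", {}))  # shallow copy; the hit is left untouched
    record["biblionumber"] = int(hit["_id"])
    return record
```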
