[Koha-bugs] [Bug 12478] Elasticsearch support for Koha

Wed Oct 7 09:50:50 CEST 2015

http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=12478

--- Comment #147 from Jonathan Druart <jonathan.druart at bugs.koha-community.org> ---
(In reply to Robin Sheat from comment #145)
> (In reply to Jonathan Druart from comment #143)
> > Maybe I have not drunk enough tea this morning but...
> > I am trying to improve the mappings area to have a consistent interface to
> > manage them.
> > The idea is to 1) move elastic_mapping.sql to an elastic_mapping.json
> > file (easier to read and modify), 2) provide methods to
> > serialize/unserialize mappings, then 3) introduce a backup/import/reset
> > mappings feature and finally 4) make it easier to evolve the mappings, to
> > get a good basis for using ES.
> > 
> > I have managed to create a json file from the sql file, the structure is
> > something like:
> > 
> > {
> >   biblio => {
> >     title => {
> >       label    => 'Title',
> >       type     => 'string',
> >       mappings => [
> >         {
> >           suggestible => 1,
> >           facet       => 1,
> >           marc21      => '245a',
> >           unimarc     => '200a',
> >           normarc     => '245a',
> >         },
> >       ]
> >     },
> >   },
> > }
> > 
> > And I have some questions :)
> > - Do you agree with the idea?
> 
> Well... I don't know. Though I'm not a fan of that structure really, as it's
> not ideal, and is a bit more limited. Also, in this case you can't have more
> than one title, but that's not really the big issue. Mostly it's just a very
> denormalised view of the data. Better for manually editing an SQL file, but
> not really so good for a computer to use. This is why the SQL file has the
> data in that form and then normalises it in the database. 

I don't see the problem with the structure: a field can have several
mappings (it's an arrayref of hashrefs).
With this structure I could insert exactly the same data into the tables
(unless I missed something...).
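
To illustrate the point, here is a minimal sketch of flattening that nested
structure back into one row per MARC mapping, which is the data the tables
would need. It is Python purely for illustration; `flatten_mappings`, the row
keys and the `sort` flag name are assumptions of mine, not existing Koha code.

```python
import json

# Hypothetical mapping document, mirroring the structure quoted above.
MAPPINGS_JSON = """
{
  "biblio": {
    "title": {
      "label": "Title",
      "type": "string",
      "mappings": [
        {"suggestible": 1, "facet": 1,
         "marc21": "245a", "unimarc": "200a", "normarc": "245a"}
      ]
    }
  }
}
"""

MARC_FLAVOURS = ("marc21", "unimarc", "normarc")


def flatten_mappings(doc):
    """Return one flat row per (search field, MARC flavour, MARC field)."""
    rows = []
    for index_name, fields in doc.items():
        for field_name, field in fields.items():
            # "mappings" is a list, so a field can carry several mappings.
            for mapping in field.get("mappings", []):
                for flavour in MARC_FLAVOURS:
                    if flavour not in mapping:
                        continue
                    rows.append({
                        "index_name": index_name,
                        "field": field_name,
                        "label": field.get("label"),
                        "type": field.get("type"),  # None: not decided yet
                        "marc_type": flavour,
                        "marc_field": mapping[flavour],
                        "facet": bool(mapping.get("facet", 0)),
                        "suggestible": bool(mapping.get("suggestible", 0)),
                        "sort": bool(mapping.get("sort", 0)),
                    })
    return rows


rows = flatten_mappings(json.loads(MAPPINGS_JSON))
for row in rows:
    print(row)
```

Each row carries the field details plus one MARC mapping, so adding a second
hashref to "mappings" simply yields more rows for the same field.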

> > - Some of the fields don't have a type, should we assign "string" as the
> > default value?
> 
> I'd rather not, just because that implies that they've consciously been made
> strings. Ideally as time goes on, people will decide that this is a date,
> and this is a ... IP address or something, and add those as types while
> putting the logic in to handle it. So, if a type is unspecified, then it
> gets treated like a string by default, but it really means "we haven't
> decided yet."

So todo later :)
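
For when we get to it, a minimal sketch of that rule: keep the type empty in
the stored mapping and fall back to "string" only at the point where the ES
mapping is built. Python purely for illustration; `effective_type` and
`is_undecided` are hypothetical names, not existing Koha code.

```python
DEFAULT_ES_TYPE = "string"  # fallback only; never written back to the mapping


def effective_type(field):
    """Type to use when indexing: the declared one, else the default."""
    declared = field.get("type")
    return declared if declared is not None else DEFAULT_ES_TYPE


def is_undecided(field):
    """True when nobody has consciously chosen a type for this field yet."""
    return field.get("type") is None


title = {"label": "Title", "type": "string"}
untyped = {"label": "Some field"}  # hypothetical field without a type

print(effective_type(title), is_undecided(title))
print(effective_type(untyped), is_undecided(untyped))
```

That way "string because we decided" and "string because nobody decided" stay
distinguishable in the stored data.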

> > - wordings: 'sortable' and 'facetable' sounds more appropriate than 'sort'
> > and 'facet'
> 
> hmm. I don't really mind either way. My thinking was that "facet" and "sort"
> were easier to type. But I broke the consistency because "suggest" seemed
> weird. I don't object to any of them changing.

Not a big deal but better sooner than later.

> > - (/me is clearing his throat) I think that all the mappings of a field
> > should be removed if the field is removed. In other words, there is a 1-n
> > relationship between search_field and search_marc_map, which means that the
> > join table (search_marc_to_field) is not needed and we could simplify the
> > structure by removing it.
> 
> I had a good reason for doing many-to-many. Let me see if I can remember
> it...
> 
> Oh wait, I documented it:
> 
> -- This joins the two search tables together. We can have any combination:
> -- one marc field could have many search fields (maybe you want one value
> -- to go to 'author' and 'corporate-author') and many marc fields could go
> -- to one search field (e.g. all the various author fields going into
> -- 'author'.)
> 
> If you remove the many-to-many relationship then you end up with
> duplication/denormalisation. My thinking behind the UI is that you might
> have, say, a list of all the fields and under them, a set of all the MARC
> fields that map to it. Or perhaps the inverse. I hadn't really thought about
> it too much, but a properly normalised relational structure means that we
> have the maximum amount of flexibility. The only improvement to the
> structure in this respect is that the sort, facet, suggest things should
> really be at the join level. I have a feeling I considered that, then
> decided it risked crossing the line into too fiddly, but it would get more
> power out of it. At the moment you'd have to duplicate the MARC details if
> you want different values for those three which isn't ideal.

Yes, it's related to the index_name unique key discussion we had last week.
We should either move sort, facet and suggest to the join table or remove the
join table, but not keep the current structure.

I am not sure what we gain from having the three tables: without the join
table we could still find which fields are mapped to a given MARC field, or
the inverse (by matching on the same marc_field values).
Anyway, the current structure forces us to duplicate the MARC details, because
sort, facet and suggest could differ.
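
To make the first option concrete, here is a minimal sketch with the three
flags moved to the join table, so one MARC row can feed two search fields with
different settings and nothing is duplicated. Python with SQLite purely for
illustration; the table names come from this thread, but the columns, the
'110a' sample and the flag values are my assumptions, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_field (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE search_marc_map (
    id         INTEGER PRIMARY KEY,
    index_name TEXT NOT NULL,
    marc_type  TEXT NOT NULL,
    marc_field TEXT NOT NULL,
    UNIQUE (index_name, marc_type, marc_field)
);
-- The flags live on the join: they describe one (MARC field, search field) pair.
CREATE TABLE search_marc_to_field (
    search_marc_map_id INTEGER NOT NULL REFERENCES search_marc_map(id),
    search_field_id    INTEGER NOT NULL REFERENCES search_field(id),
    facet       INTEGER NOT NULL DEFAULT 0,
    suggestible INTEGER NOT NULL DEFAULT 0,
    sort        INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (search_marc_map_id, search_field_id)
);
""")

conn.executemany(
    "INSERT INTO search_field (id, name) VALUES (?, ?)",
    [(1, "author"), (2, "corporate-author")],
)
# A single MARC row, stored once.
conn.execute(
    "INSERT INTO search_marc_map (id, index_name, marc_type, marc_field) "
    "VALUES (1, 'biblios', 'marc21', '110a')"
)
# Linked to both search fields, with a different facet setting on each link.
conn.executemany(
    "INSERT INTO search_marc_to_field "
    "(search_marc_map_id, search_field_id, facet) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 0)],
)

marc_rows = conn.execute("SELECT COUNT(*) FROM search_marc_map").fetchone()[0]
links = conn.execute("""
    SELECT f.name, j.facet
    FROM search_marc_to_field j
    JOIN search_field f ON f.id = j.search_field_id
    ORDER BY f.name
""").fetchall()
print(marc_rows, links)
```

With the flags on search_marc_map instead, the same case needs two rows with
identical MARC details, which is the duplication we have today.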

> (In reply to Jonathan Druart from comment #144)
> > Ha, something else: the biblionumber should be a field of the biblios index.
> 
> Oh, it's embedded as the ID on the ES record. There's no point duplicating
> it as its own field, but it's reasonable to copy it out as a post-process
> step and put it into a biblionumber field. We don't have a reliable

Indeed, we could add it later, but don't you think it's worth letting
librarians (and devs) search for something like "biblionumber:42", which is a
more familiar term than "ID"?
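
As a minimal sketch of what I mean: when building the document to index, copy
the biblionumber into the body as well as using it as the ID. Python purely
for illustration; `build_es_document` is a hypothetical helper, not existing
Koha code, and the field values are made up.

```python
def build_es_document(biblionumber, fields):
    """Return the ES id and body, with the biblionumber duplicated in the body."""
    body = dict(fields)  # copy, so the caller's dict is left untouched
    body["biblionumber"] = biblionumber
    return {"_id": str(biblionumber), "_source": body}


doc = build_es_document(42, {"title": "Some title"})
print(doc)
```

Once the field is part of the indexed body, a query string such as
"biblionumber:42" has a named field to match against.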
